
How To Fix: HtmlUnit GetElementById Returns Null

I am writing a web scraper and am trying to type in a search word into a search box. However, it looks like I am getting null when I try to access the search box by ID. I am just l

Solution 1:

The page may look simple, but like many shopping portals it is really complicated and relies on a large amount of JavaScript, not only for the page itself but also for the many trackers that observe users. If you want to learn more about how the page works, I suggest using a web proxy such as Charles to capture all of the traffic.

Now back to your problem. HtmlUnit's JavaScript support (based on Rhino) is not perfect, so you will run into some JavaScript errors. To keep HtmlUnit from stopping at those errors, configure the client like this:

webClient.getOptions().setThrowExceptionOnScriptError(false);

The next step is to get the page. This is also not that simple because of all the JavaScript. It looks like the scripts replace the page that was initially returned for the URL. Because of this, you have to do three steps:

  • get the page
  • wait some time to let the js do some work
  • get the current page from the current window

Now you can find the search field, type a search term into it, and finally click the search button. After that, repeat the same three steps to get the current content.

Hope that helps....

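import java.io.IOException;

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.DomNode;
import com.gargoylesoftware.htmlunit.html.DomNodeList;
import com.gargoylesoftware.htmlunit.html.HtmlElement;
import com.gargoylesoftware.htmlunit.html.HtmlInput;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

// The class name is arbitrary; the imports assume HtmlUnit 2.x (com.gargoylesoftware packages).
public class SearchScraper {
    public static void main(String[] args) throws IOException {
        String url = "https://www.garageclothing.com/ca";

        try (final WebClient webClient = new WebClient()) {
            // do not stop at js errors
            webClient.getOptions().setThrowExceptionOnScriptError(false);

            // step 1: get the page; step 2: give the js some time to run
            webClient.getPage(url);
            webClient.waitForBackgroundJavaScript(10000);

            // step 3: the js may have replaced the page, so ask the window for the current one
            HtmlPage page = (HtmlPage) webClient.getCurrentWindow().getEnclosedPage();
            HtmlInput searchInput = (HtmlInput) page.getElementById("searchText");
            searchInput.type("red scarf");

            HtmlElement submitBtn = (HtmlElement) page.getElementByName("search");
            submitBtn.click();
            webClient.waitForBackgroundJavaScript(10000);

            // repeat step 3 to pick up the search result page
            page = (HtmlPage) webClient.getCurrentWindow().getEnclosedPage();
            // System.out.println("------------------------------------------------");
            // System.out.println(page.asXml());

            System.out.println("------------------------------------------------");
            final DomNodeList<DomNode> divs = page.querySelectorAll(".divProdPriceSale");
            for (DomNode div : divs) {
                System.out.println(div.asText());
            }
        }
    }
}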
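
If a single fixed wait turns out to be unreliable, one option is to repeat the lookup until it stops returning null. The following is only a sketch using plain JDK classes: the class `ElementPoller` and the method `pollUntilPresent` are made-up names, not part of HtmlUnit, and the timeout and interval values are arbitrary.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical helper, not part of HtmlUnit.
final class ElementPoller {
    private ElementPoller() {
    }

    // Calls lookup until it returns a non-null value or timeoutMillis has elapsed.
    static <T> Optional<T> pollUntilPresent(Supplier<T> lookup, long timeoutMillis, long intervalMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            T result = lookup.get();
            if (result != null) {
                return Optional.of(result);
            }
            if (System.nanoTime() - deadline >= 0) {
                return Optional.empty();
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // keep the interrupt flag set and give up
                Thread.currentThread().interrupt();
                return Optional.empty();
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for a real lookup: returns null twice, then a value.
        int[] calls = {0};
        Optional<String> found = pollUntilPresent(() -> ++calls[0] < 3 ? null : "searchText", 2_000, 50);
        System.out.println(found.isPresent() + " after " + calls[0] + " lookups");
    }
}
```

With HtmlUnit, the lookup would presumably be something like `() -> ((HtmlPage) webClient.getCurrentWindow().getEnclosedPage()).getElementById("searchText")`, so that every attempt asks the window for the current page instead of reusing a page object that may already have been replaced.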

Solution 2:

You should check that the URL you are passing to the WebClient is the same one you are viewing in your web browser.

I went to the link you use in your code (https://www.garageclothing.com), and the page I got is not the one you are expecting. It asked me to pick a country (USA or Canada), and only after I clicked one of the options did it take me to the page you are expecting.

Try changing the URL to "https://www.garageclothing.com/us/" or "https://www.garageclothing.com/ca/".
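
To catch this kind of mistake early, you could check the URL before handing it to the WebClient. This is just a rough sketch with the JDK's `java.net.URI`; `StoreUrls` and `hasCountryPath` are made-up names, and the set of country codes only covers the two mentioned here.

```java
import java.net.URI;
import java.util.Set;

// Hypothetical helper for illustration only.
final class StoreUrls {
    // Country codes mentioned in this answer; anything else is treated as missing.
    private static final Set<String> COUNTRIES = Set.of("us", "ca");

    private StoreUrls() {
    }

    // True if the first path segment of the URL is one of the known country codes.
    static boolean hasCountryPath(String url) {
        String path = URI.create(url).getPath();
        if (path == null) {
            return false;
        }
        // The path starts with "/", so the first array element is empty.
        String[] segments = path.split("/");
        return segments.length > 1 && COUNTRIES.contains(segments[1]);
    }

    public static void main(String[] args) {
        System.out.println(hasCountryPath("https://www.garageclothing.com"));     // prints false
        System.out.println(hasCountryPath("https://www.garageclothing.com/ca/")); // prints true
    }
}
```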

