Web scraping a webpage with Java using HtmlUnit

Time:10-26

I'm just getting started with web scraping and wrote some simple code. I'm trying to access the website https://parimatch.com (a betting website), and I just want to get its content as a string, that's all. But I don't get anything useful from it. Here is my code:

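import java.io.IOException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class Scraper {
    public static void main(String[] args) throws IOException {
        String url = "https://parimatch.com";
        WebClient webclient = new WebClient();
        webclient.getOptions().setCssEnabled(false);
        // JavaScript is off, so only the initial server response is parsed
        webclient.getOptions().setJavaScriptEnabled(false);

        HtmlPage page = webclient.getPage(url);
        System.out.println(page.asText());
    }
}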

And as output I'm getting only this:

Parimatch ... ... ... ... ... ... AccessDeniedAccess DeniedF9M61D7DJ91H4VV9/ZwxOdmTFgSBUqONvXN4N NV5xPMsaZOgXXfD7P1bC/eLXBJRZ4bjiQZ33gXQUwFnjxcCr/1tw4= ... ... ... ...

Can someone please tell me why I'm getting only this, and what I should do in this case?

CodePudding user response:

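The page you are scraping does not have much of a static HTML page. Its content is loaded almost entirely by JavaScript, and your code turns JavaScript off with setJavaScriptEnabled(false), so HtmlUnit only sees the initial server response. The JavaScript on this Russian gambling website also shows a security box that asks you to "click images" to prevent web scraping. They know about this stuff! :)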

Open the website in Google Chrome, right-click the page, and choose "View page source" from the context menu. You will see the same Access Denied message!

Contents produced by "View Source" Button
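
If you want to do the same "View Source" check from code, here is a minimal sketch. It deliberately uses the JDK's built-in java.net.http client (Java 11+) instead of HtmlUnit, so it runs without extra dependencies. The class name RawSourceCheck and the looksBlocked heuristic are my own assumptions for illustration, based only on the "AccessDenied" text in your output; a real block page may look different.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RawSourceCheck {

    // Hypothetical heuristic (an assumption, not a standard check): treat an
    // HTTP 403 status or an "AccessDenied" marker in the body as a block page.
    public static boolean looksBlocked(int statusCode, String body) {
        return statusCode == 403 || body.contains("AccessDenied");
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length == 0) {
            System.out.println("usage: java RawSourceCheck <url>");
            return;
        }
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(args[0])).GET().build();

        // Fetch the document as the server sends it; no JavaScript is executed,
        // which is roughly what the browser's source view shows.
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());

        System.out.println("HTTP status: " + response.statusCode());
        System.out.println("Looks blocked: "
                + looksBlocked(response.statusCode(), response.body()));
    }
}
```

Run it with the URL as the only argument, for example java RawSourceCheck https://parimatch.com. If it reports a block, the server is refusing the plain request itself, and changing how you print the page in HtmlUnit will not help.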