I'm trying to scrape a site with Jsoup that has month and year drop-downs; the data shown changes when a different month or year is selected. Edit: the data changes as soon as the month or year is changed, and there is no submit button.
By default the page always shows the current month and year, which I can scrape, but I'm hoping to scrape historic data for previous months and years by selecting different values in the drop-downs.
I can find the element that contains the drop-down, but no matter what I've tried, I can't select a different month via Jsoup. Is it possible to change which option is selected and submit it via a FormElement?
Here is my code, followed by the output of the System.out.println() call.
Connection.Response resp = Jsoup.connect(url) //
.timeout(30000) //
.method(Connection.Method.GET) //
.execute();
Document doc = resp.parse();
Element pForm = doc.selectFirst("select:nth-of-type(2)");
System.out.println(pForm.toString());
// Result:
//<select name="month" size="1" style="background-color:#FFFFFF;text-align: left; font-family: Arial Narrow; font-size: 10" onchange="this.form.submit();"> <option value="June">June</option> <option value="July">July</option> <option value="August">August</option> <option value="September">September</option> <option value="October">October</option> <option value="November">November</option> <option value="December">December</option> <option value="January">January</option> <option value="February">February</option> <option value="March">March</option> <option value="April">April</option> <option selected value="May">May</option> </select>
Answer:
You have to send a POST request (instead of a GET) containing the data of the <form> you described in the comments. The onchange handler (this.form.submit()) simply submits the form as soon as you change the value of the drop-down.
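Since the onchange handler only submits the enclosing form, you can also do what the question asks: let Jsoup's FormElement build the request. Mark the wanted <option> as selected, then call submit(). FormElement.formData() sends whichever options carry the selected attribute, and submit() takes the URL and method from the form's action and method attributes. This is a minimal sketch, assuming the select named "month" from the question's output is a control of a <form> on the page. The page URL is a hypothetical placeholder, and SelectMonthViaForm and selectOption are illustrative names.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.FormElement;

import java.io.IOException;

public class SelectMonthViaForm {

    // Makes the option whose value equals `value` the only selected option in
    // <select name="selectName">, then returns the form that owns the select.
    static FormElement selectOption(Document doc, String selectName, String value) {
        Element select = doc.selectFirst("select[name=" + selectName + "]");
        if (select == null) {
            throw new IllegalArgumentException("No <select name=" + selectName + "> found");
        }
        Element chosen = null;
        for (Element option : select.select("option")) {
            if (option.attr("value").equals(value)) {
                chosen = option;
                break;
            }
        }
        if (chosen == null) {
            throw new IllegalArgumentException("No option with value " + value);
        }
        select.select("option").removeAttr("selected"); // clear the old default (e.g. "May")
        chosen.attr("selected", "selected");

        // The parser registers form controls with their FormElement, which also
        // covers sloppy HTML where the select is not a DOM descendant of the form.
        for (Element f : doc.select("form")) {
            if (f instanceof FormElement && ((FormElement) f).elements().contains(select)) {
                return (FormElement) f;
            }
        }
        throw new IllegalArgumentException("The select does not belong to a <form>");
    }

    public static void main(String[] args) throws IOException {
        String url = "https://example.com/fixtures.php"; // hypothetical placeholder URL
        Connection.Response resp = Jsoup.connect(url).timeout(30000).execute();
        Document doc = resp.parse(); // base URI = page URL, so the form action resolves

        FormElement form = selectOption(doc, "month", "June");
        // submit() uses the form's action and method; formData() now sends month=June.
        Document june = form.submit()
                .cookies(resp.cookies()) // carry over session cookies, in case the site uses them
                .timeout(30000)
                .execute()
                .parse();
        System.out.println(june.title());
    }
}
```

Using execute() instead of post() keeps whatever method the form declares, so the same code works if the form turns out to use GET.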
According to the docs at https://jsoup.org/cookbook/input/load-document-from-url, you would add the form data and send the request like this:
Document doc = Jsoup.connect(url)
.timeout(30000)
.data("fixtures", "June")
.post();
This sets the "fixtures" select to the value "June". Make sure your url points to the form's action ("fixturesall.php"). Because the action attribute contains no path, it is relative to the URL of the page you originally requested.
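To illustrate what "relative to the page you requested" means, here is a small sketch using only java.net.URI. The page URL is a hypothetical placeholder. In Jsoup itself, calling absUrl("action") on the parsed <form> element performs the same resolution against the document's base URI.

```java
import java.net.URI;

public class ResolveFormAction {

    // Resolves a form's (possibly relative) action attribute against the URL
    // the page was loaded from, as a browser does when the form is submitted.
    static String resolveAction(String pageUrl, String action) {
        return URI.create(pageUrl).resolve(action).toString();
    }

    public static void main(String[] args) {
        String pageUrl = "https://example.com/league/fixtures.php"; // hypothetical
        // A bare file name replaces the last path segment of the page URL.
        System.out.println(resolveAction(pageUrl, "fixturesall.php"));
        // prints https://example.com/league/fixturesall.php
    }
}
```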
As the screenshot showed multiple selects, you may need to add further parameters to your form request. I haven't tested it, but according to the Javadoc each call to .data() adds another parameter rather than replacing the previous one, so multiple calls add up.
To be sure, you can also pass a Map containing multiple key-value pairs to data().
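Here is a minimal sketch of the Map variant. The test also checks that chained .data(key, value) calls accumulate. The "fixtures" name comes from the answer above. The "year" field, its value and the action URL are hypothetical placeholders, so check the real names in the page's <select name=...> attributes. LinkedHashMap keeps the parameters in insertion order.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class PostAllSelects {

    // Hypothetical field names: "fixtures" is from the answer, "year" is assumed.
    static Map<String, String> formFields(String fixtures, String year) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("fixtures", fixtures);
        fields.put("year", year);
        return fields;
    }

    // Builds (but does not send) a request carrying every field as a form parameter.
    static Connection buildRequest(String actionUrl, Map<String, String> fields) {
        return Jsoup.connect(actionUrl)
                .timeout(30000)
                .data(fields);
    }

    public static void main(String[] args) throws IOException {
        String actionUrl = "https://example.com/league/fixturesall.php"; // hypothetical
        // Equivalent to .data("fixtures", "June").data("year", "2020")
        Document doc = buildRequest(actionUrl, formFields("June", "2020")).post();
        System.out.println(doc.title());
    }
}
```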