Hi guys, I'm using jsoup in a Java web application in IntelliJ. I'm trying to scrape port call event data from a ship-tracking website and store it in a MySQL database.
The data for the events is organised in divs with the class name table-group, and the values are in another div with the class name table-row.
My problem is that the row divs for all the vessels share the same class name, and I'm trying to loop through each row and push the data to the database. So far I have managed to create a Java class that scrapes the first row.
How can I loop through each row and store those values in my database? Should I create an ArrayList to store the values?
This is my scraper class:
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Scarper {
    private static Document doc;

    public static void main(String[] args) {
        final String url =
                "https://www.myshiptracking.com/ports-arrivals-departures/?mmsi=&pid=277&type=0&time=&pp=20";
        try {
            doc = Jsoup.connect(url).get();
        } catch (IOException e) {
            e.printStackTrace();
        }
        Events();
    }

    public static void Events() {
        Elements elm = doc.select("div.table-group:nth-of-type(2) > .table-row");
        List<String> arrayList = new ArrayList<>();
        for (Element ele : elm) {
            String event = ele.select("div.col:nth-of-type(2)").text();
            String time = ele.select("div.col:nth-of-type(3)").text();
            String port = ele.select("div.col:nth-of-type(4)").text();
            String vessel = ele.select(".td_vesseltype.col").text();
            Event ev = new Event();
            System.out.println(event);
            System.out.println(time);
            System.out.println(port);
            System.out.println(vessel);
        }
    }
}
Sample of the divs I want to scrape:
<div style="box-sizing: border-box;padding: 0px 10px 10px 10px;">
  <div >
    <div >
      <div style="width: 10px"></div>
      <div style="width: 110px">Event</div>
      <div style="width: 120px">Time (<span title="My Time: In your current TimeZone">MT</span>)</div>
      <div style="width: 150px">Port</div>
      <div >Vessel</div>
    </div>
    <div >
      <div >
        <div ><i ></i></div>
        <div >Departure</div>
        <div style="text-align: center;">2022-02-14 <b>16:51</b></div>
        <div ><img src="/icons/flags2/16/GB.png" title=" United Kingdom"/><a href="/ports/port-of-belfast-in-gb-united-kingdom-id-101">BELFAST</a></div>
        <div ><img src="/icons/icon7_511.png"><span ><a href="/vessels/wilson-blyth-mmsi-314544000-imo-9124419">WILSON BLYTH</a> [GB]</span></div>
      </div>
    </div>
    <div >
      <div >
        <div ><i ></i></div>
        <div >Arrival</div>
        <div style="text-align: center;">2022-02-14 <b>16:51</b></div>
        <div ><img src="/icons/flags2/16/GB.png" title=" United Kingdom"/><a href="/ports/port-of-hunters-quay-in-gb-united-kingdom-id-218">HUNTERS QUAY</a></div>
        <div ><img src="/icons/icon6_511.png"><span ><a href="/vessels/sound-of-soay-mmsi-235101063-imo-9665229">SOUND OF SOAY</a> [GB]</span></div>
      </div>
    </div>
    <div >
      <div >
        <div ><i ></i></div>
        <div >Departure</div>
        <div style="text-align: center;">2022-02-14 <b>16:51</b></div>
        <div ><img src="/icons/flags2/16/GB.png" title=" United Kingdom"/><a href="/ports/port-of-largs-in-gb-united-kingdom-id-1602">LARGS</a></div>
        <div ><img src="/icons/icon6_511.png"><span ><a href="/vessels/loch-shira-mmsi-235053239-imo-9376919">LOCH SHIRA</a> [GB]</span></div>
      </div>
    </div>
    <div >
      <div >
        <div ><i ></i></div>
        <div >Departure</div>
        <div style="text-align: center;">2022-02-14 <b>16:51</b></div>
        <div ><img src="/icons/flags2/16/GB.png" title=" United Kingdom"/><a href="/ports/port-of-ryde-in-gb-united-kingdom-id-1629">RYDE</a></div>
        <div ><img src="/icons/icon4_511.png"><span ><a href="/vessels/island-flyer-mmsi-235117772-imo-9737797">ISLAND FLYER</a> [GB]</span></div>
      </div>
    </div>
Answer:
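You can start by looping over the table's rows. The selector for the table is .cs-table, so you can get the table with Element table = doc.select(".cs-table").first();
Next, you can get the table's rows with the selector div.table-row: Elements rows = doc.select("div.table-row");
Note that your selector div.table-group:nth-of-type(2) only matches a table-group that is the second div inside its parent, so it can return at most one group, which is probably why you only get the first row.
Now you can loop over all the rows and extract the data from each row. The code should look like: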
Element table = doc.select(".cs-table").first(); // the table itself, in case you want to scope the lookup
Elements rows = doc.select("div.table-row"); // every row on the page
for (Element row : rows) {
    // Each column is addressed by its position (or class) inside the row
    String event = row.select("div.col:nth-of-type(2)").text();
    String time = row.select("div.col:nth-of-type(3)").text();
    String port = row.select("div.col:nth-of-type(4)").text();
    String vessel = row.select(".td_vesseltype.col").text();
    System.out.println(event + "-" + time + " " + port + " " + vessel);
    System.out.println("---------------------------");
    // Do stuff with data here
}
Now it's up to you to decide whether to keep the data in an array/list inside the loop and use it later, or to insert it directly into your database.
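If you go with the list option, a minimal sketch could look like the one below. It is only an illustration under a few assumptions: the time column comes back from text() as something like "2022-02-14 16:51" (as in the sample HTML), and the MySQL table is called port_calls with columns event, event_time, port and vessel. The names PortCallStore, PortCall, toPortCall and saveAll, as well as the table and column names, are made up for this example and are not from the question.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;

public class PortCallStore {

    // Hypothetical value class holding the data of one scraped row.
    public static final class PortCall {
        public final String event;
        public final LocalDateTime time;
        public final String port;
        public final String vessel;

        public PortCall(String event, LocalDateTime time, String port, String vessel) {
            this.event = event;
            this.time = time;
            this.port = port;
            this.vessel = vessel;
        }
    }

    // Assumed format of the time column text, e.g. "2022-02-14 16:51".
    private static final DateTimeFormatter TIME_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm");

    // Turns the four strings read from one row into a PortCall.
    public static PortCall toPortCall(String event, String time, String port, String vessel) {
        return new PortCall(
                event.trim(),
                LocalDateTime.parse(time.trim(), TIME_FORMAT),
                port.trim(),
                vessel.trim());
    }

    // Inserts all collected rows as one JDBC batch.
    // The table name and column names are assumptions; adjust them to your schema.
    public static void saveAll(Connection conn, List<PortCall> calls) throws SQLException {
        String sql = "INSERT INTO port_calls (event, event_time, port, vessel) VALUES (?, ?, ?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (PortCall c : calls) {
                ps.setString(1, c.event);
                ps.setTimestamp(2, Timestamp.valueOf(c.time));
                ps.setString(3, c.port);
                ps.setString(4, c.vessel);
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }
}
```

Inside the loop you would then call calls.add(PortCallStore.toPortCall(event, time, port, vessel)); on a List<PortCallStore.PortCall> created before the loop, and after the loop call PortCallStore.saveAll(conn, calls); with a Connection you opened yourself (for example via DriverManager.getConnection). Collecting first and inserting once keeps the scraping and the database code separate, but inserting row by row inside the loop would work as well.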