How do I loop through divs using jsoup?

Time:02-17

Hi guys, I'm using jsoup in a Java web application in IntelliJ. I'm trying to scrape port call event data from a ship-tracking website and store it in a MySQL database.

The data for the events is organised in divs with the class name table-group, and the values are in another div with the class name table-row.
My problem is that the row divs for all the vessels have the same class name, and I'm trying to loop through each row and push the data to a database. So far I have managed to write a Java class that scrapes the first row.
How can I loop through each row and store those values in my database? Should I create an ArrayList to store the values?



This is my scraper class:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Scraper {

    private static Document doc;

    public static void main(String[] args) {

        final String url =
                "https://www.myshiptracking.com/ports-arrivals-departures/?mmsi=&pid=277&type=0&time=&pp=20";

        try {
            doc = Jsoup.connect(url).get();
        } catch (IOException e) {
            e.printStackTrace();
            return; // without a document there is nothing to parse
        }
        events();
    }

    public static void events() {
        Elements elm = doc.select("div.table-group:nth-of-type(2) > .table-row");

        List<String> arrayList = new ArrayList<>();

        for (Element ele : elm) {

            // Columns: 1 = icon, 2 = event, 3 = time, 4 = port, 5 = vessel
            String event = ele.select("div.col:nth-of-type(2)").text();
            String time = ele.select("div.col:nth-of-type(3)").text();
            String port = ele.select("div.col:nth-of-type(4)").text();
            String vessel = ele.select(".td_vesseltype.col").text();
            Event ev = new Event(); // Event: model class defined elsewhere (not shown)
            System.out.println(event);
            System.out.println(time);
            System.out.println(port);
            System.out.println(vessel);
        }
    }
}

Sample of the divs I want to scrape:

<div style="box-sizing: border-box;padding: 0px 10px 10px 10px;">
            <div >
                <div >
                    <div  style="width: 10px"></div>
                    <div  style="width: 110px">Event</div>
                    <div  style="width: 120px">Time (<span  title="My Time: In your current TimeZone">MT</span>)</div>
                    <div  style="width: 150px">Port</div>
                    <div >Vessel</div>
                </div>
                                    <div >
                    <div >
                        <div ><i ></i></div>
                        <div >Departure</div>
                        <div  style="text-align: center;">2022-02-14 <b>16:51</b></div>
                        <div ><img  src="/icons/flags2/16/GB.png" title=" United Kingdom"/><a href="/ports/port-of-belfast-in-gb-united-kingdom-id-101">BELFAST</a></div>
                        <div ><img src="/icons/icon7_511.png"><span ><a href="/vessels/wilson-blyth-mmsi-314544000-imo-9124419">WILSON BLYTH</a> [GB]</span></div>
                    </div>
                </div>
                                    <div >
                    <div >
                        <div ><i ></i></div>
                        <div >Arrival</div>
                        <div  style="text-align: center;">2022-02-14 <b>16:51</b></div>
                        <div ><img  src="/icons/flags2/16/GB.png" title=" United Kingdom"/><a href="/ports/port-of-hunters-quay-in-gb-united-kingdom-id-218">HUNTERS QUAY</a></div>
                        <div ><img src="/icons/icon6_511.png"><span ><a href="/vessels/sound-of-soay-mmsi-235101063-imo-9665229">SOUND OF SOAY</a> [GB]</span></div>
                    </div>
                </div>
                                    <div >
                    <div >
                        <div ><i ></i></div>
                        <div >Departure</div>
                        <div  style="text-align: center;">2022-02-14 <b>16:51</b></div>
                        <div ><img  src="/icons/flags2/16/GB.png" title=" United Kingdom"/><a href="/ports/port-of-largs-in-gb-united-kingdom-id-1602">LARGS</a></div>
                        <div ><img src="/icons/icon6_511.png"><span ><a href="/vessels/loch-shira-mmsi-235053239-imo-9376919">LOCH SHIRA</a> [GB]</span></div>
                    </div>
                </div>
                                    <div >
                    <div >
                        <div ><i ></i></div>
                        <div >Departure</div>
                        <div  style="text-align: center;">2022-02-14 <b>16:51</b></div>
                        <div ><img  src="/icons/flags2/16/GB.png" title=" United Kingdom"/><a href="/ports/port-of-ryde-in-gb-united-kingdom-id-1629">RYDE</a></div>
                        <div ><img src="/icons/icon4_511.png"><span ><a href="/vessels/island-flyer-mmsi-235117772-imo-9737797">ISLAND FLYER</a> [GB]</span></div>
                    </div>
                </div>

Answer:

You can start by looping over the table's rows. The selector for the table is .cs-table, so you can get the table with Element table = doc.select(".cs-table").first();. Next, get the table's rows with the selector div.table-row: Elements rows = table.select("div.table-row");. Now you can loop over all the rows and extract the data from each one. (Your original selector returned only one row because :nth-of-type(2) restricts div.table-group to the second div among its siblings, so only a single group per parent can match.) The code should look like:

Element table = doc.select(".cs-table").first();
if (table != null) {
    // Search for rows inside the table only, not the whole page
    Elements rows = table.select("div.table-row");
    for (Element row : rows) {
        String event = row.select("div.col:nth-of-type(2)").text();
        String time = row.select("div.col:nth-of-type(3)").text();
        String port = row.select("div.col:nth-of-type(4)").text();
        String vessel = row.select(".td_vesseltype.col").text();
        System.out.println(event + "-" + time + " " + port + " " + vessel);
        System.out.println("---------------------------");
        // Do stuff with data here
    }
}
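
The vessel cell's text includes the flag code in brackets; for the sample rows above, .text() on that cell should give something like "WILSON BLYTH [GB]". If you want the name and the flag in separate database columns, a small helper can split them. This is a minimal sketch assuming every vessel cell follows that "NAME [CODE]" pattern; the VesselText class and split method are hypothetical names, not part of jsoup.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VesselText {

    // Assumed format: vessel name, optional whitespace, then a flag code in square brackets
    private static final Pattern NAME_AND_FLAG = Pattern.compile("^(.*?)\\s*\\[([^\\]]+)\\]$");

    /** Returns {name, flag}; flag is null when the text has no trailing "[...]" part. */
    static String[] split(String vesselText) {
        String text = vesselText.trim();
        Matcher m = NAME_AND_FLAG.matcher(text);
        if (m.matches()) {
            return new String[] {m.group(1), m.group(2)};
        }
        return new String[] {text, null};
    }

    public static void main(String[] args) {
        String[] parts = split("WILSON BLYTH [GB]");
        System.out.println(parts[0] + " | " + parts[1]); // WILSON BLYTH | GB
    }
}
```

Inside the loop you would call VesselText.split(vessel) and store the two parts separately; if the site's format differs for some rows, the fallback keeps the whole text as the name.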

Now it's up to you to decide whether to keep the data in an array/list inside the loop and use it later, or to insert it directly into your database.
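
If you collect the rows first, a List of small objects works well, and you can then write them in one JDBC batch. Below is a minimal sketch assuming Java 16+ (for the record), a hypothetical PortEvent record, and a hypothetical MySQL table port_event(event, event_time, port, vessel); the time parsing assumes the "yyyy-MM-dd HH:mm" text seen in the sample rows. You still need the MySQL JDBC driver on the classpath and a Connection from DriverManager.getConnection with your own URL and credentials.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class PortEventStore {

    // Hypothetical row type holding one scraped table-row
    record PortEvent(String event, LocalDateTime time, String port, String vessel) {}

    // Matches the time column text in the sample, e.g. "2022-02-14 16:51"
    static final DateTimeFormatter TIME_FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm");

    // Hypothetical table and column names; adjust to your schema
    static final String INSERT_SQL =
            "INSERT INTO port_event (event, event_time, port, vessel) VALUES (?, ?, ?, ?)";

    /** Builds a PortEvent from the four column strings extracted by jsoup. */
    static PortEvent fromColumns(String event, String time, String port, String vessel) {
        return new PortEvent(event, LocalDateTime.parse(time.trim(), TIME_FORMAT), port, vessel);
    }

    /** Inserts all events with one batched PreparedStatement; returns per-row update counts. */
    static int[] insertAll(Connection conn, List<PortEvent> events) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            for (PortEvent e : events) {
                ps.setString(1, e.event());
                ps.setTimestamp(2, Timestamp.valueOf(e.time()));
                ps.setString(3, e.port());
                ps.setString(4, e.vessel());
                ps.addBatch();
            }
            return ps.executeBatch();
        }
    }

    public static void main(String[] args) {
        // In the jsoup loop you would call fromColumns(event, time, port, vessel) once per row
        List<PortEvent> events = new ArrayList<>();
        events.add(fromColumns("Departure", "2022-02-14 16:51", "BELFAST", "WILSON BLYTH [GB]"));
        System.out.println(events.get(0).time()); // 2022-02-14T16:51
        // With a real Connection: insertAll(conn, events);
    }
}
```

Batching sends the rows together instead of making one round trip per insert. With MySQL Connector/J, adding rewriteBatchedStatements=true to the JDBC URL may speed batches up further; if you prefer inserting directly inside the loop, the same PreparedStatement can be executed once per row instead.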
