How to remove unwanted rows result in Java jsoup html parsing?

Time:03-07

I am trying to extract some text with jsoup using the Java code below. When I run it, the console prints the whole row shown below, and I can't pick out the individual values I need. The site only has one class, named "odd", so I cannot select on anything else. How can I get the results below separately and assign each one to a String?

I need this output only:

Date Home Score Away Half Time
Fri 4 Feb Kayserispor 4 - 3 Hatayspor (1-0)

Console Results:

<tr  height="28">
 <td align="right" style="padding-right:5px;"><font size="1" color="green">Fri 4 Feb</font></td>
 <td align="right">Kayserispor&nbsp;</td>
 <td align="center"><font color="blue"><b>4 - 3</b></font></td>
 <td align="left">&nbsp;Hatayspor</td>
 <td align="center" valign="middle" width="45"><a  href="pmatch.asp?league=turkey&amp;stats=240-6-7-2022">stats</a></td>
 <td align="center"><font color="gray" style="font-size:11px;">(1-0)</font></td>
 <td align="center"> </td>
 <td align="center">7</td>
 <td align="center"> </td>
</tr>

My Java Code:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.sql.Timestamp;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TestScrapper {
    private static final String driver = "com.mysql.cj.jdbc.Driver";
    private static final String url = "jdbc:mysql://localhost:3306/bets";
    private static final String user = "root";
    private static final String pass = "Fener2013";
    private static PreparedStatement ps = null;
    private static Connection conn = null;
    private static Statement st = null;
    private static ResultSet rs = null;
    private static int id = 0;
    private static ArrayList<String> teams = new ArrayList<>();
    private static Map<String, Integer[]> statistics = new HashMap<>();
    public static void main(String[] args) {

        final String URL = "https://www.soccerstats.com/results.asp?league=turkey&pmtype=month2";

        try {
            final Document document = Jsoup.connect(URL).get();
            System.out.println(document.select("tr:nth-child(n+1).odd"));
            // System.out.println(document.outerHtml());
            id = 0;

            for (Element table : document.select("tr:nth-last-child(-n+4).odd")) {
                for (Element td : table.select("tr:nth-child(n-1).odd")) {
                    if (td.select("tr:nth-child(n-1).odd").text().equals("")) {
                        continue;
                    }
                    final String loc = td.select("td:nth-last-child(-n+4).odd").text();
                    final String vis = td.select("td").text();
                    final String res = td.select("odd").text();
                    id++;

                    System.out.println(loc);
                }
            }

        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}

CodePudding user response:

I'm no expert with jsoup, but as I mentioned in the comments, you're misusing the CSS selectors. You could try something like the following instead of your for loops:

...
for (Element tr : document.select("tr")) {
    for (Element td : tr.select("td")) {
        System.out.print(td.text() + " ");
    }
    System.out.println();
}
...

This should print the contents of all table cells separated by spaces, one row per line.

Then you'll have to adjust the CSS selectors to get exactly what you want. For example, you could get the first and second columns of each row with something like this:

...
for (Element tr : document.select("tr")) {
    String first = tr.select("td:first-child").text();
    String second = tr.select("td:nth-child(2)").text();
    System.out.println(first + " | " + second);
}
...

Keep in mind that the select method of an Element only searches within the contents of that element.
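
For the sample row in the question, this idea can be sketched as follows. It is only a sketch and has not been run against the live site: the class name MatchRowSketch, the helpers clean and parseMatches, and the column positions (0 = date, 1 = home, 2 = score, 3 = away, 5 = half time) are assumptions taken from that single row. The row is wrapped in a table element, with single-quoted attributes, so that it can be parsed from a standalone string.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class MatchRowSketch {

    // Sample row adapted from the question. It is wrapped in <table> because
    // the HTML parser drops <tr>/<td> tags that appear outside a table.
    static final String SAMPLE =
          "<table><tr height='28'>"
        + "<td align='right'><font size='1' color='green'>Fri 4 Feb</font></td>"
        + "<td align='right'>Kayserispor&nbsp;</td>"
        + "<td align='center'><font color='blue'><b>4 - 3</b></font></td>"
        + "<td align='left'>&nbsp;Hatayspor</td>"
        + "<td align='center'><a href='pmatch.asp?league=turkey&amp;stats=240-6-7-2022'>stats</a></td>"
        + "<td align='center'><font color='gray'>(1-0)</font></td>"
        + "<td align='center'> </td>"
        + "<td align='center'>7</td>"
        + "<td align='center'> </td>"
        + "</tr></table>";

    // Returns the cell text with non-breaking spaces (&nbsp;) turned into
    // ordinary spaces and then trimmed. This is defensive: how text() treats
    // &nbsp; may differ between jsoup versions.
    static String clean(Element td) {
        return td.text().replace('\u00a0', ' ').trim();
    }

    // Collects {date, home, score, away, halfTime} for every row that has at
    // least six cells. The six-cell filter and the column indexes are
    // assumptions based on the sample row, not on the whole page.
    static List<String[]> parseMatches(Document document) {
        List<String[]> matches = new ArrayList<>();
        for (Element tr : document.select("tr")) {
            Elements cells = tr.children(); // direct child cells of this row only
            if (cells.size() < 6) {
                continue; // assumed to be a header, spacer or layout row
            }
            matches.add(new String[] {
                clean(cells.get(0)), // date
                clean(cells.get(1)), // home team
                clean(cells.get(2)), // score
                clean(cells.get(3)), // away team
                clean(cells.get(5))  // half-time score (index 4 is the "stats" link)
            });
        }
        return matches;
    }

    public static void main(String[] args) {
        for (String[] row : parseMatches(Jsoup.parse(SAMPLE))) {
            System.out.println(String.join(" | ", row));
        }
        // prints: Fri 4 Feb | Kayserispor | 4 - 3 | Hatayspor | (1-0)
    }
}
```

Against the real page, the same method could be given the Document returned by Jsoup.connect(URL).get(). The layout of the other rows on that page is not known here, so the six-cell filter may need adjusting.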

CodePudding user response:

Alfredo, thanks for your effort. I am not an expert with jsoup either. I tried to modify my code as you suggested, but I still get all the data on the site. I need to assign the values to String variables, but I can't. I can't pick out the table columns date, homeTeam, score, awayTeam and halftimeScore properly. The problem is that the site has no div elements and does not use any class on the tr or td elements that would make the text easy to identify. I think there is a solution for this, but I cannot find it.

My Final Code is:

        for (Element tr : document.select("td")) {
            for (Element td : tr.select("td")) {
                if (td.select("td :nth-child(2)").text().equals("")) {
                    continue;
                }
                final String date = td.select("td:first-child").text();
                final String homeTeam = tr.select("td:nth-child(1)").text();
                final String score = tr.select("td:nth-child(2)").text();
                final String awayTeam = tr.select("td:nth-child(3)").text();
                final String halftimeScore = tr.select("td:nth-child(4)").text();

                id++;
                System.out.println(date + " | " + homeTeam + " | " + score + " | " + awayTeam + " | " + halftimeScore);
                // System.out.println(td.text() + " ");
            }
        }