Extract first-level elements in jsoup, without recursing into a nested table

Time:02-17

I need to display the main row of this table, which has another table nested inside it:

<html><body>
<div id = div1><table><tbody>
<tr><td>Steve</td>
<td><table><tbody><tr><td>Steve2</td></tr></tbody></table></td>
</tr></tbody></table></div></body></html>

There can be more than one row. I want to extract the content of the tr at the first level only (not <tr><td>Steve2</td></tr>).

This is the code:

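import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

String html = "<html><body>"
  + "<div id = div1><table><tbody>"
  + "<tr><td>Steve</td>"
  + "<td><table><tbody><tr><td>Steve2</td></tr></tbody></table></td>"
  + "</tr></tbody></table></div></body></html>";
Document doc = Jsoup.parse(html);
Elements elemHtml = doc.select("div#div1>table");
for (Element elem1 : elemHtml) {
    // select("tr") matches every descendant <tr>, so the nested table's row is printed too
    Elements elem2 = elem1.select("tr");
    for (Element elem3 : elem2) {
        System.out.println("Content: " + elem3);
        System.out.println("----------");
    }
}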

I tried adding a <div> tag inside the table, but then the parsing doesn't work.

CodePudding user response:

Change your CSS selector to div#div1>table>tbody>tr to match only the <tr> elements that are directly under your <tbody> element. That is what > means in CSS: it selects direct children only, not deeper descendants.
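To make the difference concrete, here is a minimal sketch that counts the matches for a descendant selector and for the child-only selector on the question's markup. The class name ChildCombinatorDemo and the helper countMatches are made up for illustration, and the HTML has its missing closing tags added.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ChildCombinatorDemo {
    // The question's markup, with the missing </td> and </div> added
    static final String HTML =
        "<html><body><div id=div1><table><tbody>"
        + "<tr><td>Steve</td>"
        + "<td><table><tbody><tr><td>Steve2</td></tr></tbody></table></td>"
        + "</tr></tbody></table></div></body></html>";

    // Hypothetical helper: how many elements does this selector match?
    static int countMatches(String selector) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(selector).size();
    }

    public static void main(String[] args) {
        // A space means "any descendant": the outer row and the nested row both match
        System.out.println(countMatches("div#div1 > table tr"));            // prints 2
        // ">" means "direct child": only the outer row matches
        System.out.println(countMatches("div#div1 > table > tbody > tr"));  // prints 1
    }
}
```

The nested <tr> is still a descendant of the outer table, which is why a plain select("tr") on that table returns it as well.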

CodePudding user response:

I've made some more complex html, to show that the solution works for a more general case than the one in the question:

<html> <body> <div id = div1> <table> <tbody>
<tr> <td>Steve1</td> <td> <table> <tbody> <tr>
<td>Steve2a</td> </tr> <tr> <td>Steve2b</td>
</tr> </tbody> </table> </td> </tr> <tr> <td>Steve3</td>
<td> <table> <tbody> <tr> <td>Steve4</td> </tr>
</tbody> </table> </td> </tr> </tbody> </table> </div>
</body> </html>

which results in the following table:

[image: the rendered HTML table]

Use the selector div#div1 > table > tbody > tr to get only the outer table's rows,
and then iterate over these rows and take the first cell of each with select("td").first().
Full code:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

String html2 = "<html> <body> <div id = div1> <table> <tbody>"
    + "<tr> <td>Steve1</td> <td> <table> <tbody> <tr>"
    + "<td>Steve2a</td> </tr> <tr> <td>Steve2b</td>"
    + "</tr> </tbody> </table> </td> </tr> <tr> <td>Steve3</td>"
    + "<td> <table> <tbody> <tr> <td>Steve4</td> </tr>"
    + "</tbody> </table> </td> </tr> </tbody> </table> </div>"
    + "</body> </html>";

Document doc = Jsoup.parse(html2);
// Only rows that are direct children of the outer table's tbody
Elements outerRows = doc.select("div#div1>table> tbody > tr");
for (Element row : outerRows) {
    // First cell of the outer row (Steve1, Steve3)
    Element data = row.select("td").first();
    System.out.println(data);
    System.out.println("------------");
}

If you want only the text (SteveX), then you can get it with the text method:
System.out.println(data.text());
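
One thing to keep in mind is that text() also includes the text of all nested elements, so calling it on the whole row would pull in the inner table's content as well. The sketch below collects the text of the first cell of each outer row. The class FirstCellText and its two helper methods are made-up names for illustration, and it uses child(0) as an alternative way to take the row's first direct child element.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class FirstCellText {
    // The answer's markup, with the missing </td> and </div> added
    static final String HTML =
        "<html> <body> <div id = div1> <table> <tbody>"
        + "<tr> <td>Steve1</td> <td> <table> <tbody> <tr>"
        + "<td>Steve2a</td> </tr> <tr> <td>Steve2b</td>"
        + "</tr> </tbody> </table> </td> </tr> <tr> <td>Steve3</td>"
        + "<td> <table> <tbody> <tr> <td>Steve4</td> </tr>"
        + "</tbody> </table> </td> </tr> </tbody> </table> </div>"
        + "</body> </html>";

    // Hypothetical helper: text of the first cell of every outer row
    static List<String> firstCellTexts() {
        Document doc = Jsoup.parse(HTML);
        List<String> texts = new ArrayList<>();
        for (Element row : doc.select("div#div1 > table > tbody > tr")) {
            // child(0) is the row's first direct child element, i.e. its first <td>
            texts.add(row.child(0).text());
        }
        return texts;
    }

    // Hypothetical helper: text of the whole first outer row, nested table included
    static String firstRowText() {
        Document doc = Jsoup.parse(HTML);
        return doc.select("div#div1 > table > tbody > tr").first().text();
    }

    public static void main(String[] args) {
        System.out.println(firstCellTexts()); // prints [Steve1, Steve3]
        System.out.println(firstRowText());   // also contains Steve2a and Steve2b
    }
}
```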
