I have the string below, and I want to keep only the <ul> and <li> tags and the content within them. Is there a regexp that would do that?
Input:
<div style="text-align: justify;">
<font face="Century Gothic, sans-serif">
<div style="">
<ul>
<li><span style="font-size: 13.3333px;">blahblahblah</span></li>
<li><span style="font-size: 13.3333px;">blahblahblah</span></li>
<li><span style="font-size: 13.3333px;">blahblahblah</span></li>
</ul>
</div>
<div style="font-size: 13.3333px;"><br></div></font>
</div>
<p></p>
Expected output:
<ul>
<li>blahblahblah</li>
<li>blahblahblah</li>
<li>blahblahblah</li>
</ul>
CodePudding user response:
Rather than a regexp, you could use an HTML parser such as Jsoup. Something like the following should work:
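import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

Document doc = Jsoup.parse(yourHtml);
// First <ul> in the document; selectFirst returns null if there is none.
Element ul = doc.selectFirst("ul");
// Setting each <li>'s text to its own text drops nested tags such as <span>.
ul.children().forEach(li -> li.text(li.text()));
// Printing the element prints its outer HTML.
System.out.println(ul);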
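If you really do want a regexp and cannot add a parser dependency, here is a rough sketch using only java.util.regex. The class name ListExtractor and the method keepLists are made up for illustration. It assumes the lists are not nested and that no attribute value contains a ">" character, and it leaves any attributes on <ul> and <li> untouched, so treat it as a starting point for simple input like yours rather than a general HTML cleaner.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jsoup or any library.
final class ListExtractor {
    // One whole <ul>...</ul> block. Non-greedy, so nested lists are NOT handled.
    private static final Pattern UL_BLOCK = Pattern.compile(
            "<ul\\b[^>]*>.*?</ul>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Any tag that is not an opening or closing <ul> or <li>.
    private static final Pattern OTHER_TAG = Pattern.compile(
            "<(?!/?(?:ul|li)\\b)[^>]*>", Pattern.CASE_INSENSITIVE);

    // Returns every <ul> block found in html, with all other tags stripped.
    // Returns an empty string if the input contains no <ul>.
    static String keepLists(String html) {
        StringBuilder out = new StringBuilder();
        Matcher m = UL_BLOCK.matcher(html);
        while (m.find()) {
            if (out.length() > 0) {
                out.append('\n');
            }
            out.append(OTHER_TAG.matcher(m.group()).replaceAll(""));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<div><ul>\n"
                + "<li><span style=\"font-size: 13.3333px;\">blahblahblah</span></li>\n"
                + "</ul></div><p></p>";
        System.out.println(keepLists(html));
    }
}
```

The first pattern cuts out each list, and the second removes every remaining tag inside it except <ul> and <li>, which is why the text inside the <span> survives while the <span> itself does not.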