Regex to remove all tags except <ul> and <li> [closed]

Time:09-30

I have this string, and I want to keep only the <ul> and <li> tags and the content inside them. Is there a regex that would do that?

Input:

<div style="text-align: justify;">
<font face="Century Gothic, sans-serif">
<div style="">
<ul>
<li><span style="font-size: 13.3333px;">blahblahblah</span></li>
<li><span style="font-size: 13.3333px;">blahblahblah</span></li>
<li><span style="font-size: 13.3333px;">blahblahblah</span></li>
</ul>
</div>
<div style="font-size: 13.3333px;"><br></div></font>
</div>
<p></p>

Expected output:

<ul>
<li>blahblahblah</li>
<li>blahblahblah</li>
<li>blahblahblah</li>
</ul>

CodePudding user response:

Using Jsoup, something like the following should work:

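import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Parse the markup and take the first <ul>; selectFirst returns null if there is none.
Document doc = Jsoup.parse(yourHtml);
Element e = doc.selectFirst("ul");
// Replace each <li>'s children (e.g. the <span>) with its plain text.
e.children().forEach(li -> li.text(li.text()));

// Printing the element emits its outer HTML: the <ul> and its <li> items.
System.out.println(e);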
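
Since the question asks specifically for a regex, here is a minimal sketch that uses only java.util.regex. The class name ListTagFilter and the method keepListTags are made up for illustration. It assumes reasonably well-formed markup with no '>' inside attribute values and no comments or scripts; for anything messier, a parser is likely the safer choice. Note that it keeps any attributes on <ul> and <li> as they are.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library: strips every tag except <ul> and <li>.
final class ListTagFilter {

    // Matches an opening or closing tag whose name is NOT exactly "ul" or "li".
    // Assumption: no '>' appears inside attribute values.
    private static final Pattern OTHER_TAGS =
            Pattern.compile("</?(?!(?:ul|li)\\b)[a-z][^>]*>", Pattern.CASE_INSENSITIVE);

    // Matches lines that are empty or contain only spaces and tabs.
    private static final Pattern BLANK_LINES =
            Pattern.compile("(?m)^[ \\t]*\\r?\\n");

    static String keepListTags(String html) {
        // First drop the unwanted tags, then remove the blank lines they leave behind.
        String stripped = OTHER_TAGS.matcher(html).replaceAll("");
        return BLANK_LINES.matcher(stripped).replaceAll("").trim();
    }

    public static void main(String[] args) {
        String html = "<div style=\"\">\n<ul>\n"
                + "<li><span style=\"font-size: 13.3333px;\">blahblahblah</span></li>\n"
                + "</ul>\n</div>\n<p></p>";
        System.out.println(keepListTags(html));
    }
}
```

The negative lookahead (?!(?:ul|li)\b) is what protects the two list tags: the \b keeps names such as <link> from being mistaken for <li>.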