Unescape XML tags only. Keep content escaped


I have to consume a web service (WS) that sends its XML data inside a CDATA section. The output I get is the following:

&lt;parent>
    &lt;child1>
        &lt;xmltag1>4 años &lt; 8 &lt;/xmltag1>
        &lt;xmltag2>3 años &lt; 12 &lt;/xmltag2>
    &lt;/child1>
&lt;/parent>

I have to convert this data into usable XML so I can work with it.

It should look like:

<parent>
    <child1>
        <xmltag1>4 años &lt; 8 </xmltag1>
        <xmltag2>3 años &lt; 12 </xmltag2>
    </child1>
</parent>

I have tried various Java functions, such as StringEscapeUtils.unescapeXml(string), but they also unescape the &lt; inside the text content, so the result is still not valid XML.
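To see why a blanket unescape does not help, here is a minimal sketch. It assumes the WS payload escapes each tag opening as &lt; while leaving > literal (the shape the answer's regex expects), and it uses String.replace("&lt;", "<") as a simplified stand-in for a full unescape such as StringEscapeUtils.unescapeXml (the real method also handles the other basic entities). The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class UnescapeAllDemo {

    // Assumed payload: every tag opening escaped as "&lt;", '>' left literal,
    // and the '<' inside the text content also escaped as "&lt;".
    static final String INPUT =
            "&lt;parent>\n"
          + "    &lt;child1>\n"
          + "        &lt;xmltag1>4 a\u00f1os &lt; 8 &lt;/xmltag1>\n"
          + "        &lt;xmltag2>3 a\u00f1os &lt; 12 &lt;/xmltag2>\n"
          + "    &lt;/child1>\n"
          + "&lt;/parent>";

    // Returns true if the JDK's DOM parser accepts the string as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Simplified stand-in for a blanket unescape: EVERY "&lt;" becomes '<',
        // including the one inside the xmltag1 text, which breaks the markup.
        String unescapedEverything = INPUT.replace("&lt;", "<");
        System.out.println(unescapedEverything);
        System.out.println("well-formed: " + isWellFormed(unescapedEverything));
    }
}
```

The parser rejects the result because a bare < followed by a space cannot start a tag, which is exactly why only the &lt; that open tags should be unescaped.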

I guess there could be a way of getting that result with a regex, something like:

string.replaceAll("&lt;{0}>", "</{0}>");

Answer:

You can use

String fixedXml = text.replaceAll("&lt;(/?\\w+(?:\\s[^>]*)?>)", "<$1");
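A self-contained sketch of this approach, assuming the same payload shape (an escaped &lt; before each tag, a literal >). FixEscapedTags, fixXml and parse are hypothetical names; the regex is the one above, and the JDK's built-in DOM parser is used only to check that the repaired string is well-formed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class FixEscapedTags {

    // Assumed WS payload: tags start with "&lt;", '>' is literal,
    // and the '<' in the text content is escaped as "&lt;".
    static final String INPUT =
            "&lt;parent>\n"
          + "    &lt;child1>\n"
          + "        &lt;xmltag1>4 a\u00f1os &lt; 8 &lt;/xmltag1>\n"
          + "        &lt;xmltag2>3 a\u00f1os &lt; 12 &lt;/xmltag2>\n"
          + "    &lt;/child1>\n"
          + "&lt;/parent>";

    // Turn "&lt;" back into '<' only when it is followed by an (optionally closing)
    // tag name and an optional whitespace-led attribute part up to '>'.
    static String fixXml(String text) {
        return text.replaceAll("&lt;(/?\\w+(?:\\s[^>]*)?>)", "<$1");
    }

    // Parses the repaired string with the JDK's built-in DOM parser.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setErrorHandler(new DefaultHandler()); // throw on fatal errors, no stderr output
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String fixed = fixXml(INPUT);
        System.out.println(fixed);

        Document doc = parse(fixed);
        // The parser decodes the remaining "&lt;" in the text back to '<'.
        String text = doc.getElementsByTagName("xmltag1").item(0).getTextContent();
        System.out.println("xmltag1 text: [" + text + "]");
    }
}
```

After the fix, the &lt; inside the text stays escaped in the string, and the parser turns it back into < when you read the element's text value.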

Details:

  • &lt; - a &lt; string
  • (/?\w+(?:\s[^>]*)?>) - Group 1 ($1):
    • /? - an optional / char
    • \w - one or more word chars
    • (?:\s[^>]*)? - an optional sequence of a whitespace char and then any zero or more chars other than >
    • > - a > char.
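The parts of the pattern listed above can be checked against a few sample fragments. This is a sketch with hypothetical names (TagRegexCases, fix, and the item and br tags); it applies the pattern unchanged. Note the last case: because the optional (?:\s[^>]*)? group must start with whitespace, a self-closing tag written without a space, such as &lt;br/>, is not matched and stays escaped.

```java
import java.util.regex.Pattern;

public class TagRegexCases {

    // The answer's pattern, written as a Java string literal.
    static final Pattern ESCAPED_TAG = Pattern.compile("&lt;(/?\\w+(?:\\s[^>]*)?>)");

    // Applies the answer's replacement to one sample string.
    static String fix(String s) {
        return ESCAPED_TAG.matcher(s).replaceAll("<$1");
    }

    public static void main(String[] args) {
        String[] samples = {
            "&lt;xmltag1>",        // opening tag: \w+ followed directly by '>'
            "&lt;/xmltag1>",       // closing tag: /? consumes the slash
            "&lt;item id=\"1\">",  // hypothetical attribute: eaten by (?:\s[^>]*)?
            "4 a\u00f1os &lt; 8",  // "&lt;" then a space: \w+ fails, text left as-is
            "&lt;br/>"             // no whitespace before '/', so it is NOT unescaped
        };
        for (String s : samples) {
            System.out.println(s + "  ->  " + fix(s));
        }
    }
}
```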