Convert the character codes inside CDATA instead of the whole string

Time:01-05

var str = '&amp;<![CDATA[&amp;]]><![CDATA[&amp;]]>&amp;';

In the string above, I want to convert only the &amp; inside the CDATA sections, not every &amp;.

Expected Output: &amp;<![CDATA[&]]><![CDATA[&]]>&amp;

I tried the regular expression below:

str.trim().replace(/^(\/\/\s*)?<!\[CDATA\[|(\/&amp;\/\s*)?\]\]>$/g, '&');

But the code above does not work as expected. I am not good with regular expressions. I went through various answers on Stack Overflow but could not find a good way to fix this. Could you please guide me?

CodePudding user response:

For this particular string you can apply /(?<=CDATA\[)[&a-z;]+(?=]])/g

You can use positive lookbehind and lookahead:

  • (?<=CDATA\[) is a positive lookbehind: the match must be immediately preceded by CDATA[
  • (?=]]) is a positive lookahead: the match must be immediately followed by ]]
  • [&a-z;]+ matches one or more characters that are lowercase letters, & or ;
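
As a minimal sketch, the lookaround pattern could drive the actual replacement as follows. The names insideCdata and result are illustrative, and lookbehind needs an engine with ES2018 support. Note the assumption: the pattern only matches when a CDATA section holds nothing but lowercase letters, & and ;, so a section such as "this &amp; that" (with spaces) would be left untouched.

```javascript
// Sketch only: assumes each CDATA section contains nothing but
// lowercase letters, "&" and ";" (as in the question's string).
const str = '&amp;<![CDATA[&amp;]]><![CDATA[&amp;]]>&amp;';

// Lookbehind/lookahead keep the CDATA delimiters out of the match.
const insideCdata = /(?<=CDATA\[)[&a-z;]+(?=\]\])/g;

// Replace the entity only within the matched CDATA content.
const result = str.replace(insideCdata, (content) => content.replace(/&amp;/g, '&'));

console.log(result); // &amp;<![CDATA[&]]><![CDATA[&]]>&amp;
```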

If I have understood you correctly, it would be better to use an XML parser to manipulate the document.

You can find sample JS code on regex101.com.

CodePudding user response:

If you want to replace any &amp; in CDATA, regardless of what comes before and after (within CDATA):

str.trim().replace(/<!\[CDATA\[.*?\]\]>/g, m => m.replace(/&amp;/g, '&'));

results in

"&amp;<![CDATA[&]]><![CDATA[&]]>&amp;"

This first matches the CDATA sections and replaces each one with the result of a function; the function replaces all &amp; with &.

Because that function is only applied on CDATA sections, &amp;s outside of CDATA will not be changed.
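
A minimal sketch of the same idea as a reusable function, assuming a section may hold several &amp; entities and may span lines. The name unescapeAmpInCdata is hypothetical. The inner replace uses a regex with the g flag because a plain string pattern replaces only the first occurrence, and the outer pattern uses [\s\S] because . does not match line breaks by default.

```javascript
// Hypothetical helper: unescape "&amp;" only inside CDATA sections.
function unescapeAmpInCdata(input) {
  // [\s\S]*? matches lazily across newlines, up to the first "]]>".
  return input.replace(/<!\[CDATA\[[\s\S]*?\]\]>/g, (section) =>
    // Regex with the g flag: every occurrence in the section is replaced.
    section.replace(/&amp;/g, '&')
  );
}

const sample = '&amp;<![CDATA[a &amp; b\n&amp; c]]>&amp;';
console.log(unescapeAmpInCdata(sample));
```

Text outside the CDATA delimiters never reaches the inner replace, so it is returned as-is.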

Example with more characters in CDATA:

var str = '&amp;<![CDATA[Oh look at this: &amp; Haha!]]>&amp;';
str.trim().replace(/<!\[CDATA\[.*?\]\]>/g, m => m.replace(/&amp;/g, '&'));

result:

"&amp;<![CDATA[Oh look at this: & Haha!]]>&amp;

CodePudding user response:

If you have control over the data you receive, it is better to fix the data upstream. If not, you can use nested replaces:

  • the outer replace identifies each <![CDATA[...]]> section
  • the inner replace converts &amp; inside that section

Both use the g flag to replace multiple times.

[
  '&amp;<![CDATA[&amp;]]><![CDATA[&amp;]]>&amp;',
  '&amp;<![CDATA[this &amp; that]]>&amp;'
].forEach(str => {
  let result = str.replace(/<!\[CDATA\[[^\]]*\]\]>/g, m => m.replace(/&amp;/g, '&'));
  console.log(result);
});

Output:

&amp;<![CDATA[&]]><![CDATA[&]]>&amp;
&amp;<![CDATA[this & that]]>&amp;
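
One caveat about the outer pattern above: [^\]]* stops at the first ], so a CDATA section whose content itself contains a ] is never matched. A hedged sketch comparing it with a lazy alternative (the names negated, lazy, fix and input are illustrative only):

```javascript
// Hypothetical comparison of two outer patterns.
const negated = /<!\[CDATA\[[^\]]*\]\]>/g; // content may not contain "]"
const lazy = /<!\[CDATA\[[\s\S]*?\]\]>/g;  // content runs to the first "]]>"

// Apply the nested replace with whichever outer pattern is given.
const fix = (str, pattern) =>
  str.replace(pattern, (m) => m.replace(/&amp;/g, '&'));

const input = '&amp;<![CDATA[a[0] &amp; b]]>&amp;';
console.log(fix(input, negated)); // unchanged: the section is never matched
console.log(fix(input, lazy));    // &amp;<![CDATA[a[0] & b]]>&amp;
```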