How to ignore tag inside RegExp

Time:02-18

In my project, we use a RegExp to extract and display the title of cards that we receive from the deck. Recently I found that the deck sometimes sends a different format, and in that case the titles are not displayed.

Previously, the input was always a string like this:

const res =
  `<p  style="font-size: 27.2px">
    <span>Some text here</span><span> - Title</span>
   </p>`;

and the RegExp was:

/<p [^>]*>[\s]*<span[^>]*>(.+)<\/span><span[^>]*>(.+)<\/span>[\s]*<\/p>/i.exec(res);

Now we sometimes receive res with <div> and <br> tags inside:

const res = 
  `<p  style="font-size: 27.2px">
    <span>Some text here</span><span> - Title</span>
    <div style="font-size: 10px">Title:<br>Some text here</div>
   </p>`;

The question is: how can I change the RegExp so that it ignores this <div>...<br>...</div>?

Here's a demo:

const res =
  `<p  style="font-size: 27.2px">
    <span>Some text here</span><span> - Title</span>
   </p>`;
   
const newRes =
  `<p  style="font-size: 27.2px">
    <span>Some text here</span><span> - Title</span>
    <div style="font-size: 10px">Title:<br>Some text here</div>
   </p>`;
   
const regEx = /<p [^>]*>[\s]*<span[^>]*>(.+)<\/span><span[^>]*>(.+)<\/span>[\s]*<\/p>/i;

const correct = regEx.exec(res);    // matches: only the two spans inside <p>
const broken = regEx.exec(newRes);  // null: the extra <div> sits before </p>

console.log('correct', correct);
console.log('broken', broken);

Would be really grateful for any help!

CodePudding user response:

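Parse the HTML string into the DOM, then extract the text.

// Assumption: the <p> carries class="cardTitle" (the attribute is missing from
// the samples above, but the selector below relies on it).
const res =
  `<p class="cardTitle" style="font-size: 27.2px">
    <span>Some text here</span><span> - Title</span>
    <div style="font-size: 10px">Title:<br>Some text here</div>
   </p>`;

const getNodes = str => {
  // Let the browser parse the markup. A <div> is not allowed inside a <p>,
  // so the parser closes the <p> before the <div> and the div text is left out.
  document.body.insertAdjacentHTML('beforeend', str);
  const titleNode = document.querySelector('.cardTitle');
  return titleNode ? titleNode.innerText.trim() : null;
};

console.log(getNodes(res));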
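
If inserting the markup into the live page is undesirable, the same idea can be sketched with DOMParser, which parses into a detached document. This is only an illustration: titleFromHtml is a hypothetical name, it assumes the title is in the first <p> of the string, and it needs a browser environment (the guard simply skips the demo elsewhere).

```javascript
// Hypothetical helper: parse into a detached document instead of document.body.
function titleFromHtml(html) {
  const doc = new DOMParser().parseFromString(html, 'text/html');
  // The HTML parser closes the <p> when it meets the <div>,
  // so the first <p> holds only the two spans.
  const p = doc.querySelector('p');
  return p ? p.textContent.trim() : null;
}

// DOMParser is a browser API; skip the demo where it is not available.
if (typeof DOMParser !== 'undefined') {
  const sample =
    `<p  style="font-size: 27.2px">
      <span>Some text here</span><span> - Title</span>
      <div style="font-size: 10px">Title:<br>Some text here</div>
     </p>`;
  console.log(titleFromHtml(sample));
}
```

Nothing is added to the page this way, so repeated calls do not pile up nodes in document.body.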

CodePudding user response:

Simplify the regex

/<p [^>]*>\s*<span[^>]*>(.*?)<\/span><span[^>]*>(.*?)<\/span>.*?<\/p>/si

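This matches the <p> tag with its two <span> elements and lazily skips whatever else it contains up to the closing </p>. The s flag lets . match newlines, so the skipped part may span several lines.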

const res =
  `<p  style="font-size: 27.2px">
    <span>Some text here</span><span> - Title</span>
   </p>`;
   
const newRes =
  `<p  style="font-size: 27.2px">
    <span>Some text here</span><span> - Title</span>
    <div style="font-size: 10px">Title:<br>Some text here</div>
   </p>`;
   
const regEx = /<p [^>]*>\s*<span[^>]*>(.*?)<\/span><span[^>]*>(.*?)<\/span>.*?<\/p>/si;
   
const correct = regEx.exec(res);
const fixed = regEx.exec(newRes); // now matches as well

console.log('correct', correct);
console.log('fixed', fixed);
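
For use in the card code, the pattern above could be wrapped in a small helper that returns the two captured parts and copes with a failed match. This is a minimal sketch: extractTitle, TITLE_RE and the returned field names are hypothetical and not part of the original project.

```javascript
// Hypothetical helper (not from the original project) around the regex above.
const TITLE_RE = /<p [^>]*>\s*<span[^>]*>(.*?)<\/span><span[^>]*>(.*?)<\/span>.*?<\/p>/si;

function extractTitle(html) {
  const match = TITLE_RE.exec(html);
  if (!match) return null;          // unexpected format: let the caller decide
  const [, first, second] = match;  // contents of the two <span> elements
  return { first, second, text: first + second };
}

const plain =
  `<p  style="font-size: 27.2px">
    <span>Some text here</span><span> - Title</span>
   </p>`;

const withDiv =
  `<p  style="font-size: 27.2px">
    <span>Some text here</span><span> - Title</span>
    <div style="font-size: 10px">Title:<br>Some text here</div>
   </p>`;

console.log(extractTitle(plain));         // both spans captured
console.log(extractTitle(withDiv).text);  // the <div> is skipped
console.log(extractTitle('<ul></ul>'));   // null for unrelated markup
```

Returning null instead of throwing keeps the decision about a fallback title with the caller.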
