Regex Filtering through html to find a word but not inside of a table header

Time:12-14

I'm trying to alter a regex that currently searches for a word and returns it when found. Unfortunately, the same word may also appear in a table header (<th> tag), and in that case I don't want it to match.

This is a JavaScript regex (lastCoveoSearch is the variable holding the word being searched for).

The original regex, which finds the word even in the table header:

new RegExp('>[^<]*(' + lastCoveoSearch + ')', "ig")

The one I'm testing, which is meant to ignore table header tags and the attributes within them. It seems to work in a regex tester but not in my code:

new RegExp('<(?!th)\b[^>]*>[^<]*(' + lastCoveoSearch + ')', "ig")

Where am I going wrong? Below is an HTML snippet from a page I'm testing the function on.

<div >
    <div >
<h1 >Wet vs. Dry Funding States</h1>    </div>
</div>
<div >
    <div >
<div ><div >
<div >
<h2></h2>
</div>
</div>
<table  style="text-align: center;">
    <thead>
        <tr >
            <th colspan="2">This is the table header that I do not want to be matched<br />
            </th>
        </tr>
    </thead>
    <tbody>
        <tr >
            <td><span style="text-align: center;">Alabama</span></td>
            <td>&nbsp;<span style="text-align: center;">Nebraska</span></td>
        </tr>
        <tr >
            <td>&nbsp;<span style="text-align: center;">Arkansas</span></td>
            <td><span style="text-align: center;">New Hampshire</span>&nbsp;</td>
        </tr>
        <tr >
            <td>&nbsp;<span style="text-align: center;">Colorado</span></td>
            <td><span style="text-align: center;">New Jersey</span>&nbsp;</td>
        </tr>

CodePudding user response:

I'm not sure, but this seems to work. It is very similar to yours; the only change is that the backslash has to be escaped (\\b instead of \b) when the pattern is passed as a string to new RegExp('', '').

var html = `
  <th>hi 13 hello</th>
  <td>good hello 12</td>
`;


var word = "hello";
var reg = new RegExp('<(?!th)\\b.*>[^<]*(' + word + ')', "ig");
console.log(reg)

var matches = reg.exec(html)
console.log(matches)
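
To illustrate the escaping point above: inside an ordinary string literal, \b is the backspace character, so a pattern built that way never contains a word boundary. The following is a minimal sketch, not the asker's actual code. escapeRegExp and buildSearchRegex are hypothetical helper names, and escaping the search word is an added assumption that the word should be matched literally.

```javascript
// In a string literal, "\b" is a single backspace character (U+0008),
// while "\\b" is the two characters backslash + b that RegExp needs.
console.log('\b'.length);   // 1
console.log('\\b'.length);  // 2

// Hypothetical helper: escape regex metacharacters in the search word
// so that input such as "a.b" is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: same pattern as in the question, with the
// backslash doubled so the word boundary survives string parsing.
function buildSearchRegex(word) {
  return new RegExp('<(?!th)\\b[^>]*>[^<]*(' + escapeRegExp(word) + ')', 'ig');
}

const html = '<th>hi 13 hello</th>\n<td>good hello 12</td>';
const match = buildSearchRegex('hello').exec(html);
console.log(match[0]); // "<td>good hello"
```

Note that this pattern only inspects the tag immediately before the text, so a word wrapped in another element inside the header (for example <th><span>hello</span></th>) would still match via the <span>.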

CodePudding user response:

There is no need for a regular expression here. Select only the <td> elements and check whether their innerText contains the query string; <th> cells are never selected, so the header cannot match.

const results = document.querySelector('#results');

const doesMatch = (text, query) =>
  query.length > 0 && text.toLowerCase().includes(query);

const highlight = (td, query) => {
  td.classList.toggle('highlight', doesMatch(td.innerText, query));
};

const handleSearch = (e) => {
  const query = e.target.value.trim().toLowerCase();
  results.querySelectorAll('td').forEach(td => highlight(td, query));
};

document.querySelector('#query').addEventListener('input', handleSearch);

.highlight { background: yellow; }

<label>Search</label>
<input id="query" type="search" placeholder="Search for a word..." autocomplete="off"/>
<hr />
<table id="results">
  <thead>
    <tr>
      <th colspan="2">List of States</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Alabama</td>
      <td>Montana</td>
    </tr>
    <tr>
      <td>Alaska</td>
      <td>Nebraska</td>
    </tr>
    <tr>
      <td>Arizona</td>
      <td>Nevada</td>
    </tr>
    <tr>
      <td>Arkansas</td>
      <td>New Hampshire</td>
    </tr>
    <tr>
      <td>California</td>
      <td>New Jersey</td>
    </tr>
    <tr>
      <td>Colorado</td>
      <td>New Mexico</td>
    </tr>
    <tr>
      <td>Connecticut</td>
      <td>New York</td>
    </tr>
    <tr>
      <td>Delaware</td>
      <td>North Carolina</td>
    </tr>
    <tr>
      <td>Florida</td>
      <td>North Dakota</td>
    </tr>
    <tr>
      <td>Georgia</td>
      <td>Ohio</td>
    </tr>
    <tr>
      <td>Hawaii</td>
      <td>Oklahoma</td>
    </tr>
    <tr>
      <td>Idaho</td>
      <td>Oregon</td>
    </tr>
    <tr>
      <td>Illinois</td>
      <td>Pennsylvania</td>
    </tr>
    <tr>
      <td>Indiana</td>
      <td>Rhode Island</td>
    </tr>
    <tr>
      <td>Iowa</td>
      <td>South Carolina</td>
    </tr>
    <tr>
      <td>Kansas</td>
      <td>South Dakota</td>
    </tr>
    <tr>
      <td>Kentucky</td>
      <td>Tennessee</td>
    </tr>
    <tr>
      <td>Louisiana</td>
      <td>Texas</td>
    </tr>
    <tr>
      <td>Maine</td>
      <td>Utah</td>
    </tr>
    <tr>
      <td>Maryland</td>
      <td>Vermont</td>
    </tr>
    <tr>
      <td>Massachusetts</td>
      <td>Virginia</td>
    </tr>
    <tr>
      <td>Michigan</td>
      <td>Washington</td>
    </tr>
    <tr>
      <td>Minnesota</td>
      <td>West Virginia</td>
    </tr>
    <tr>
      <td>Mississippi</td>
      <td>Wisconsin</td>
    </tr>
    <tr>
      <td>Missouri</td>
      <td>Wyoming</td>
    </tr>
  </tbody>
</table>
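
The same idea can also be wrapped in a function that returns the matching cells instead of toggling a class, which may be closer to what the question asks for. This is only a sketch: findMatchingCells is a hypothetical name, root is assumed to be any element exposing querySelectorAll, and textContent is used in place of innerText so that the function also works on elements that are not rendered.

```javascript
// Hypothetical helper: returns the elements under `root` that match
// `selector` and whose text contains `query`, ignoring case.
// <th> cells are skipped simply because the default selector is 'td'.
function findMatchingCells(root, query, selector = 'td') {
  const needle = query.trim().toLowerCase();
  if (needle.length === 0) return [];
  return Array.from(root.querySelectorAll(selector)).filter((el) =>
    el.textContent.toLowerCase().includes(needle)
  );
}

// In a browser, with the table from the snippet above:
//   findMatchingCells(document.querySelector('#results'), 'new')
//     .forEach((td) => td.classList.add('highlight'));
```

Passing a different selector (for example 'td, p, li') would widen the search to other elements while still leaving <th> out.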
