Javascript regex find all exact matches

Time:11-11

I know this question has been answered many times, but I cannot find what I am looking for. I have a basic HTML page that analyzes entered text. I need to count how often each word is repeated, and I have the following code:

    var text = document.forms["myForm"]["myText"].value; // entered text

    var strArray2 = text.split(" ");

    for (var i = 0; i < strArray2.length; i++) {
        var regex = new RegExp(strArray2[i], 'g');
        var rptArr = text.match(regex);
        var node = document.createElement('p');
        node.innerHTML = "<label>The word \"" + strArray2[i] + "\" is repeated " + rptArr.length + " times.</label>";
        document.getElementById('div0').appendChild(node);
    }

Text:

Lorem Ipsum abc Ipsum, bcd

Output:

The word "Lorem" is repeated 1 times

The word "Ipsum" is repeated 2 times // that should be 1 time

But this code also matches the word when it is followed by a comma (i.e. both 'Ipsum' and 'Ipsum,'). I need the exact 'Ipsum', without the comma. I also need to use the match() function so that I get an array.

Any help is appreciated.

CodePudding user response:

strArray2 in the OP's example code needs to be sanitized before the other operations can proceed. Retrieving the "word values" by splitting a string at its whitespace has limitations, such as the one the OP ran into.

An improvement is to split at word boundaries (/\b/). Of course, one then needs to filter the resulting array for word-only items (/^\w+$/). The approach below covers the OP's needs with the least possible effort, but it has its limitations too, especially if one does not want to be restricted to basic Latin and/or English, because \w only matches [A-Za-z0-9_].

console.log(
  "'Lorem Ipsum abc Ipsum, bcd'.split(/\\b/) ...",
  'Lorem Ipsum abc Ipsum, bcd'
    .split(/\b/)
);
console.log(
  "'Lorem Ipsum abc Ipsum, bcd'.split(/\\b/).filter(item => (/^\\w+$/).test(item)) ...",
  'Lorem Ipsum abc Ipsum, bcd'
    .split(/\b/)
    .filter(item => (/^\w+$/).test(item))
);
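
The basic-Latin limitation of \w could be sidestepped with Unicode property escapes. What follows is only a minimal sketch and not part of the original answer: countWords is a hypothetical helper name, and it assumes a runtime that supports \p{...} with the u flag (ES2018). Like the split-and-filter approach, it treats "Ipsum," as an occurrence of "Ipsum".

```javascript
// Hypothetical helper (an assumption, not from the original answer):
// tokenizes with match() and tallies how often each word occurs.
function countWords(text) {
  // \p{L} matches any letter, \p{N} any number; the `u` flag enables \p{...}.
  // match() returns null when nothing matches, hence the `|| []` fallback.
  const words = text.match(/[\p{L}\p{N}_]+/gu) || [];
  const counts = new Map();
  for (const word of words) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

const counts = countWords('Lorem Ipsum abc Ipsum, bcd');
console.log(counts.get('Ipsum')); // 2
console.log(counts.get('Lorem')); // 1
```

Tallying in a Map avoids building one regular expression per word, so words that contain regex metacharacters cannot break the count.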

CodePudding user response:

The question lacks some details needed to answer it properly. However, if the comma always appears at the end of the word, as in your example, you could strip it by matching with [^,]+. Here is an example using [^,]+ based on your code:

for (var i = 0; i < strArray2.length; i++) {

    // keep only the part of the token that contains no commas
    var item = strArray2[i].match(/[^,]+/g);

    var regex = new RegExp(item[0], 'g');

    var rptArr = text.match(regex);

    var node = document.createElement('p');

    node.innerHTML = "<label>The word \"" + item[0] + "\" is repeated " + rptArr.length + " times.</label>";

    document.getElementById('div0').appendChild(node);
}
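
If, as the question literally states, 'Ipsum' and 'Ipsum,' should count as different tokens, one option is to keep using match() but require whitespace (or the start/end of the text) on both sides of the token. This is a hedged sketch under that assumption, not the answerer's method: escapeRegExp and matchExact are hypothetical helper names, and the lookbehind (?<!\S) assumes an ES2018-capable runtime.

```javascript
// Hypothetical helper: escapes regex metacharacters so a token such as
// "a.b" or "Ipsum," is searched for literally.
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: returns an array of exact, whitespace-delimited matches.
function matchExact(text, token) {
  // (?<!\S) = not preceded by a non-space character,
  // (?!\S)  = not followed by a non-space character.
  const regex = new RegExp('(?<!\\S)' + escapeRegExp(token) + '(?!\\S)', 'g');
  return text.match(regex) || []; // match() yields null when nothing matches
}

const text = 'Lorem Ipsum abc Ipsum, bcd';
console.log(matchExact(text, 'Ipsum').length);  // 1
console.log(matchExact(text, 'Ipsum,').length); // 1
```

If punctuation should instead be ignored, so that "Ipsum," also counts as "Ipsum", replacing both lookarounds with \b would be the usual alternative for basic-Latin words.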