trying to get the most used word using regex

Time:07-18

I am trying to get the 10 most frequent words in the paragraph below, and I need to use a regular expression.
let paragraph = `I love teaching. If you do not love teaching what else can you love. I love Python if you do not love something which can give you all the capabilities to develop an application what else can you love.`;

I want output like this:

    [
      {word:'love', count:6},
      {word:'you', count:5},
      {word:'can', count:3},
      {word:'what', count:2},
      {word:'teaching', count:2},
      {word:'not', count:2},
      {word:'else', count:2},
      {word:'do', count:2},
      {word:'I', count:2},
      {word:'which', count:1},
      {word:'to', count:1},
      {word:'the', count:1},
      {word:'something', count:1},
      {word:'if', count:1},
      {word:'give', count:1},
      {word:'develop', count:1},
      {word:'capabilities', count:1},
      {word:'application', count:1},
      {word:'an', count:1},
      {word:'all', count:1},
      {word:'Python', count:1},
      {word:'If', count:1}
    ]
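Since the question asks for a regex, here is a minimal sketch (not taken from any of the answers) that extracts words with String.prototype.matchAll and the pattern /\w+/g, counts them in a Map, and sorts by count. The helper name topWords is hypothetical. Like the expected output above, it is case-sensitive, so "If" and "if" stay separate. It also assumes that \w (ASCII letters, digits and _) is a good enough definition of a word, which holds for this paragraph.

```javascript
// Hypothetical helper: return the n most frequent words as {word, count} objects
function topWords(text, n) {
  const counts = new Map();
  // /\w+/g matches each maximal run of word characters, so "teaching." yields "teaching"
  for (const [word] of text.matchAll(/\w+/g)) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return [...counts]
    .map(([word, count]) => ({ word, count }))
    // Array.prototype.sort is stable (ES2019+), so ties keep first-appearance order
    .sort((a, b) => b.count - a.count)
    .slice(0, n);
}

const paragraph = `I love teaching. If you do not love teaching what else can you love. I love Python if you do not love something which can give you all the capabilities to develop an application what else can you love.`;

console.log(topWords(paragraph, 10));
```

Ties are not ordered the same way as in the expected output. Add a secondary comparison such as a.word.localeCompare(b.word) if a specific tie order matters.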


    

Answer:

This solution doesn't use a regex to match the words (only to split the text), but it may still be worth a look:

const paragraph = `I love teaching. If you do not love teaching what else can you love. I love Python if you do not love something which can give you all the capabilities to develop an application what else can you love.`;

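let res = Object.entries(
  paragraph.toLowerCase()
           .split(/[ .,;-]+/)   // split on runs of spaces and punctuation
           .filter(Boolean)     // drop the empty string left by the trailing "."
           .reduce((a, c) => (a[c] = (a[c] || 0) + 1, a), {})  // count each word
).map(([k, v]) => ({word: k, count: v})).sort((a, b) => b.count - a.count);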

console.log(res.slice(0,10)) // only get the 10 most frequent words
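The split-based approach has one subtlety. When the text ends with a separator (here the final "."), split leaves an empty string at the end of the array, and reduce would otherwise count it as a word. Here is a minimal demonstration on a short, made-up string:

```javascript
// Splitting on a run of separators: the trailing "." produces a final empty string
const parts = "I love Python.".split(/[ .,;-]+/);
console.log(parts); // [ 'I', 'love', 'Python', '' ]

// Filtering out empty strings leaves only the words
const words = parts.filter(Boolean);
console.log(words); // [ 'I', 'love', 'Python' ]
```

Also note that toLowerCase() merges "If" and "if" into a single entry. As a result, this answer's counts differ slightly from the expected output in the question.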

Answer:

I have something a bit messy, but it uses regex and displays the top 10 most frequently occurring words, which is what you asked for. Test it and let me know if it works for you.

let paragraph = "I love teaching. If you do not love teaching what else can you love. I love Python if you do not love something which can give you all the capabilities to develop an application what else can you love.";

//remove periods, because otherwise "teaching" and "teaching." would be counted as different words
paragraph = paragraph.split(".").join("");

//results array where results will be stored
var results = []

//separate each string from the paragraph
paragraph.split(" ").forEach((word) => {
  // \b word boundaries make sure "I" is not also counted inside "If"
  const wordCount = paragraph.match(new RegExp("\\b" + word + "\\b", "g")).length;
  //concatenate the word with its number of occurrences, e.g. "I : 2" means "I" has appeared 2 times
  const res = word + " : " + wordCount;
  //check if the word has already been added to results
  if (!results.includes(res)) {
    //if not, push it
    results.push(res);
  }
});
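There is a caveat with new RegExp(word, "g"): the pattern matches the word anywhere, including inside longer words, so "I" is also found in "If" and "an" inside "can". Wrapping the word in \b word boundaries restricts the count to whole words. Below is a minimal sketch with hypothetical helper names. The escapeRegExp helper is only needed if a word could contain regex metacharacters, which is not the case for this paragraph.

```javascript
// Escape characters that have a special meaning in a RegExp pattern
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Naive count: matches `word` anywhere, including inside longer words
function countSubstring(text, word) {
  return (text.match(new RegExp(escapeRegExp(word), "g")) || []).length;
}

// Whole-word count: \b anchors the match at word boundaries
function countWholeWord(text, word) {
  return (text.match(new RegExp("\\b" + escapeRegExp(word) + "\\b", "g")) || []).length;
}

const paragraph = `I love teaching. If you do not love teaching what else can you love. I love Python if you do not love something which can give you all the capabilities to develop an application what else can you love.`;

console.log(countSubstring(paragraph, "I"));  // 3 ("I" twice, plus the "I" in "If")
console.log(countWholeWord(paragraph, "I"));  // 2
console.log(countSubstring(paragraph, "an")); // 4 ("an" once, plus three times inside "can")
console.log(countWholeWord(paragraph, "an")); // 1
```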

function sortResultsByOccurences(resArray) {
  //sort the results from the highest to the lowest number of occurrences
  resArray.sort(function (a, b) {
    // /\D/g matches anything that's not a digit; removing it leaves only the count, so we sort by occurrences instead of by letters
    //the radix 10 tells parseInt to read the count as a decimal number
    return parseInt(b.replace(/\D/g, ""), 10) - parseInt(a.replace(/\D/g, ""), 10);
  });
  return resArray;
}
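Stripping every non-digit with /\D/g works here because the paragraph's words contain only letters. If a word contained a digit, its digits would be glued onto the count. The entry "mp3 : 2" below is made up for illustration. The sketch avoids the problem by reading only the text after the last " : " separator. The countOf helper is hypothetical.

```javascript
// Hypothetical helper: parse the count that follows the last " : " separator
function countOf(entry) {
  return parseInt(entry.slice(entry.lastIndexOf(" : ") + 3), 10);
}

const entries = ["mp3 : 2", "love : 6", "you : 5"];

// Stripping all non-digits turns "mp3 : 2" into "32"
const stripped = entries.map((e) => parseInt(e.replace(/\D/g, ""), 10));
console.log(stripped); // [ 32, 6, 5 ]

// Sorting by the parsed count keeps "mp3 : 2" last
const sorted = [...entries].sort((a, b) => countOf(b) - countOf(a));
console.log(sorted); // [ 'love : 6', 'you : 5', 'mp3 : 2' ]
```

Storing {word, count} objects instead of strings, as the other answers do, removes the need for any parsing.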
//reassign results as sorted
results = sortResultsByOccurences(results);
for (let i = 0; i < 10; i++) { //for loop is used to display the top 10
  console.log(results[i])
}

Answer:

To get all the words in a sentence, use a regular expression such as /(\w+)(?=\s)/g.

If you apply this to your input string, you get every word except those followed by a full stop (.); for example, it doesn't match "love" in "love.". The complete expression used in the solution below is paragraph.match(/(\w+)(?=(\s|\.|\,|\;|\?))/gi).
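To see the difference, compare the number of matches of the whitespace-only lookahead with the version that also accepts a full stop. The paragraph has 39 words, and three of them ("teaching." and the two "love.") end a sentence.

```javascript
const paragraph = `I love teaching. If you do not love teaching what else can you love. I love Python if you do not love something which can give you all the capabilities to develop an application what else can you love.`;

// Lookahead for whitespace only: words directly before "." are skipped
const spaceOnly = paragraph.match(/(\w+)(?=\s)/g);

// Lookahead for whitespace or a full stop: every word is matched
const spaceOrStop = paragraph.match(/(\w+)(?=(\s|\.))/g);

console.log(spaceOnly.length, spaceOrStop.length); // 36 39
```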

So in this case we have to modify the regex to /(\w+)(?=(\s|\.))/g. Similarly, add to the lookahead any other punctuation characters (, ; ? ...) that can follow a word.
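The alternation inside the lookahead can also be written more compactly as a character class. Inside [...] the punctuation needs no escaping, and the capturing group is not needed when match is called with the g flag. The sketch below assumes the same set of terminators as the answer (whitespace, ".", ",", ";", "?").

```javascript
const paragraph = `I love teaching. If you do not love teaching what else can you love. I love Python if you do not love something which can give you all the capabilities to develop an application what else can you love.`;

// Alternation form, as in the answer
const withAlternation = paragraph.match(/(\w+)(?=(\s|\.|\,|\;|\?))/g);

// Equivalent character-class form
const withClass = paragraph.match(/\w+(?=[\s.,;?])/g);

console.log(withClass.length); // 39
console.log(withAlternation.join() === withClass.join()); // true
```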

Here is the solution (add other punctuation characters to the lookahead if required):

let paragraph = `I love teaching. If you do not love teaching what else can you love. I love Python if you do not love something which can give you all the capabilities to develop an application what else can you love.`;
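let objArr = [];
// collect each distinct word that is followed by whitespace or punctuation
[...new Set(paragraph.match(/(\w+)(?=(\s|\.|\,|\;|\?))/gi))].forEach(ele => {
    objArr.push({
        'word': ele,
        // \b stops "an" from also matching inside "can"; without the 'i' flag, "If" and "if" are counted separately
        'count': paragraph.match(new RegExp('\\b' + ele + '(?=(\\s|\\.|\\,|\\;|\\?))', 'g'))?.length
    })
});
objArr.sort((x,y) => y.count - x.count);
console.log(objArr.slice(0, 10)); // the 10 most frequent words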
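A note on the i flag in the counting regex: with 'gi', the Set still holds "If" and "if" as two separate words, but each one is counted case-insensitively. Both entries would therefore be reported with the combined count of 2, while the expected output in the question lists each of them once. Here is a small check on the question's paragraph:

```javascript
const paragraph = `I love teaching. If you do not love teaching what else can you love. I love Python if you do not love something which can give you all the capabilities to develop an application what else can you love.`;

// Case-insensitive: "If" and "if" are counted together
const insensitive = paragraph.match(/\bif\b/gi).length;

// Case-sensitive: each spelling is counted on its own
const upper = paragraph.match(/\bIf\b/g).length;
const lower = paragraph.match(/\bif\b/g).length;

console.log(insensitive, upper, lower); // 2 1 1
```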
