JS - How to check if a string contains an exact phrase value in array

Time:07-12

I am working on a script using Twitter's API and I am trying to find matches to exact phrases.

The API, however, doesn't allow queries for exact phrases, so I've been trying to find a workaround. The problem is that I am getting results that contain words from the phrases, but not the exact phrase itself.

var search_terms = "buy now, look at this meme, how's the weather?"; 

let termSplit = search_terms.toLowerCase();    
let termArray = termSplit.split(', ');
//["buy now", "look at this meme", "how's the weather?"];
    
client.stream('statuses/filter', { track: search_terms }, function (stream) {
  console.log("Searching for tweets...");
  stream.on('data', function (tweet) {
    if (termArray.some(v => tweet.text.toLowerCase().includes(v.toLowerCase()))) {
      //if(tweet.text.indexOf(termArray) > 0 )
      console.log(tweet);
    }
  });
});

Expected results should be a tweet with any text as long as it contains the exact phrase somewhere.

The results I am getting are tweets that contain words from an array value, but not an exact match of the whole phrase.

Example results being returned - "I don't know why now my question has a close request but I don't buy it."

Example results I am expecting - "If you like it then buy now."

What am I doing wrong?

CodePudding user response:

You could try using regular expressions. Here's an example of a regular expression search for a phrase. The search returns the index of the character where the match starts (zero or greater) if there is a match, and -1 otherwise. I return the whole phrase if there is a match.

You can use quite sophisticated patterns for matching particular phrases of interest; I'm just using simple words in this example.
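As a rough sketch of the regular expression approach (not necessarily the code this answer originally showed), the snippet below builds one case-insensitive pattern from the question's phrases. The helper names `escapeRegExp`, `buildPhrasePattern`, and `findPhrase` are made up for illustration, and the lookarounds assume a runtime with lookbehind support, such as a recent Node.js.

```javascript
// Escape characters that have special meaning in a regular expression,
// so a phrase like "how's the weather?" is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build one case-insensitive pattern that matches any of the phrases.
// The lookarounds reject matches that are embedded in a longer word.
function buildPhrasePattern(phrases) {
  const alternatives = phrases.map(escapeRegExp).join('|');
  return new RegExp(`(?<!\\w)(?:${alternatives})(?!\\w)`, 'i');
}

// Return the matched phrase and its start index, or null if nothing matches.
function findPhrase(text, pattern) {
  const match = pattern.exec(text);
  return match ? { phrase: match[0].toLowerCase(), index: match.index } : null;
}

const pattern = buildPhrasePattern([
  'buy now',
  'look at this meme',
  "how's the weather?",
]);

// Contains the exact phrase "buy now", so a match object is logged
console.log(findPhrase('If you like it then buy now.', pattern));
// Contains "now" and "buy" separately, so null is logged
console.log(findPhrase("I don't know why now, but I don't buy it.", pattern));
```

Escaping matters here because the `?` in "how's the weather?" would otherwise be read as a quantifier rather than a literal question mark.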


CodePudding user response:

First, toward the future:

Twitter is planning to deprecate the statuses/filter v1.1 endpoint:

These features will be retired in six months on October 29, 2022.

Additionally, beginning today, new client applications will not be able to gain access to v1.1 statuses/sample and v1.1 statuses/filter. Developers with client apps already using these endpoints will maintain access until the functionality is retired. We are not retiring v1.1 statuses/filter in 6-months, only the ability to retrieve compliance messages. We will retire the full endpoint eventually.

So, now is a great time to start using the equivalent v2 API, Filtered Stream, which supports exact phrase matching, helping you avoid this entire scenario in your application code.
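To give a feel for what that could look like, here is a hedged sketch that only builds the request body for adding Filtered Stream rules. It assumes, from my reading of the v2 documentation, that rules are added by POSTing a body shaped like `{ add: [{ value, tag }] }` to the `/2/tweets/search/stream/rules` endpoint and that wrapping a value in double quotes requests an exact phrase match. Please verify both against the current docs. `buildExactPhraseRules` is a hypothetical helper name, and no network request is made.

```javascript
// Assumption: a double-quoted rule value asks Filtered Stream for an exact
// phrase match, and rules are submitted as { add: [{ value, tag }] }.
// Phrases that themselves contain double quotes are not handled here.
function buildExactPhraseRules(trackText) {
  const phrases = trackText
    .split(',')
    .map(str => str.trim())
    .filter(Boolean);

  return {
    add: phrases.map(phrase => ({
      value: `"${phrase}"`, // quoted operand: match the phrase as a whole
      tag: phrase,          // label for telling rules apart later
    })),
  };
}

const body = buildExactPhraseRules(
  "buy now, look at this meme, how's the weather?"
);

// This object would be sent as the JSON body of the rules request
console.log(JSON.stringify(body, null, 2));
```

With rules like these, the phrase filtering happens on Twitter's side, so the application code no longer needs to re-check each tweet.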


With that out of the way, below I've included a minimal, reproducible example for you to consider which demonstrates how to match exact phrases in streamed tweets, and even extract additional useful information (like which phrase was used to match it and at what index within the tweet text). It includes inline comments explaining things line-by-line:

<script type="module">

// Transform to lowercase, split on commas, and trim whitespace
// on the ends of each phrase, removing empty phrases
function getPhrasesFromTrackText (trackText) {
  return trackText.toLowerCase().split(',')
    .map(str => str.trim())
    .filter(Boolean);
}

const trackText = `buy now, look at this meme, how's the weather?`;
const phrases = getPhrasesFromTrackText(trackText);

// The callback closure which will be invoked with each matching tweet
// from the streaming response data
const handleTweet = (tweet) => {
  // Transform the tweet text once
  const lowerCaseText = tweet.text.toLowerCase();

  // Create a variable to store the first matching phrase that is found
  let firstMatchingPhrase;
  for (const phrase of phrases) {
    // Find the index of the phrase in the tweet text
    const index = lowerCaseText.indexOf(phrase);
    // If the phrase isn't found, immediately continue
    // to the next loop iteration, skipping the rest of the code block
    if (index === -1) continue;
    // Else, set the match variable
    firstMatchingPhrase = {
      index,
      text: phrase,
    };
    // And stop iterating the other phrases by breaking out of the loop
    break;
  }

  if (firstMatchingPhrase) {
    // There was a match; do something with the tweet and/or phrase
    console.log({
      firstMatchingPhrase,
      tweet,
    });
  }
};

// The Stack Overflow code snippet runs in a browser and doesn't have access to
// the Node.js Twitter "client" in your question,
// but you'd use the function like this:

// client.stream('statuses/filter', {track: trackText}, function (stream) {
//   console.log('Searching for tweets...');
//   stream.on('data', handleTweet);
// });

// Instead, the function can be demonstrated by simulating the stream: iterating
// over sample tweets. The tweets with a ✅ are the ones which
// will be matched in the function and be logged to the console:

const sampleTweets = [
  /* ❌ */ {text: `Now available: Buy this product!`},
  /* ✅ */ {text: `This product is available. Buy now!`},
  /* ✅ */ {text: `look at this meme            