How to check if a noun exists in a text file with C#?

Time:05-18

I have tried to find a C# library that does part-of-speech tagging so that I can recognize a noun in a sentence, but to no avail. So I decided to check the words in my string against a list of nouns in a text file I downloaded. My code assumes the first noun in a sentence is the one the verb refers to; I assumed this because most comments left on a website do not consist of many words. Here is my attempt: I split the text into an array, then loop through each noun in the Nouns.txt file to see whether it appears in my string. My code is below. I just want the method to return the first noun found, or "No noun detected".

string DetectNoun(string param){
  //split the input parameter into words based on spaces
  string[] words=param.ToLower().Split(" ");
  //read all the nouns in the text file into an array:NB all nouns are in lower case
  string[] allNouns=File.ReadAllLines("Nouns.txt");
  //loop through each noun in the array and check if any exists in our input parameter
  int j=0;
  for(int i=0;i>allNouns.Length;i++){
   if(allNouns[i]==words[j++]){
      //return this word as the noun found
       return allNouns[i];
     }
  }
  //if no match was found return no noun detected
  return "No noun detected";
}

I tested the above with the sample input "Samsung Television, No manual, Box included". It keeps returning "No noun detected" even though "television" is in the noun file I just downloaded.

CodePudding user response:

Your original code has a few problems:

  1. You are splitting by space only, so punctuation stays attached to the words: your word list contains "television," (with the trailing comma). You then compare "television," with "television", so it doesn't match.

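  2. You are using == for the comparison. On strings, == is an ordinal, case-sensitive comparison, so it only works here because you lower-cased the input first. You should rather use an explicit string comparer that states the intent.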

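  3. Your for loop is broken. As written, the condition i > allNouns.Length is false on the first check, so the body never runs. Even with i < allNouns.Length, you would only compare index to index (first word with first noun, second word with second noun, etc.), and if your word list is shorter than your noun list, words[j++] will raise an IndexOutOfRangeException.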
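To make the first problem concrete, here is a minimal sketch of what splitting on spaces alone produces for the sample input from the question. The class and method names (SplitDemo, NaiveSplit) are made up for illustration and are not part of the original code.

```csharp
using System;

static class SplitDemo
{
    // Hypothetical helper: lower-cases the input and splits on single
    // spaces only, which mirrors what the question's code does.
    public static string[] NaiveSplit(string input)
    {
        return input.ToLower().Split(' ');
    }
}
```

Calling SplitDemo.NaiveSplit("Samsung Television, No manual, Box included") gives six tokens, and the second one is "television," with the comma still attached, which is why it never equals the entry "television" from the noun file.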

To solve all of these:

  1. Remove all characters that you don't want from your string before splitting. I'd recommend a regex (Regex.Replace(input, @"[^a-zA-Z\d ]", "")), but you'll need to check whether that suits your input (especially if you receive letters outside a-z, such as accented characters or umlauts).

  2. Use StringComparer.OrdinalIgnoreCase or StringComparison.OrdinalIgnoreCase instead of calling ToLower(). ToLower() is usually not the right way to do it, especially when dealing with cultures other than English.

  3. I'd use Linq and make a one-liner out of it:

words.FirstOrDefault(x => allNouns.Contains(x, StringComparer.OrdinalIgnoreCase));

That returns the first word in words that is contained in allNouns, or null if there is no match.
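On the caveat in step 1 about accents and umlauts: one possible alternative, sketched here as an assumption rather than as part of the original answer, is to use Unicode categories in the pattern so that any letter (\p{L}), combining mark (\p{M}) or decimal digit (\p{Nd}) is kept. The class and method names (TextCleaner, AsciiOnly, UnicodeAware) are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class TextCleaner
{
    // The pattern from the answer: keeps only a-z, A-Z, digits and spaces,
    // so accented letters are removed along with the punctuation.
    public static string AsciiOnly(string input)
    {
        return Regex.Replace(input, @"[^a-zA-Z\d ]", "");
    }

    // Assumed alternative: keeps any Unicode letter, combining mark or
    // decimal digit, plus spaces, and removes everything else.
    public static string UnicodeAware(string input)
    {
        return Regex.Replace(input, @"[^\p{L}\p{M}\p{Nd} ]", "");
    }
}
```

For the input "Café, günstig!", AsciiOnly returns "Caf gnstig" while UnicodeAware returns "Café günstig". Whether the wider pattern is appropriate depends on what your noun list contains.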

Putting it all together:

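using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

string DetectNoun(string param){
  // Strip punctuation, then split on spaces, dropping empty entries left by repeated spaces
  string[] words = Regex.Replace(param, @"[^a-zA-Z\d ]", "")
                        .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
  // You should cache this somewhere if you plan to call this many times,
  // but I'll leave that up to you
  string[] allNouns=File.ReadAllLines("Nouns.txt");
  // First word that appears in the noun list (case-insensitive), or the fallback message
  return words.FirstOrDefault(x => allNouns.Contains(x, StringComparer.OrdinalIgnoreCase))
            ?? "No noun detected";
}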
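
The comment in the code above leaves the caching open. One way it could be done, shown only as a sketch, is to load the file once into a HashSet<string> built with StringComparer.OrdinalIgnoreCase, so each lookup no longer scans the whole array. The NounDetector class, the LoadNouns helper and the two-argument DetectNoun overload are names invented for this example; "Nouns.txt" is the file name from the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

static class NounDetector
{
    // Loaded once, on first use. Assumes "Nouns.txt" (the file from the
    // question) is in the working directory, one noun per line.
    private static readonly Lazy<HashSet<string>> CachedNouns =
        new Lazy<HashSet<string>>(() => LoadNouns(File.ReadLines("Nouns.txt")));

    // Builds a case-insensitive set, skipping blank lines.
    public static HashSet<string> LoadNouns(IEnumerable<string> lines)
    {
        return new HashSet<string>(
            lines.Select(l => l.Trim()).Where(l => l.Length > 0),
            StringComparer.OrdinalIgnoreCase);
    }

    // Returns the first word of the input found in the set, or the fallback message.
    public static string DetectNoun(string param, HashSet<string> nouns)
    {
        string[] words = Regex.Replace(param, @"[^a-zA-Z\d ]", "")
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return words.FirstOrDefault(w => nouns.Contains(w)) ?? "No noun detected";
    }

    // Convenience overload that uses the cached file contents.
    public static string DetectNoun(string param)
    {
        return DetectNoun(param, CachedNouns.Value);
    }
}
```

Note that the matched word is returned with the casing it had in the input, since the comparison ignores case instead of lower-casing the text. The overload that takes the set explicitly also makes the method easy to try without a file on disk.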