Finding the singular or plural form of a word with regex

Time:01-13

Let's assume I have the sentence:

sentence = "A cow runs on the grass"

If I want to replace the word cow with "some" special token, I can do:

import re

to_replace = "cow"
# A <SPECIAL> runs on the grass
sentence = re.sub(rf"(?!\B\w)({re.escape(to_replace)})(?<!\w\B)", "<SPECIAL>", sentence, count=1)

Additionally, if I want to replace its plural form, I could do:

sentence = "The cows run on the grass"
to_replace = "cow"
# The <SPECIAL> run on the grass
sentence = re.sub(rf"(?!\B\w)({re.escape(to_replace) + 's?'})(?<!\w\B)", "<SPECIAL>", sentence, count=1)

This also performs the replacement when the word remains in its singular form cow, because s? makes the trailing s optional.
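To make the behaviour of that pattern concrete, here is a minimal self-contained sketch. The helper name replace_first is hypothetical and not part of the original question. The lookarounds (?!\B\w) and (?<!\w\B) act like \b when the search term starts and ends with a word character, so a longer word such as cowboy should be left untouched.

```python
import re

def replace_first(sentence: str, to_replace: str, token: str = "<SPECIAL>") -> str:
    """Hypothetical helper: replace the first whole-word match, singular or with a trailing s."""
    # (?!\B\w) and (?<!\w\B) behave like \b when the term starts/ends with a word character.
    pattern = rf"(?!\B\w)({re.escape(to_replace)}s?)(?<!\w\B)"
    return re.sub(pattern, token, sentence, count=1)

print(replace_first("A cow runs on the grass", "cow"))       # singular form
print(replace_first("The cows run on the grass", "cow"))     # plural form with s
print(replace_first("The cowboy runs on the grass", "cow"))  # longer word, no match expected
```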

My question is how to apply the same idea more generally, i.e., find and replace a word whether it is singular, plural ending in s, or plural ending in es (note that I'm intentionally ignoring the many edge cases that could appear, as discussed in the comments of the question). Put differently: how can I add multiple optional suffixes to a word so that it works for the following examples:

to_replace = "cow"
sentence1 = "The cow runs on the grass"
sentence2 = "The cows run on the grass"
# --------------
to_replace = "gas"
sentence3 = "There are many natural gases"

CodePudding user response:

I suggest using regular Python logic; remember to avoid stretching regexes too far if you don't need to:

phrase = "There are many cows in the field cowes"
for word in phrase.split():
    if word == "cow" or word == "cow" + "s" or word == "cow" + "es":
        phrase = phrase.replace(word, "replacement")
print(phrase)

Output:

There are many replacement in the field replacement
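One caveat with the loop above: str.replace works on substrings, so if the bare word cow comes first, the cow inside a later cows is rewritten too, leaving a stray s. A possible variant, sketched here with a hypothetical helper named replace_words, compares whole whitespace-separated tokens against a set of accepted forms and rebuilds the phrase. It assumes words are separated by whitespace and carry no attached punctuation.

```python
def replace_words(phrase: str, base: str, replacement: str) -> str:
    """Hypothetical helper: replace whole tokens equal to base, base + "s" or base + "es"."""
    forms = {base, base + "s", base + "es"}
    # split()/join() collapses runs of whitespace into single spaces.
    return " ".join(replacement if word in forms else word for word in phrase.split())

print(replace_words("There are many cows in the field cowes", "cow", "replacement"))
print(replace_words("A cow and two cows", "cow", "replacement"))
```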

CodePudding user response:

Apparently, for the use-case I posted, I can make the suffix optional. So it could go as:

re.sub(rf"(?!\B\w)({re.escape(to_replace) + '(s|es)?'})(?<!\w\B)", "<SPECIAL>", sentence, count=1)

Note that this would not work for many edge cases discussed in the comments!
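For reference, the optional-suffix idea can be wrapped in a small function that builds the alternation from a list of suffixes. This is only a sketch under the same simplifying assumptions as the question. The function name replace_with_suffixes and its suffixes parameter are made up for illustration, and irregular plurals are not handled.

```python
import re

def replace_with_suffixes(sentence, to_replace, suffixes=("s", "es"), token="<SPECIAL>"):
    """Hypothetical helper: replace the first occurrence of a word plus an optional suffix."""
    # Build an optional, non-capturing group such as (?:s|es)? from the suffix list.
    alternation = "|".join(re.escape(suffix) for suffix in suffixes)
    pattern = rf"(?!\B\w)({re.escape(to_replace)}(?:{alternation})?)(?<!\w\B)"
    return re.sub(pattern, token, sentence, count=1)

print(replace_with_suffixes("The cow runs on the grass", "cow"))
print(replace_with_suffixes("The cows run on the grass", "cow"))
print(replace_with_suffixes("There are many natural gases", "gas"))
```

The trailing lookbehind forces the regex engine to backtrack until the match ends at a word boundary, so the order of the suffixes in the alternation should not matter for these examples.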
