I'm trying to count the singular and plural forms of a word together inside a big chunk of text. I had the idea of doing it with a regex (not perfect, but good enough in my case).
I want a regex that takes a piece of a string, let's say 'piece', and gives me the count of every word in that chunk of text that starts with 'piece' and has at most 2 random characters at the end (allowing only some specific characters might be even better).
So for the text "I had a piece of cake. There are many pieces left." I would give the regex the word 'piec' and it would return 2, because there are 2 words that start with 'piec' and have at most 2 random characters at the end. If the sentence contained the word 'piecess', it wouldn't be counted, because it has 3 random characters after the base word 'piec'.
I explained it as well as I could :D I hope you understand. I couldn't find an answer on how to do the 'can have 2 random characters at the end' part.
Can someone please help me? Thank you very much for all answers.
CodePudding user response:
A random character (any single ASCII letter) can be written as [A-Za-z]. If you only want to consider lowercase characters, use [a-z].
To allow at most two of those characters, add {0,2} to the end. So for your example the pattern is piec[A-Za-z]{0,2}. However, to apply the regexp to a sentence you should also require that the word is surrounded by non-word characters (\W), which gives \Wpiec[A-Za-z]{0,2}\W. The word could also be at the beginning (^) or the end ($) of the text, where there is no non-word character, so we get
(\W|^)(piec[A-Za-z]{0,2})(\W|$)
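To make the counting step concrete, here is a minimal Python sketch. The question does not name a language, so Python is an assumption, and the function name count_word_forms and its max_extra parameter are made up for illustration. It also swaps the (\W|^) and (\W|$) groups for the word boundary \b. The reason is that \W consumes the separator character, so when two matching words are separated by a single space the second one can no longer be matched; \b matches a position instead of a character and avoids that.

```python
import re

def count_word_forms(text, stem, max_extra=2):
    """Count whole words that start with `stem` plus at most `max_extra` letters.

    Hypothetical helper for illustration; `stem` is escaped so that any
    regex metacharacters in it are treated literally.
    """
    pattern = re.compile(
        r"\b" + re.escape(stem) + r"[A-Za-z]{0," + str(max_extra) + r"}\b"
    )
    return len(pattern.findall(text))

text = "I had a piece of cake. There are many pieces left."
print(count_word_forms(text, "piec"))                    # 2
print(count_word_forms("piece pieces piecess", "piec"))  # 2 ('piecess' is too long)

# For comparison, the \W-based pattern finds only one match when the
# two words share a single separator, because the space is consumed
# by the first match.
consuming = re.compile(r"(\W|^)(piec[A-Za-z]{0,2})(\W|$)")
print(len(consuming.findall("piece pieces")))            # 1
```

If only specific endings should count, the character class can be narrowed, for example to (s|es)? in place of [A-Za-z]{0,2}, depending on which plural forms matter for your text.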