First off, I am new to regex and am using https://regex101.com/r/arkkVE/3 to help me learn it.
I'd like to use re to find words in a .txt file. So far I can do this, but the pattern is very verbose, and I'm trying to cut down on the repeated regex sequences.
Currently, this is what I have:
import re

Possibility = []
with open('5LetterWords.txt') as f:
    for line in f:
        # Add the matches from every line instead of overwriting the list each time
        Possibility.extend(re.findall(
            r'(?=\w)(?=.*[@#t])[\w] (?=\w)(?=.*[@#o])[\w] (?=\w)(?=.*[@#u])[\w] ',
            line))
print(Possibility)
This finds words that contain the letters "t", "o", and "u" in any order, which is the first step toward what I want.
Next, I want to add more patterns that omit words containing other characters, but I don't know how to express exclusion in a regex.
As you can see, this is already getting long and ugly.
Should I even be using regex? Is there a better, more concise way to solve this problem?
Thanks
Answer:
You could iterate through your list of words and keep only the words you want, skipping the ones you don't. For example:
words = ['about', 'alout', 'aotus', 'apout', 'artou', 'atour', 'blout', 'bottu', 'bouet', 'boult', 'bouto', 'bouts', 'chout', 'clout', 'count', 'court', 'couth', 'crout', 'donut', 'doubt', 'flout', 'fotui', 'fount', 'foute', 'fouth', 'fouty', 'glout', 'gouty', 'gouts', 'grout', 'hoult', 'yourt', 'youth', 'joust', 'keout', 'knout', 'lotus', 'louty', 'louts', 'montu', 'moult', 'mount', 'mouth', 'nobut', 'notum', 'notus', 'plout', 'pluto', 'potus', 'poult', 'pouty', 'pouts', 'roust', 'route', 'routh', 'routs', 'scout', 'shout', 'skout', 'smout', 'snout', 'south', 'spout', 'stoun', 'stoup', 'stour', 'stout', 'tatou', 'taupo', 'thous', 'throu', 'thuoc', 'todus', 'tofus', 'togue', 'tolus', 'tonus', 'topau', 'toque', 'torus', 'totum', 'touch', 'tough', 'tould', 'tourn', 'tours', 'tourt', 'touse', 'tousy', 'toust', 'touts', 'troue', 'trout', 'trouv', 'tsubo', 'voust']
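result = []
for word in words:
    if ('a' in word) or ('y' in word):
        continue  # skip words that contain a forbidden letter
    elif ('t' in word) and ('u' in word) and ('o' in word):
        # keep only words that contain all three required letters
        result.append(word)
print(result)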
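The same filter can be written more compactly with Python's set methods. This is a minimal sketch, assuming the letters to require and to forbid are known up front; the names filter_words, required and forbidden are illustrative and not part of the original answer:

```python
def filter_words(words, required, forbidden):
    """Keep words that contain every letter in `required` and none in `forbidden`."""
    required = set(required)
    forbidden = set(forbidden)
    return [
        w for w in words
        if required <= set(w)           # every required letter is present
        and forbidden.isdisjoint(w)     # no forbidden letter is present
    ]

sample = ['about', 'count', 'youth', 'trout', 'lotus']
print(filter_words(sample, 'tou', 'ay'))  # ['count', 'trout', 'lotus']
```

Adding another required or forbidden letter then only means changing a string, not adding another condition.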
Answer:
Ideally, you would read the file line by line, check each word for the presence of "t", "o", and "u", and additionally check that "a" does not appear.
I'm not a Python dev, but this seems relevant: https://stackoverflow.com/a/5189069/2191572
if ('t' in word) and ('o' in word) and ('u' in word) and ('a' not in word):
    print('yay')
else:
    print('nay')
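Putting that check inside a loop that reads the file one line at a time might look like the sketch below. It assumes one word per line, as in the question's 5LetterWords.txt; io.StringIO stands in for the real file so the example runs on its own, and matching_words is a hypothetical helper name:

```python
import io

def matching_words(lines):
    """Yield words that contain 't', 'o' and 'u' but no 'a'."""
    for line in lines:
        word = line.strip()  # drop the trailing newline and any stray spaces
        if ('t' in word) and ('o' in word) and ('u' in word) and ('a' not in word):
            yield word

# With the real file: with open('5LetterWords.txt') as f: print(list(matching_words(f)))
fake_file = io.StringIO("about\nclout\nmouth\nsalty\n")
print(list(matching_words(fake_file)))  # ['clout', 'mouth']
```

Because the function accepts any iterable of lines, an open file object can be passed in directly.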
If you insist on regex, then this would work:
^(?=.*t)(?=.*o)(?=.*u)(?!.*a).*$
^ - start-of-line anchor
(?=.*t) - ahead of me there is a "t"
(?=.*o) - ahead of me there is an "o"
(?=.*u) - ahead of me there is a "u"
(?!.*a) - ahead of me there is no "a"
.* - match everything
$ - end-of-line anchor
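To run that pattern over a whole file's text at once, one option is re.findall with the re.MULTILINE flag, so that ^ and $ match at every line boundary instead of only at the start and end of the text. A minimal sketch, with an inline string standing in for the file contents:

```python
import re

# re.MULTILINE makes ^ and $ anchor each line; since . never matches a newline,
# the lookaheads only inspect the current line.
pattern = re.compile(r'^(?=.*t)(?=.*o)(?=.*u)(?!.*a).*$', re.MULTILINE)

text = "about\ncount\ntoque\ntatou\nstout\n"
print(pattern.findall(text))  # ['count', 'toque', 'stout']
```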
Note: (?!.*a).* can be substituted with [^a]*.