I'm trying to have the user input a string of characters with one asterisk. The asterisk indicates a character that can be subbed out for a vowel (a,e,i,o,u) in order to see what substitutions produce valid words. Essentially, I want to take an input "l*g" and have it return "lag, leg, log, lug" because "lig" is not a valid English word. Below I have invalid words to be represented as "x".
I've gotten it to properly output each possible combination (e.g., including "lig"), but once I try to compare these words with the text file I'm referencing (for the list of valid words), it'll only return 5 lines of x's. I'm guessing it's that I'm improperly importing or reading the file?
Here's the link to the file I'm looking at so you can see the formatting: https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/words.zip Using the "en" file ~2.5MB
It's not in a dictionary layout i.e. no corresponding keys/values, just lines (maybe I could use the line number as the index, but I don't know how to do that). What can I change to check the test words to narrow down which are valid words based on the text file?
with open(os.path.expanduser('~/Downloads/words/en')) as f:
words = f.readlines()
inputted_word = input("Enter a word with ' * ' as the missing letter: ")
letters = []
for l in inputted_word:
letters.append(l)
### find the index of the blank
asterisk = inputted_word.index('*') # also used a redundant int(), works fine
### sub in vowels
vowels = ['a','e','i','o','u']
list_of_new_words = []
for v in vowels:
letters[asterisk] = v
new_word = ''.join(letters)
list_of_new_words.append(new_word)
for w in list_of_new_words:
if w in words:
print(new_word)
else:
print('x')
There are probably more efficient ways to do this, but I'm brand new to this. The last two for loops could probably be combined but debugging it was tougher that way.
CodePudding user response:
print(list_of_new_words)
gives
['lag', 'leg', 'lig', 'log', 'lug']
So far, so good.
But this :
for w in list_of_new_words:
if w in words:
print(new_word)
else:
print('x')
Here you print new_word
, which is defined in the previous for
loop :
for v in vowels:
letters[asterisk] = v
new_word = ''.join(letters) # <----
list_of_new_words.append(new_word)
So after the loop, new_word
still has the last value it was assigned to : "lug"
(if the script input was l*g
).
You probably meant w
instead ?
for w in list_of_new_words:
if w in words:
print(w)
else:
print('x')
But it still print
s 5 x
s ...
So that means that w in words
is always False
. How is that ?
Looking at words
:
print(words[0:10]) # the first 10 will suffice
['A\n', 'a\n', 'aa\n', 'aal\n', 'aalii\n', 'aam\n', 'Aani\n', 'aardvark\n', 'aardwolf\n', 'Aaron\n']
All the words from the dictionnary contain a newline character (\n
) at the end. I guess you were not aware that it is what readlines
do. So I recommend using :
words = f.read().splitlines()
instead.
With these 2 modifications (w
and splitlines
) :
Enter a word with ' * ' as the missing letter: l*g
lag
leg
x
log
lug