The result should be a list of words longer than 9 characters; the words should be lowercase and contain no punctuation, with ***only three lines of code in the body of the function. The problem with my code is that it still leaves punctuation in the words. I tried checking a single example, if ch not in ('-' or '"' or '!'), and I also tried the pattern r'[.,"!-]'.
I also tried opening the file without using with, and it worked: I got the result I wanted. But with that approach I cannot respect the limit of only 3 lines of code inside the function body.
import string
min_length = 9
with open('my_file.txt') as file:
    content = ''.join([ch for ch in file if ch not in string.punctuation])
result = [word.lower() for word in content.split() if len(word)>min_length]
print(result)
'''my output:
['distinctly', 'repeating,', 'entreating', 'entreating', 'hesitating', 'forgiveness', 'wondering,', 'whispered,', '"lenore!"-', 'countenance', '"nevermore."', 'sculptured', '"nevermore."', 'fluttered-', '"nevermore."', '"doubtless,"', 'unmerciful', 'melancholy', 'nevermore\'."', '"nevermore."', 'expressing', 'nevermore!', '"nevermore."', '"prophet!"', 'undaunted,', 'enchanted-', '"nevermore."', '"prophet!"', '"nevermore."', 'upstarting-', 'loneliness', 'unbroken!-', '"nevermore."', 'nevermore!']
As you can see, there are still words with punctuation.
CodePudding user response:
I got this.
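from string import punctuation
with open('test.txt') as f:
    # Turn newlines into spaces so words on adjacent lines are not glued together
    data = f.read().replace('\n', ' ')
# Remove every punctuation character
for a in punctuation:
    data = data.replace(a, '')
# Keep the unique words longer than 9 characters
data = list(set([a for a in data.split(' ') if len(a) > 9]))
print(data)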
output:
The list is empty because the given data does not contain a single word with more than 9 letters.
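A likely reason the question's code keeps its punctuation: iterating over a file object yields whole lines, not characters, so each line is tested as a substring of string.punctuation and almost always passes. Calling read() first gives a string, and iterating that yields single characters. The sketch below uses io.StringIO as a stand-in for an open file, and the sample text is made up for illustration.

```python
import io
import string

# Stand-in for an open text file (hypothetical sample text).
sample = io.StringIO('Once upon a midnight dreary,\n"Nevermore."\n')

# Iterating the file object yields whole lines, so the filter compares
# each full line against string.punctuation and keeps it unchanged.
kept_lines = [chunk for chunk in sample if chunk not in string.punctuation]

# Rewind, then iterate the string returned by read(): this yields
# single characters, so punctuation is actually dropped.
sample.seek(0)
kept_chars = ''.join(ch for ch in sample.read() if ch not in string.punctuation)

print(kept_lines)
print(repr(kept_chars))
```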
CodePudding user response:
I believe this could be an appropriate solution:
from string import punctuation
with open('files/text.txt') as f:
    # maketrans maps each newline to a space and deletes every punctuation character
    print(set([a for a in f.read().translate(str.maketrans('\n', ' ', punctuation)).split(' ') if len(a) > 9]))
However, this is a crime against humanity in terms of readability, and I would strongly suggest relaxing the three-line requirement so that your code stays understandable in the long run.
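If the three-line limit has to stay, here is one possible shape for the function, offered as a sketch rather than the definitive answer. The name long_words, the default min_length and the sample text are assumptions made for illustration. Note that deleting punctuation outright would also merge hyphenated words such as "well-known" into one word.

```python
import os
import string
import tempfile


# Return the lowercase words longer than min_length, with punctuation removed.
def long_words(path, min_length=9):
    with open(path) as file:
        content = file.read().translate(str.maketrans('', '', string.punctuation))
    return [word.lower() for word in content.split() if len(word) > min_length]


# Usage with a throwaway file (the sample text is invented for this demo).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('Quoth the Raven, "Nevermore."\nDistinctly I remember; forgiveness-\n')
demo_words = long_words(tmp.name)
os.remove(tmp.name)
print(demo_words)
```

Using split() with no argument splits on any whitespace, newlines included, so the newlines need no separate handling.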