I have a list containing hundreds of tweets. I want to loop through the tweets and replace the nonstandard words in them with their standard equivalents.
I used this code to read the file containing the standard words:
# read one standard word per line; the with-block closes the file afterwards
with open('standard_word.txt', 'r') as dk:
    dlist = [x.replace('\n', '') for x in dk]
dlist
Then I use this code to print words that are not in the list:
for x in tweets:
    if x[0] not in dlist:
        print(x[0],x[1],x[2],x[3],x[4],x[5])
However, this prints only a fixed number of strings (indices 0 through 5). I'm looking for a way to print all of the strings without that limit, whatever the number of strings in each tweet. Thank you for your help :)
CodePudding user response:
I am not sure I understood you correctly, but did you mean something like this?
standard_words = ['this', 'is', 'just', 'a', 'random', 'tweet']
tweets = ['This is jst a random tweet']
for tweet in tweets:
    # split on whitespace so every word is checked, however long the tweet is
    for word in tweet.split():
        if word.lower() not in standard_words:
            print(word)
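If each tweet is already a list of words, as the `x[0]` indexing in the question suggests, unpacking with `*` prints however many words it holds. This is only a minimal sketch: the sample data and the helper name `nonstandard_words` are assumptions for illustration, not taken from the original post.

```python
def nonstandard_words(tokens, standard_words):
    """Return the tokens whose lowercase form is not in standard_words."""
    standard = set(standard_words)  # set lookup is faster than scanning a list
    return [t for t in tokens if t.lower() not in standard]


standard_words = ['this', 'is', 'just', 'a', 'random', 'tweet']
# Hypothetical sample data: each tweet is a list of words of any length.
tweets = [['This', 'is', 'jst', 'a', 'random', 'tweet'], ['just', 'a', 'twt']]

for tokens in tweets:
    print(*tokens)  # * unpacks the list, so no fixed indices are needed
    print('not standard:', *nonstandard_words(tokens, standard_words))
```

Using `print(*tokens)` avoids hard-coding `x[0]` through `x[5]`, which would also raise an `IndexError` for tweets with fewer than six words.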
CodePudding user response:
If you need to replace non-standard words, you can do it with a dictionary that maps each one to its standard equivalent:
standard_words = ['this', 'is', 'just', 'a', 'random', 'tweet']
equivalences = {"jst": "just"}
tweets = ['This is jst a random tweet']
for idx, tweet in enumerate(tweets):
    for word in tweet.split():
        key = word.lower()  # look up the lowercase form to avoid a KeyError
        # replace only non-standard words that have a known equivalent
        if key not in standard_words and key in equivalences:
            tweet = tweet.replace(word, equivalences[key])
    tweets[idx] = tweet
print(tweets)
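One caveat with `str.replace` is that it substitutes substrings, so a short key such as "u" would also change the "u" inside "must". A possible alternative, sketched here under the assumption that words are runs of `\w` characters, is to let `re.sub` match whole words and look each one up in the dictionary. The function name `normalize` and the extra "u" entry are hypothetical.

```python
import re


def normalize(tweet, equivalences):
    """Replace whole words that appear in equivalences (case-insensitive)."""
    def swap(match):
        word = match.group(0)
        # fall back to the original word when there is no equivalent
        return equivalences.get(word.lower(), word)
    return re.sub(r"\w+", swap, tweet)


equivalences = {"jst": "just", "u": "you"}  # "u" added for illustration only
tweets = ['This is jst a random tweet', 'U must adjust']
tweets = [normalize(t, equivalences) for t in tweets]
print(tweets)
```

Note that this sketch does not preserve capitalisation: a matched word is replaced by the dictionary value exactly as written.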