I have a dataframe, for example:
df = [{'id': 1, 'text': 'text contains ok words'}, {'id': 2, 'text': 'text contains word apple'}, {'id': 3, 'text': 'text contains words ok'}]
And a list of keywords, for example:
keywords = ['apple', 'orange', 'lime']
I want to check every value in the 'text' column and, if it contains any word from my keywords, replace that text with 'disconsider this case'.
I tried tokenizing the column, but then I could not use the function I wrote to do the check. Here is my attempt:
import pandas as pd

df = pd.DataFrame(df)

def remove_keywords(inpt):
    keywords = ['apple', 'orange', 'lime']
    # Replace the text if any keyword occurs in it
    if any(x in inpt for x in keywords):
        return 'disconsider this case'
    else:
        return inpt

df['text'] = df['text'].apply(remove_keywords)
df
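import nltk  # word_tokenize needs the NLTK tokenizer data to be downloaded

df['text'] = df['text'].apply(nltk.word_tokenize)
for i in df.index:
    # Each cell is now a list of tokens, so this is an exact-word check
    if 'apple' in df.at[i, 'text']:
        df.at[i, 'text'] = 'disconsider this case'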
Any help appreciated. Thanks!!
CodePudding user response:
This worked for me, using pandas and a loop:
import pandas as pd

keywords = ['apple', 'orange', 'lime']
df = pd.DataFrame([{'id': 1, 'text': 'text contains ok words'}, {'id': 2, 'text': 'text contains word apple'}, {'id': 3, 'text': 'text contains words ok'}])
print(df)

# Column 1 is 'text'; replace it wherever any keyword appears as a substring
for i in range(len(df)):
    if any(word in df.iat[i, 1] for word in keywords):
        df.iat[i, 1] = 'disconsider this case'
print(df)
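The loop above matches substrings, so a keyword like 'apple' would also hit 'pineapple'. If whole-word matching is what you want, one possible alternative is a vectorized check with Series.str.contains and a word-boundary regex. This is only a sketch: the names pattern and mask are my own, and the case-insensitive matching (case=False) is an assumption that the question does not state.

```python
import re

import pandas as pd

keywords = ['apple', 'orange', 'lime']
df = pd.DataFrame([
    {'id': 1, 'text': 'text contains ok words'},
    {'id': 2, 'text': 'text contains word apple'},
    {'id': 3, 'text': 'text contains words ok'},
])

# Build a single pattern such as r'\b(?:apple|orange|lime)\b'.
# re.escape guards against keywords containing regex metacharacters.
pattern = r'\b(?:' + '|'.join(re.escape(k) for k in keywords) + r')\b'

# Boolean mask: True where the text contains any keyword as a whole word.
# case=False is an assumption; drop it for case-sensitive matching.
mask = df['text'].str.contains(pattern, case=False, regex=True)

# Overwrite only the matching rows.
df.loc[mask, 'text'] = 'disconsider this case'
print(df)
```

This avoids the explicit Python loop and the positional column index, so it keeps working if the column order changes.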