I am trying to return a substring if it is present in a string, ignoring case.
For example, I want to return the string "apple" when the sentence is "Apple is cool" or "I like APPLE", but not when it is "I like apples".
What I have so far is this:
import pandas as pd
df_word_list = pd.DataFrame({'word': ['apple', 'cool']})
df = pd.DataFrame({'sentence': ['Apple is cool', 'I like APPLE', 'I like apples']})
for i in range(len(df)):  # check each sentence in turn
    words = [x for x in df_word_list['word'].tolist() if x in str(df['sentence'][i])]
This gives me the matching words, but it is case sensitive. Does anyone know how to make it case insensitive?
I would like the final output to be:
- apple, cool
- apple
Row 3 is empty because the word has an "s" ("apples" instead of "apple").
df_word_list is the dataframe of words that I want to identify. df is the dataframe that contains the sentences.
CodePudding user response:
df.sentence.str.lower().str.split().apply(lambda l: ", ".join([x for x in l if x in df_word_list["word"].values]))
The result is a pandas.Series of strings:
0 apple, cool
1 apple
2
Name: sentence, dtype: object
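Note that splitting on whitespace only matches words that stand alone between spaces, so a sentence like "Apple, cool!" would yield nothing. If punctuation can appear next to the words, one possible alternative is a case-insensitive regular expression with word boundaries. The sketch below assumes the same df_word_list and df as in the question; the helper name find_words is made up for illustration.

```python
import re
import pandas as pd

df_word_list = pd.DataFrame({'word': ['apple', 'cool']})
df = pd.DataFrame({'sentence': ['Apple is cool', 'I like APPLE', 'I like apples']})

# Build a single pattern such as r'\b(?:apple|cool)\b'.
# re.escape guards against regex metacharacters in the word list.
pattern = re.compile(
    r'\b(?:' + '|'.join(re.escape(w) for w in df_word_list['word']) + r')\b',
    flags=re.IGNORECASE,
)

def find_words(sentence):
    # Hypothetical helper: lowercase each match so that "Apple" and
    # "APPLE" both come back as "apple".
    return ', '.join(m.lower() for m in pattern.findall(sentence))

result = df['sentence'].apply(find_words)
print(result)
```

The word boundaries keep "apples" from matching "apple", so row 3 stays empty. Matches are returned in the order they appear in the sentence, and a word that occurs twice is listed twice.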