I'm trying to search the rows of a DataFrame in Python for the words in a list, and add two new columns: one showing the words found, separated by commas, and another with the count of the words found.
This is my string list
string_list = ["never sounded", "she", "was time", "against"]
and this is the df I want to obtain:
CodePudding user response:
First, I split the string into words so that only exact word matches are found. That way, searching for a word like "a" doesn't match every letter "a" in the string.
wordsToFind = "beautiful sunny"
stringToSearch = "today will be a beautiful sunny day"
foundStrings = []
stringsToFind = wordsToFind.split()
# Split the text once, outside the loop, so matching is done on whole words
list_stringSeparatedByWord = stringToSearch.lower().split()
for s in stringsToFind:
    if s.lower() in list_stringSeparatedByWord:
        foundStrings.append(s)
print(foundStrings)
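One caveat: splitting on whitespace only finds single words, so multi-word entries from the question's list such as "never sounded" or "was time" can never match. A minimal sketch of one way around that, assuming a regex with word boundaries is acceptable; the helper name find_phrases and the sample sentence are made up for illustration:

```python
import re

def find_phrases(text, phrases):
    """Return the phrases that occur in text as whole words, ignoring case."""
    found = []
    for phrase in phrases:
        # \b on both sides keeps "she" from matching inside "ashes"
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(phrase)
    return found

string_list = ["never sounded", "she", "was time", "against"]
sample = "She knew it was time to push against the ashes"  # made-up sentence
print(find_phrases(sample, string_list))  # ['she', 'was time', 'against']
```

The result keeps the order of string_list rather than the order in which the phrases appear in the text.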
CodePudding user response:
Extending Stephan's answer, here is a more declarative, Pythonic approach.
It sounds like you are trying to find the intersection of the words you are looking for and the words that exist in the text. You can achieve this using set intersection. https://docs.python.org/3.8/library/stdtypes.html#frozenset.intersection
Code:
text = "today will be a beautiful sunny day"
get_words = "beautiful sunny"
# Sets are unordered, so sorted() is used to get a predictable result
found_words = sorted(set(text.split()) & set(get_words.split()))
Result:
found_words == ['beautiful', 'sunny']
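Note that a set intersection does not remember the order in which words appear in the text, and it is case-sensitive. If either of those matters, a rough alternative (my own variation, not part of the approach above) is to keep a set only for the lookup and walk the text in order:

```python
text = "Today will be a beautiful SUNNY day"
get_words = "sunny beautiful"

# Lowercase both sides so "SUNNY" and "sunny" compare equal
wanted = set(get_words.lower().split())

# Iterate over the text so the result follows the sentence order
found_words = [w for w in text.lower().split() if w in wanted]
print(found_words)  # ['beautiful', 'sunny']
```

A word that occurs twice in the text will show up twice here, which may or may not be what you want for a count.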
In order to use this in pandas across multiple rows you can use df.assign. This will create a new column based on operations from current columns. https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.assign.html
Code:
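get_words = "beautiful sunny"
# Takes the text of a single row and returns the matching words, comma-separated
word_finder = lambda text: ', '.join(sorted(set(text.split()) & set(get_words.split())))
# apply() runs word_finder on each value of the 'text' column
df = df.assign(found_words=df['text'].apply(word_finder))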
Result:
text                                | found_words
------------------------------------|-----------------
today will be a beautiful sunny day | beautiful, sunny
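To get closer to what the question asks for (phrases from string_list plus a count column), here is a sketch that combines a regex with df.assign. The sample rows and the column names text, found_words and found_count are assumptions, since the original DataFrame isn't shown, and the count here is the number of distinct phrases found per row, not the total number of occurrences:

```python
import re
import pandas as pd

string_list = ["never sounded", "she", "was time", "against"]

# Hypothetical sample data; the question's real DataFrame is not shown
df = pd.DataFrame({"text": [
    "She said it was time to go",
    "It never sounded right against the wall",
    "Nothing to see here",
]})

# One alternation of all phrases; longer phrases go first so they win on overlap
ordered = sorted(string_list, key=len, reverse=True)
pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in ordered) + r")\b",
    flags=re.IGNORECASE,
)

def find_matches(text):
    # Lowercase the hits and drop duplicates while keeping first-seen order
    return list(dict.fromkeys(m.lower() for m in pattern.findall(text)))

matches = df["text"].apply(find_matches)
df = df.assign(
    found_words=matches.apply(", ".join),  # comma-separated phrases
    found_count=matches.apply(len),        # number of distinct phrases found
)
print(df)
```

Rows with no match get an empty string and a count of 0.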