I have a dataframe that looks like this:
import pandas as pd

dataFrame = pd.DataFrame({'Name': [' Compound Mortar ',
                                   ' lime plaster ',
                                   'mortar Screed ',
                                   ' Gypsum Plaster ',
                                   ' Gypsum Plaster 2',
                                   ' lime Plaster 233',
                                   'Clay 23',
                                   'Clay plaster Mortar']})
I am using a filter to search for certain words. My approach so far has been this:
dataFrame["Type"] = ""
mask1 = dataFrame["Name"].apply(lambda x: "Mortar".casefold() in x.casefold())
If the filtered word is present in the "Name" column, I would like it to be added to the "Type" column. More than one word may match: for example, if I apply a second filter with the word "Glue", the "Type" cell for that row should contain every keyword found (a list would be fine).
CodePudding user response:
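You can use str.findall with a regex that joins the keywords with "|". Note that it returns the text as it appears in "Name", so the casing follows the column rather than the keyword list: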
import re
word = ['Mortar','Clay']
dataFrame['new'] = dataFrame.Name.str.findall('|'.join(word),flags=re.IGNORECASE).map(','.join)
dataFrame
Out[776]:
   Name                 new
0  Compound Mortar      Mortar
1  lime plaster
2  mortar Screed        mortar
3  Gypsum Plaster
4  Gypsum Plaster 2
5  lime Plaster 233
6  Clay 23              Clay
7  Clay plaster Mortar  Clay,Mortar
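Since the question says a list would be fine, here is a minimal sketch of the same findall idea that keeps the result as a list per row. It assumes the keywords should be matched literally, so each one is passed through re.escape before being joined; the names df, keywords and pattern are only illustrative.

```python
import re

import pandas as pd

# Small sample taken from the question's data.
df = pd.DataFrame({"Name": [" Compound Mortar ", "Clay plaster Mortar", " lime plaster "]})
keywords = ["Mortar", "Clay"]

# Escape each keyword so characters such as "+" or "." are matched literally,
# then join them into a single alternation pattern.
pattern = "|".join(re.escape(k) for k in keywords)

# findall returns, for every row, the list of matches in order of appearance.
df["Type"] = df["Name"].str.findall(pattern, flags=re.IGNORECASE)
print(df)
```

Rows without any match get an empty list, which is easy to test with df["Type"].str.len() == 0.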
CodePudding user response:
Try this:
words = ['Mortar', 'Clay']
dataFrame['Type'] = pd.concat([dataFrame['Name'].str.contains(word, case=False).map({True: word, False: ''}) for word in words], axis=1).agg(list, axis=1).str.join(',').str.strip(',')
Output:
>>> dataFrame
   Name                 Type
0  Compound Mortar      Mortar
1  lime plaster
2  mortar Screed        Mortar
3  Gypsum Plaster
4  Gypsum Plaster 2
5  lime Plaster 233
6  Clay 23              Clay
7  Clay plaster Mortar  Mortar,Clay
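If more keywords are added later (for example "Glue"), joining empty strings and stripping commas can leave gaps such as "Mortar,,Clay" when a keyword in the middle of the list is missing. A possible alternative, sketched below, builds the list of found keywords per row directly. The helper name tag_keywords is hypothetical and it assumes plain case-insensitive substring matching, as in the question's own mask.

```python
import pandas as pd


def tag_keywords(names, keywords):
    """Return, for each name, the list of keywords it contains (case-insensitive)."""
    def found(name):
        folded = name.casefold()
        # Keep the keyword's own spelling and the order given in `keywords`.
        return [k for k in keywords if k.casefold() in folded]

    return names.map(found)


# Small sample taken from the question's data.
df = pd.DataFrame({"Name": [" Compound Mortar ", " lime plaster ",
                            "mortar Screed ", "Clay plaster Mortar"]})
df["Type"] = tag_keywords(df["Name"], ["Mortar", "Glue", "Clay"])
print(df)
```

Each cell is a list, so df["Type"].str.join(",") gives the comma-separated form if that is preferred.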