I have the following dataframes
df1:
Name
0 AAa
1 BBB
2 ccc
df2:
Description
0 text AAa clinic text
1 text bbb hospital text
I want to add another column to df2 that extracts Name from Description, but the Name has to be followed by either 'clinic' or 'hospital'. So if Description is "text AAa text", I don't want AAa to be extracted
I feel like this should be straightforward but for some reason I am stuck and can't find a solution
I have tried the following but it returns df2['Extracted Name'] all None
def df_matcher(x):
for i in df1['Name']:
if ((i.lower() " clinic" in x.lower()) or (i.lower() " hospital" in x.lower())):
return i
df2['Extracted Name'] = df2['Description'].apply(df_matcher)
Thanks!
CodePudding user response:
Let's use str.extract
with a regex pattern:
pat = r'(?i)\b(%s)\s(?:clinic|hospital)' % '|'.join(df1.Name)
df2['col'] = df2['Description'].str.extract(pat)
Description col
0 text AAa clinic text AAa
1 text bbb hospital text bbb
CodePudding user response:
If the part you want to remove appears exactly before clinic or hospital, you don't need the first dataframe
and you can try this:
df2['split'] = df2.Description.str.split()
df2['index'] = df2['split'].apply(lambda x:x.index('clinic') if 'clinic' in x else x.index('hospital'))
df2['remove'] = df2.apply(lambda x:x['spilit'][x['index'] - 1], axis=1)
df2['Extracted Name'] = df2.apply(lambda x:x['Description'].replace(x['remove'], ''), axis=1)