I have two datasets. One has survey information and the other one has complaints. I am trying to see if the people who took surveys are mentioned in the description of the complaint. There is a column for first and last name in the survey df. My intention is to use a contains function to check the complaints description column for a match on the first and last name. I would then like to add the survey id to the complaints dataset where there is a match.
Survey data:
survey_id | first_name | last_name |
---|---|---|
survey1 | John | Smith |
Complaint Data:
complaint_number | description |
---|---|
complaint1 | John Wick is a great movie |
complaint2 | Jason Smith stinks |
complaint3 | John Smith is awesome! |
Expected Result of the new complaint df:
complaint_number | Description | matches |
---|---|---|
complaint1 | John Wick is a great movie | |
complaint2 | Jason Smith stinks | |
complaint3 | John Smith is awesome! | survey1 |
CodePudding user response:
This assumes exact match on first and last as full name (see new 'pattern' column), and doesn't look to see if two full names are in the description, but this is a start if you need to modify it.
import io
d1 = '''survey_id first_name last_name
survey1 John Smith
survey2 John Wick
'''
df_survey = pd.read_csv(io.StringIO(d1), sep=' \s ', engine='python')
df_survey
d2 = '''complaint_number description
complaint1 John Wick is a great movie
complaint2 Jason Smith stinks
complaint3 John Smith is awesome!
complaint4 John Wick is awesome!
'''
df_complaints = pd.read_csv(io.StringIO(d2), sep=' \s ', engine='python')
df_survey['pattern'] = df_survey.apply( lambda x: ' '.join([x['first_name'], x['last_name']]), axis=1)
df_complaints['matches'] = ''
def find_matches(x):
sid = x['survey_id']
# print(sid, x['pattern'])
df_complaints['matches'].loc[df_complaints['description'].str.contains(x['pattern'], case=False)] = sid
return None
df_survey.apply(lambda x: find_matches(x), axis=1)
df_complaints
Output
complaint_number description matches
0 complaint1 John Wick is a great movie survey2
1 complaint2 Jason Smith stinks
2 complaint3 John Smith is awesome! survey1
3 complaint4 John Wick is awesome! survey2