Home > Software engineering >  Is it possible to search for two dataframe columns's values in another dataframe's column
Is it possible to search for two dataframe columns's values in another dataframe's column

Time:10-01

I have two datasets. One has survey information and the other one has complaints. I am trying to see if the people who took surveys are mentioned in the description of the complaint. There is a column for first and last name in the survey df. My intention is to use a contains function to check the complaints description column for a match on the first and last name. I would then like to add the survey id to the complaints dataset where there is a match.

Survey data:

survey_id first_name last_name
survey1 John Smith

Complaint Data:

complaint_number description
complaint1 John Wick is a great movie
complaint2 Jason Smith stinks
complaint3 John Smith is awesome!

Expected Result of the new complaint df:

complaint_number Description matches
complaint1 John Wick is a great movie
complaint2 Jason Smith stinks
complaint3 John Smith is awesome! survey1

CodePudding user response:

This assumes exact match on first and last as full name (see new 'pattern' column), and doesn't look to see if two full names are in the description, but this is a start if you need to modify it.

import io

d1 = '''survey_id   first_name  last_name
survey1 John    Smith
survey2 John    Wick
'''
df_survey = pd.read_csv(io.StringIO(d1), sep=' \s ', engine='python')
df_survey

d2 = '''complaint_number    description
complaint1  John Wick is a great movie
complaint2  Jason Smith stinks
complaint3  John Smith is awesome!
complaint4  John Wick is awesome!
'''
df_complaints = pd.read_csv(io.StringIO(d2), sep=' \s ', engine='python')

df_survey['pattern'] =  df_survey.apply( lambda x: ' '.join([x['first_name'], x['last_name']]), axis=1)
df_complaints['matches'] = ''

def find_matches(x):
    sid = x['survey_id']
    # print(sid, x['pattern'])
    df_complaints['matches'].loc[df_complaints['description'].str.contains(x['pattern'], case=False)] = sid
    return None

df_survey.apply(lambda x: find_matches(x), axis=1)
df_complaints

Output

  complaint_number                 description  matches
0       complaint1  John Wick is a great movie  survey2
1       complaint2          Jason Smith stinks
2       complaint3      John Smith is awesome!  survey1
3       complaint4       John Wick is awesome!  survey2
  • Related