Home > Back-end >  find values dataframe pandas
find values dataframe pandas

Time:09-11

I have two dataframes: df1:

id  name   city   age  found_in_df2
2   john   ny     20
4   maria  lima   9
9   susan  cana   17

df2:

comment           question            response
i'm john          are you there       yes, i am
hello!            jajajajaja          go ahead!
please, go on     hello susan         no, i don't
maria             i'm 9 years         sure

I want to find values from column called 'name' from df1 that exist in any column in df2 (this value could be in any column and even inside a phrase). If the value is found, so I want to write 'yes' in column 'found_in_df2' of df1. Someone could help me with that?

CodePudding user response:

One approach could be as follows.

df1['found_in_df2'] = df1.name.apply(lambda x: any(df2.stack().str.contains(x)))\
    .map({True:'yes',False:'no'})

print(df1)

   id   name  city  age found_in_df2
0   2   john    ny   20          yes
1   4  maria  lima    9          yes
2   9  susan  cana   17          yes

CodePudding user response:

The below seems to solve the problem

import pandas as pd

def find_value(data, phrase):
    for col in data.columns:
        if any(data[col].str.contains(phrase)):
            return 'yes'
    return 'no'


df1 = pd.DataFrame([[2, 'john', 'ny', 20], 
                    [4, 'maria', 'lima', 9], 
                    [9, 'susan', 'cana', 17]],
                   columns=['id', 'name', 'city', 'age'])

df2 = pd.DataFrame([["i'm john", "are you there", "yes, i am"], 
                    ["hello!", "jajajajaja", "go ahead!"], 
                    ["please, go on", "hello susan", "no, i don't"],
                    ["maria", "i'm 9 years", "sure"]],
                   columns=['comment', 'question', 'response'])

df1['found_in_df2'] = df1.name.apply(lambda x: find_value(df2, x))

CodePudding user response:

Use:

df1['match'] =\
     df1['name'].str.contains('|'.join(df2.stack().tolist()).map({True:'yes', False:'no'})
  • Related