Home > database >  Pandas map between two dataframes into column
Pandas map between two dataframes into column

Time:06-24

Let's say that I have a df1 like (there are more columns but only this one is relatable):

A
a1
a2
a3

and a df2 like:

A
a1
a3
a4
a7

The case is that df2 contains in column A (column names are the same both in df1 and df2) some of the values in df1, but not all of them. Now, I'd like to add a column "Found in df2?" to a df1, representing if the value was found or not. Example:

df1
A  Found in df2?
a1       Y
a2       N
a3       Y

I've tried np.where and some merging magic but couldn't wrap my head around this.

CodePudding user response:

You can use isin:

df['found in df2'] = df['A'].isin(df2['A'].values)

print(df)

    A   found in df2
0   a1  True
1   a2  False
2   a3  True

Setup

df = pd.DataFrame({'A':['a1','a2','a3']})
df2 = pd.DataFrame({'A':['a1','a3','a4','a7']})

CodePudding user response:

If using pandas cannot solve your problem, I suggest you to turn them into list/arrays such that it is easier to work with.

Here is one way to work around:

import pandas as pd
df1 = pd.DataFrame({
    'A': [1,2,3]
})
df2 = pd.DataFrame({
    'A': [1,3,4,7]
})

df1['Found in df2?'] = ['Y' if x in df2['A'].values else 'N' for x in df1['A'].values]
display(df1)

CodePudding user response:

Pandas merge has an indicator feature that tells you exactly where a value is found.

Like this:

dfm = pd.merge(df1, df2[['A']], how="left", indicator=True)

dfm['Found in df2'] = dfm["_merge"] == "both"

You get a column _merge that tells you "left_only" or "both" depending on where the A values are found. Use how="outer" if you want to know which values are only in df2 too.

Now, merge may reorder rows in df1 based on the order of A. See other questions on SO how to counteract that, if needed.

  • Related