I have two dataframes df1 and df2. I would like to check if id2 value exists in df2. If exists, fetch the rows in df1 with identical id1 values. I tried left join but it gives only the first row.
pd.merge(df1, df2.rename(columns={'id1':'id2'}), on='id2', how='left')
df1
id1 id2 id3
C45 AB ZU
C45 ZO RE
C67 RT FG
C78 TZ GH
df2
id1
AB
GH
ZU
Expected output is:
id1 id2 id3
C45 AB ZU
C45 ZO RE
CodePudding user response:
I did it in 2 separate steps.
First, find all the items in df1.id1 which match df2.id1 and save them to lookup
.
Then, find all the rows which are in lookup
within df1.id1 and save them to final
.
First,
# Import pandas library
import pandas as pd
import numpy as np
# initialize list of lists
data = [['C45', 'AB','ZU'], ['C45', 'ZO','RE'], ['C67', 'RT','FG'],['C78','TZ','GH']]
# Create the pandas DataFrame
df1 = pd.DataFrame(data, columns=['id1', 'id2', 'id3'])
# initialize list of lists
data = [['AB'], ['GH'], ['ZU']]
# Create the pandas DataFrame
df2 = pd.DataFrame(data, columns=['id1'])
lookup = df1[df1.id2.isin(df2.id1)].id1 # contains a series of the ids in df1.id2 that match df2.id1
final = df1[df1.id1.isin(lookup)]
final looks like this,
id1 id2 id3
0 C45 AB ZU
1 C45 ZO RE
CodePudding user response:
Here we go
df1 = pd.DataFrame([['C45','AB','ZU'],['C45','ZO','RE'],
['C67','RT','FG',],['C78','TZ','GH']], columns=['id1','id2','id3'])
df2 = pd.DataFrame(['AB','GH','ZU'], columns=['id1'])
Calculate a mask for the required rows:
mask = df1.isin(df2.id1.values).any(1)
mask looks like this
0 True
1 False
2 False
3 True
dtype: bool
the required output:
df1[mask]
id1 id2 id3
0 C45 AB ZU
3 C78 TZ GH