I need to join the below dataframe based on some condition.
df1:
Id Value
[101,102,103] 10001
[101,102,104] 10000
[101,102,105] 10002
[101,107,105] 10003
df2:
Id Product_Name
[101,102,103,104] Shoe
[101,102,109,104] jeans
[101,105,102,108] make-up
[101,105,106,118] shirt
df_output
Id Value Product_Name
[101,102,103] 10001 Shoe -- Every value present in df2.Id list
[101,102,104] 10000 Jeans -- Every value present in df2.Id list
[101,102,105] 10002 Make-Up -- Every value present in df2.Id list
[101,107,105] 10003 NaN -- Not Every Value matches in df2.Id list.
I need to join two dataframe df1, df2 based on Id column but every element should be in df.Id list that's when we consider it a match.
Python Code:
import pandas as pd
df_output = pd.merge(df1, df2, on='Id', how='left')
CodePudding user response:
While this isn't a highly efficient solution, you can use some set
s to solve this problem.
matches = df1["Id"].apply(set) <= df2["Id"].apply(set)
out = df1.copy()
out.loc[matches, df2.columns.difference(["Id"])] = df2
print(out)
Id Value Product_Name
0 [101, 102, 103] 10001 Shoe
1 [101, 102, 104] 10000 jeans
2 [101, 102, 105] 10002 make-up
3 [101, 107, 105] 10003 NaN
In the above snippet:
matches = df1["Id"].apply(set) <= df2["Id"].apply(set)
returns a booleanSeries
that is True where the contents of each row in df1['Id'] is in the corresponding row in df2['Id'], and False otherwise- Instead of performing an actual
merge
we can simply align the 2 DataFrames on the aforementioned booleanSeries
CodePudding user response:
IIUC, you could concatenate on axis=1
; then check if df1.Id
is a subset of df2.Id
for each row and use where
to mask "Product_Name" depending on check results.
out = pd.concat([df1, df2], axis=1)
out['Product_Name'] = out['Product_Name'].where(out['Id'].apply(lambda x: set(x.iat[0]).issubset(x.iat[1]), axis=1))
out = out.loc[:, ~out.columns.duplicated()]
Output:
Id Value Product_Name
0 [101, 102, 103] 10001 Shoe
1 [101, 102, 104] 10000 jeans
2 [101, 102, 105] 10002 make-up
3 [101, 107, 105] 10003 NaN