df1 contains products characteristics in different columns. In the second dataframe, the characteristics of the products are located in one column. df2 are joined (left) to df1 using three keys as many times as there are columns of characteristics in the df1. 1.How to add a column to the second dataframe indicating whether the row is involved in the join? 2.How to make join more elegant))
d1 = {"Product": ["product1", "product1", "product2", "product1", "product2", "product1", "product3"], "Store": ["Store1", "Store2", "Store1", "Store1", "Store1", "Store2", "Store2"], "Attribute1": pd.Series(["red", "green", "green", "blue"], index=[0, 2, 5, 6]), "Attribute2": pd.Series(["red", "green", "red"], index=[1, 3, 4])}
df1 = pd.DataFrame(data=d1)
d2 = {"Product": ["product1", "product1", "product2", "product1", "product2", "product1", "product3", "product3", "product2"], "Store": ["Store1", "Store2", "Store1", "Store1", "Store1", "Store2", "Store2", "Store1", "Store2"], "Attribute": ["red", "red", "green", "green", "red", "green", "blue", "blue", "red"], "Package": ["type1", "type2", "type3", "type4", "type5", "type6", "type2", "type4", "type2"]}
df2 = pd.DataFrame(data=d2)
df3 = pd.merge(df1, df2, how = 'left', left_on = ['Product', 'Store', 'Attribute1'], right_on = ['Product', 'Store', 'Attribute'])
df3 = df3.drop(['Attribute'], axis=1)
df3 = pd.merge(df3, df2, how = 'left', left_on = ['Product', 'Store', 'Attribute2'], right_on = ['Product', 'Store', 'Attribute'])
df3 = df3.drop(['Attribute'], axis=1)
df3["Package"] = df3["Package_x"].map(str, na_action=None) df3["Package_y"].map(str, na_action=None)
df3 = df3.drop(['Package_x'], axis=1)
df3 = df3.drop(['Package_y'], axis=1)
df3["Package"] = df3["Package"].str.replace("nan","")
CodePudding user response:
IIUC, use indicator
parameter:
df3 = pd.merge(df1, df2, how = 'left', left_on = ['Product', 'Store', 'Attribute1'], right_on = ['Product', 'Store', 'Attribute'], indicator=True)
df3 = df3.drop(['Attribute'], axis=1)
df3
Output:
Product Store Attribute1 Attribute2 _merge Package
0 product1 Store1 red NaN both type1
1 product1 Store2 NaN red left_only type2
2 product2 Store1 green NaN both type3
3 product1 Store1 NaN green left_only type4
4 product2 Store1 NaN red left_only type5
5 product1 Store2 green NaN both type6
6 product3 Store2 blue NaN both type2
CodePudding user response:
If Attribute1
and Attribute2
columns are mutually exclusive, you can create a new column Attribute
in df1 before merge:
out = (df1.assign(Attribute=df1['Attribute1'].fillna(df1['Attribute2']))
.merge(df2, on=['Product', 'Store', 'Attribute'], how='left')
.drop(columns='Attribute'))
print(out)
# Output
Product Store Attribute1 Attribute2 Package
0 product1 Store1 red NaN type1
1 product1 Store2 NaN red type2
2 product2 Store1 green NaN type3
3 product1 Store1 NaN green type4
4 product2 Store1 NaN red type5
5 product1 Store2 green NaN type6
6 product3 Store2 blue NaN type2
Update: if you have multiple AttributeX
columns with a same prefix like "Attribute", you can replace:
df1['Attribute1'].fillna(df1['Attribute2'])
by:
df1.filter(like='Attribute').ffill(axis=1).iloc[:, -1]