I have a dataframe df_1
with the following structure:
df_1:
fruit_group_a fruit_group_b
0 apple banana
1 orange pineapple
Given a string value such as apple
or orange
in df_2
:
df_2:
fruit
0 apple
1 pineapple
2 orange
3 banana
I want to use the information in dataframe df_1
to assign the corresponding fruit group name in a separate column to get the desired output df_3
as follows:
df_3:
fruit fruitgroup
0 apple fruit_group_a
1 pineapple fruit_group_b
2 orange fruit_group_a
3 banana fruit_group_b
How do I accomplish this? (I should mention that the dataframes in the real use case has 50 columns and several hundred rows)
Any pointers would be much appreciated!
CodePudding user response:
Use DataFrame.melt
, then DataFrame.merge
with left join - if no match get missing values:
df = df_2.merge(df_1.melt(var_name='fruitgroup', value_name='fruit'), how='left')
print (df)
fruit fruitgroup
0 apple fruit_group_a
1 pineapple fruit_group_b
2 orange fruit_group_a
3 banana fruit_group_b
print (df_2)
fruit
0 apple
1 pineapple
2 orange
3 pear
df = df_2.merge(df_1.melt(var_name='fruitgroup', value_name='fruit'), how='left')
print (df)
fruit fruitgroup
0 apple fruit_group_a
1 pineapple fruit_group_b
2 orange fruit_group_a
3 pear NaN
Without left join:
df_3 = df_2.merge(df_1.melt(var_name='fruitgroup', value_name='fruit'))
print (df_3)
fruit fruitgroup
0 apple fruit_group_a
1 pineapple fruit_group_b
2 orange fruit_group_a
CodePudding user response:
df_3 = df_2.merge(df_1.melt(var_name='fruitgroup', value_name='fruit'))
output:
fruit fruitgroup
0 apple fruit_group_a
1 pineapple fruit_group_b
2 orange fruit_group_a
3 banana fruit_group_b
intermediate output of melt
:
df_1.melt(var_name='fruit_group', value_name='fruit')
fruit_group fruit
0 fruit_group_a apple
1 fruit_group_a orange
2 fruit_group_b banana
3 fruit_group_b pineapple