Home > Enterprise >  Pandas: How do you get column labels from one df based on row values and assign these as row values
Pandas: How do you get column labels from one df based on row values and assign these as row values

Time:04-20

I have a dataframe df_1 with the following structure:

df_1:

    fruit_group_a   fruit_group_b
0   apple           banana
1   orange          pineapple

Given a string value such as apple or orange in df_2:

df_2:

    fruit
0   apple
1   pineapple
2   orange
3   banana

I want to use the information in dataframe df_1 to assign the corresponding fruit group name in a separate column to get the desired output df_3 as follows:

df_3:

    fruit           fruitgroup
0   apple           fruit_group_a
1   pineapple       fruit_group_b
2   orange          fruit_group_a
3   banana          fruit_group_b

How do I accomplish this? (I should mention that the dataframes in the real use case has 50 columns and several hundred rows)

Any pointers would be much appreciated!

CodePudding user response:

Use DataFrame.melt, then DataFrame.merge with left join - if no match get missing values:

df = df_2.merge(df_1.melt(var_name='fruitgroup', value_name='fruit'), how='left')
print (df)
       fruit     fruitgroup
0      apple  fruit_group_a
1  pineapple  fruit_group_b
2     orange  fruit_group_a
3     banana  fruit_group_b

print (df_2)
       fruit
0      apple
1  pineapple
2     orange
3       pear

df = df_2.merge(df_1.melt(var_name='fruitgroup', value_name='fruit'), how='left')
print (df)
       fruit     fruitgroup
0      apple  fruit_group_a
1  pineapple  fruit_group_b
2     orange  fruit_group_a
3       pear            NaN

Without left join:

df_3 = df_2.merge(df_1.melt(var_name='fruitgroup', value_name='fruit'))
print (df_3)
       fruit     fruitgroup
0      apple  fruit_group_a
1  pineapple  fruit_group_b
2     orange  fruit_group_a

CodePudding user response:

You can melt and merge:

df_3 = df_2.merge(df_1.melt(var_name='fruitgroup', value_name='fruit'))

output:

       fruit     fruitgroup
0      apple  fruit_group_a
1  pineapple  fruit_group_b
2     orange  fruit_group_a
3     banana  fruit_group_b

intermediate output of melt:

df_1.melt(var_name='fruit_group', value_name='fruit')

     fruit_group      fruit
0  fruit_group_a      apple
1  fruit_group_a     orange
2  fruit_group_b     banana
3  fruit_group_b  pineapple
  • Related