Home > Software engineering >  pd.merge multiple dataframes with same column name on one specific column
pd.merge multiple dataframes with same column name on one specific column

Time:07-09

I have multiple data frames with different columns names that I would like to merge, but they have the same column name for one specific column, which is called user

df1

     user       books     groceries
0    alex        2           10
1    andrew      10          15
2    kelvin      15          20
3    mary        5           25

df2

     user        saving    animal
0    mary           2      1
1    alex           3      0
2    andrew         4      0
3    kelvin         5      2

df3

     user        place    home
0    alex           3      1
1    mary           3      5
2    andrew         4      5
3    kelvin         5      2

If there are only 2 data frames, I get my desired output with

pd.merge(df1[['user', 'books']], df2[['user', 'animal']], on='user')

Desired Output


      user       books     animal
0    alex        2           0
1    andrew      10          0
2    kelvin      15          2
3    mary        5           1

If there are 3 or more data frames, I get errors when trying to use

pd.merge(df1[['user', 'books']], df2[['user', 'animal']], df3[['user', 'place']], on='user')

Error Message

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Any suggestion on how I can achieve the desired output with 3 or more dataframes?

CodePudding user response:

You can use functools:

import functools
dfs = [df1[['user', 'books']], df2[['user', 'animal']], df3[['user', 'place']]]

df_final = functools.reduce(lambda left, right: pd.merge(left,right,on='user'), dfs)
print(df_final)

Print out:

     user  books  animal  place
0    alex      2       0      3
1  andrew     10       0      4
2  kelvin     15       2      5
3    mary      5       1      3

This has the advantage that you could easily expand on many more data frames if you wanted too.

CodePudding user response:

you can use

df4 = pd.merge(pd.merge(df1[['user', 'books']], df2[['user', 'animal']],on='user'),df3[['user', 'place']],on='user')

CodePudding user response:

Try merging 2 at first then merge the result with the third one. As per my understanding from documentation they are expecting two dataframes to be merged at a time.

pandas.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)

Have you seen https://pandas.pydata.org/docs/reference/api/pandas.merge.html

  • Related