I have multiple data frames with different columns names that I would like to merge, but they have the same column name for one specific column, which is called user
df1
user books groceries
0 alex 2 10
1 andrew 10 15
2 kelvin 15 20
3 mary 5 25
df2
user saving animal
0 mary 2 1
1 alex 3 0
2 andrew 4 0
3 kelvin 5 2
df3
user place home
0 alex 3 1
1 mary 3 5
2 andrew 4 5
3 kelvin 5 2
If there are only 2 data frames, I get my desired output with
pd.merge(df1[['user', 'books']], df2[['user', 'animal']], on='user')
Desired Output
user books animal
0 alex 2 0
1 andrew 10 0
2 kelvin 15 2
3 mary 5 1
If there are 3 or more data frames, I get errors when trying to use
pd.merge(df1[['user', 'books']], df2[['user', 'animal']], df3[['user', 'place']], on='user')
Error Message
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Any suggestion on how I can achieve the desired output with 3 or more dataframes?
CodePudding user response:
You can use functools:
import functools
dfs = [df1[['user', 'books']], df2[['user', 'animal']], df3[['user', 'place']]]
df_final = functools.reduce(lambda left, right: pd.merge(left,right,on='user'), dfs)
print(df_final)
Print out:
user books animal place
0 alex 2 0 3
1 andrew 10 0 4
2 kelvin 15 2 5
3 mary 5 1 3
This has the advantage that you could easily expand on many more data frames if you wanted too.
CodePudding user response:
you can use
df4 = pd.merge(pd.merge(df1[['user', 'books']], df2[['user', 'animal']],on='user'),df3[['user', 'place']],on='user')
CodePudding user response:
Try merging 2 at first then merge the result with the third one. As per my understanding from documentation they are expecting two dataframes to be merged at a time.
pandas.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
Have you seen https://pandas.pydata.org/docs/reference/api/pandas.merge.html