This thread here suggested to use reduce
to merge multiple data frames at once.
df1= pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': np.random.randn(4)})
df2= pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': np.random.randn(4)})
df3= pd.DataFrame({'key': ['A', 'C', 'E', 'F'], 'value': np.random.randn(4)})
df4= pd.DataFrame({'key': ['A', 'B', 'C', 'F'], 'value': np.random.randn(4)})
df_list = [df1, df2, df3, df4]
from functools import reduce
df_merged = reduce(lambda left,right: pd.merge(left,right,on=['key'], how='outer'), df_list )
The merge
is executed, but there is a warning
<stdin>:1: FutureWarning: Passing 'suffixes' which cause duplicate columns {'value_x'} in the result is deprecated and will raise a MergeError in a future version.
and df_meged
has columns with exactly duplicated names.
How do I force distinct names for each column?
CodePudding user response:
You can do
s = pd.concat([x.set_index('key') for x in df_list],axis = 1,keys=range(len(df_list)))
s.columns = s.columns.map('{0[1]}_{0[0]}'.format)
s = s.reset_index()
s
Out[236]:
key value_0 value_1 value_2 value_3
0 A -1.957968 NaN -0.852135 -0.976960
1 B 1.545932 -0.276838 NaN 0.197615
2 C -2.149727 NaN -0.364382 0.349993
3 D 0.524990 -0.476655 NaN NaN
4 E NaN -2.135870 0.798782 NaN
5 F NaN 1.456544 -0.255705 0.447279
CodePudding user response:
An outer join in this scenario is tantamount to appending on the zero axis. You therefore are better of using the two columns in the dfs as the join keys. code below
from functools import reduce
df_merged = reduce(lambda left,right: pd.merge(left,right,on=['key', 'value'], how='outer'), df_list )