import pandas as pd
df1 = pd.DataFrame({'a':['id1','id2','id3'],'b':['W','W','W'],'c1':[1,2,3]})
df2 = pd.DataFrame({'a':['id1','id2','id3'],'b':['W','W','W'],'c2':[4,5,6]})
df3 = pd.DataFrame({'a':['id1','id4','id5'],'b':['Q','Q','Q'],'c1':[7,8,9]})
I'm trying to concatenate df1, df2 and df3 into one dataframe:
a b c1 c2
id1 W 1 4
id2 W 2 5
id3 W 3 6
id1 Q 7 NA
id4 Q 8 NA
id5 Q 9 NA
I tried:
l = [d.set_index(['a','b']) for d in [df1,df2,df3]]
pd.concat(l, axis=1)
but the output isn't what I expected:
c1 c2 c1
a b
id1 W 1.0 4.0 NaN
id2 W 2.0 5.0 NaN
id3 W 3.0 6.0 NaN
id1 Q NaN NaN 7.0
id4 Q NaN NaN 8.0
id5 Q NaN NaN 9.0
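The duplicated c1 column is expected behaviour: pd.concat(..., axis=1) aligns rows on the (a, b) index but never merges columns that share a name. A minimal sketch reproducing that on the question's data, followed by one possible way (a swapped-in technique, not something from this thread) to collapse same-named columns by keeping the first non-missing value per name:

```python
import pandas as pd

df1 = pd.DataFrame({'a': ['id1', 'id2', 'id3'], 'b': ['W', 'W', 'W'], 'c1': [1, 2, 3]})
df2 = pd.DataFrame({'a': ['id1', 'id2', 'id3'], 'b': ['W', 'W', 'W'], 'c2': [4, 5, 6]})
df3 = pd.DataFrame({'a': ['id1', 'id4', 'id5'], 'b': ['Q', 'Q', 'Q'], 'c1': [7, 8, 9]})

l = [d.set_index(['a', 'b']) for d in [df1, df2, df3]]
wide = pd.concat(l, axis=1)

# concat aligned the rows but kept both 'c1' columns side by side
print(list(wide.columns))

# Transpose so column names become row labels, group the duplicate names,
# keep the first non-NaN value in each group, then transpose back
collapsed = wide.T.groupby(level=0).first().T
print(collapsed)
```

Transposing is fine for small frames like these, but it can upcast mixed column dtypes (for example to object), so treat it as a quick fix rather than a general solution.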
Answer:
You can concatenate the MultiIndex Series created by DataFrame.stack and then unstack them:
l = [d.set_index(['a','b']).stack() for d in [df1,df2,df3]]
df = pd.concat(l).unstack().sort_index(level=[1,0], ascending=[False, True])
print (df)
c1 c2
a b
id1 W 1.0 4.0
id2 W 2.0 5.0
id3 W 3.0 6.0
id1 Q 7.0 NaN
id4 Q 8.0 NaN
id5 Q 9.0 NaN
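To see why the stack approach lines c1 up correctly, here is a sketch that stops at the intermediate long Series: every value carries its original column name as a third index level, so unstack() can pivot the c1 values from both df1 and df3 into a single column.

```python
import pandas as pd

df1 = pd.DataFrame({'a': ['id1', 'id2', 'id3'], 'b': ['W', 'W', 'W'], 'c1': [1, 2, 3]})
df2 = pd.DataFrame({'a': ['id1', 'id2', 'id3'], 'b': ['W', 'W', 'W'], 'c2': [4, 5, 6]})
df3 = pd.DataFrame({'a': ['id1', 'id4', 'id5'], 'b': ['Q', 'Q', 'Q'], 'c1': [7, 8, 9]})

# Each one-column frame becomes a Series indexed by (a, b, column name)
l = [d.set_index(['a', 'b']).stack() for d in [df1, df2, df3]]
long = pd.concat(l)
print(long)

# unstack() pivots the innermost level (the column names) back into columns,
# so the c1 values from df1 and df3 share one column
wide = long.unstack()
print(wide)
```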
If each DataFrame has only 3 columns, so that a single value column remains after set_index, use DataFrame.squeeze or select the first column with iloc[:, 0] to get a list of Series:
l = [d.set_index(['a','b']).squeeze() for d in [df1,df2,df3]]
keys = [x.name for x in l]
df = (pd.concat(l, axis=0, keys=keys)
.unstack(0)
.sort_index(level=[1,0], ascending=[False, True]))
print (df)
c1 c2
a b
id1 W 1.0 4.0
id2 W 2.0 5.0
id3 W 3.0 6.0
id1 Q 7.0 NaN
id4 Q 8.0 NaN
id5 Q 9.0 NaN
The same with iloc[:, 0]:
l = [d.set_index(['a','b']).iloc[:, 0] for d in [df1,df2,df3]]
keys = [x.name for x in l]
df = (pd.concat(l, axis=0, keys=keys)
.unstack(0)
.sort_index(level=[1,0], ascending=[False, True]))
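A small sketch of the difference between the two selection options, assuming the question's df1: both give the same Series when exactly one value column remains, but squeeze() behaves differently on other shapes, which is why iloc[:, 0] can be the more predictable choice.

```python
import pandas as pd

df1 = pd.DataFrame({'a': ['id1', 'id2', 'id3'], 'b': ['W', 'W', 'W'], 'c1': [1, 2, 3]})

indexed = df1.set_index(['a', 'b'])  # exactly one value column (c1) remains

s_squeeze = indexed.squeeze()   # one-column DataFrame -> Series named 'c1'
s_iloc = indexed.iloc[:, 0]     # first column as a Series, whatever the width
print(s_squeeze.equals(s_iloc), s_squeeze.name)

# With more than one column left, squeeze() returns the DataFrame unchanged
two_cols = df1.set_index('a')   # columns b and c1
print(type(two_cols.squeeze()))

# With a single row and column, squeeze() goes all the way to a scalar,
# while iloc[:, 0] still returns a one-element Series
one_cell = indexed.head(1)
print(type(one_cell.squeeze()), type(one_cell.iloc[:, 0]))
```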
Another idea is to chain the DataFrames in the list with DataFrame.combine_first:
from functools import reduce
dfs = [d.set_index(['a','b']) for d in [df1,df2,df3]]
df = (reduce(lambda x, y: x.combine_first(y), dfs)
.sort_index(level=[1,0], ascending=[False, True]))
print (df)
c1 c2
a b
id1 W 1.0 4.0
id2 W 2.0 5.0
id3 W 3.0 6.0
id1 Q 7.0 NaN
id4 Q 8.0 NaN
id5 Q 9.0 NaN
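combine_first keeps the caller's non-missing values and fills the rest from the other frame, over the union of both indexes and columns; reduce simply applies it left to right. A sketch with two small hypothetical frames (left and right are illustration-only names) showing the priority rule, plus a check that the reduce call equals nested calls on the question's data:

```python
from functools import reduce

import pandas as pd

# Hypothetical frames that overlap on row 'x' and column 'c1'
left = pd.DataFrame({'c1': [1.0, None]}, index=['x', 'y'])
right = pd.DataFrame({'c1': [99.0, 3.0], 'c2': [4.0, 5.0]}, index=['x', 'z'])

# left's non-missing values win; gaps and new rows/columns come from right
combined = left.combine_first(right)
print(combined)

# reduce(f, [a, b, c]) is f(f(a, b), c), i.e. nested combine_first calls
df1 = pd.DataFrame({'a': ['id1', 'id2', 'id3'], 'b': ['W', 'W', 'W'], 'c1': [1, 2, 3]})
df2 = pd.DataFrame({'a': ['id1', 'id2', 'id3'], 'b': ['W', 'W', 'W'], 'c2': [4, 5, 6]})
df3 = pd.DataFrame({'a': ['id1', 'id4', 'id5'], 'b': ['Q', 'Q', 'Q'], 'c1': [7, 8, 9]})
dfs = [d.set_index(['a', 'b']) for d in [df1, df2, df3]]
folded = reduce(lambda x, y: x.combine_first(y), dfs)
nested = dfs[0].combine_first(dfs[1]).combine_first(dfs[2])
print(folded.equals(nested))
```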
Answer:
First merge df1 and df2 on columns a and b:
df_try_1 = df1.merge(df2, on=["a","b"])
Then concatenate the result with df3:
df_try_2 = pd.concat([df_try_1, df3], axis=0)
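A sketch of this answer's two steps run on the question's data. It uses how='outer' (an assumption; the answer relies on merge's default how='inner', which gives the same rows here because df1 and df2 share the same (a, b) pairs) and ignore_index=True (also an addition) so the stacked rows get a fresh 0 to 5 index instead of 0, 1, 2, 0, 1, 2:

```python
import pandas as pd

df1 = pd.DataFrame({'a': ['id1', 'id2', 'id3'], 'b': ['W', 'W', 'W'], 'c1': [1, 2, 3]})
df2 = pd.DataFrame({'a': ['id1', 'id2', 'id3'], 'b': ['W', 'W', 'W'], 'c2': [4, 5, 6]})
df3 = pd.DataFrame({'a': ['id1', 'id4', 'id5'], 'b': ['Q', 'Q', 'Q'], 'c1': [7, 8, 9]})

# Step 1: put c1 and c2 side by side on the shared keys
df_try_1 = df1.merge(df2, on=['a', 'b'], how='outer')

# Step 2: append df3's rows underneath; df3 has no c2, so those cells become NaN
df_try_2 = pd.concat([df_try_1, df3], axis=0, ignore_index=True)
print(df_try_2)
```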
Answer:
Try:
a = df1.merge(df2[['a','c2']], on='a', how='left')
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
l = pd.concat([a, df3])
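Merging on a alone works for this data because every id in df1 and df2 has b == 'W'. A sketch comparing it with a merge on both keys, which is the safer choice if the same id can appear with different b values; pd.concat stands in for the removed DataFrame.append:

```python
import pandas as pd

df1 = pd.DataFrame({'a': ['id1', 'id2', 'id3'], 'b': ['W', 'W', 'W'], 'c1': [1, 2, 3]})
df2 = pd.DataFrame({'a': ['id1', 'id2', 'id3'], 'b': ['W', 'W', 'W'], 'c2': [4, 5, 6]})
df3 = pd.DataFrame({'a': ['id1', 'id4', 'id5'], 'b': ['Q', 'Q', 'Q'], 'c1': [7, 8, 9]})

# The answer's merge on 'a' only (df2's b column is dropped first)
by_a = df1.merge(df2[['a', 'c2']], on='a', how='left')

# Merge on both keys instead; identical here because all rows have b == 'W'
by_ab = df1.merge(df2, on=['a', 'b'], how='left')
print(by_a.equals(by_ab))

# pd.concat replaces DataFrame.append, which pandas 2.0 removed
result = pd.concat([by_ab, df3], ignore_index=True)
print(result)
```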
Answer:
An outer merge of all three frames on a and b combines them, although the c1 values from df3 end up in a separate column (renamed c3 below):
df = pd.merge(pd.merge(df1, df2, how='outer', on=['a', 'b']), df3, how='outer', on=['a', 'b'])
df.set_index(['a', 'b'], inplace=True)
df.columns = ['c1', 'c2', 'c3']
print(df)
Result:
c1 c2 c3
a b
id1 W 1.0 4.0 NaN
id2 W 2.0 5.0 NaN
id3 W 3.0 6.0 NaN
id1 Q NaN NaN 7.0
id4 Q NaN NaN 8.0
id5 Q NaN NaN 9.0
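The c3 column appears because df1 and df3 both contribute a c1 column to the second merge, so pandas adds its default _x/_y suffixes. A sketch showing those suffixed names and one possible way (an addition, not part of this answer) to fold them into a single c1 with Series.combine_first:

```python
import pandas as pd

df1 = pd.DataFrame({'a': ['id1', 'id2', 'id3'], 'b': ['W', 'W', 'W'], 'c1': [1, 2, 3]})
df2 = pd.DataFrame({'a': ['id1', 'id2', 'id3'], 'b': ['W', 'W', 'W'], 'c2': [4, 5, 6]})
df3 = pd.DataFrame({'a': ['id1', 'id4', 'id5'], 'b': ['Q', 'Q', 'Q'], 'c1': [7, 8, 9]})

merged = pd.merge(pd.merge(df1, df2, how='outer', on=['a', 'b']),
                  df3, how='outer', on=['a', 'b'])

# Both sides of the second merge have 'c1', so they become c1_x and c1_y
print(list(merged.columns))

# Fill the gaps in c1_x (rows from df3) with the values from c1_y
merged['c1'] = merged['c1_x'].combine_first(merged['c1_y'])
out = merged.set_index(['a', 'b'])[['c1', 'c2']]
print(out)
```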
Answer:
I think merge is your best option here, after first filling df1 with df2's columns via combine_first:
df = df1.combine_first(df2)
pd.merge(df, df3, on=['a', 'b', 'c1'], how='outer')
This produces the expected output:
a b c1 c2
0 id1 W 1 4.0
1 id2 W 2 5.0
2 id3 W 3 6.0
3 id1 Q 7 NaN
4 id4 Q 8 NaN
5 id5 Q 9 NaN
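One caveat with this approach: df1.combine_first(df2) aligns on the default row index (0, 1, 2), not on a and b, so c2 lands on the right rows only because both frames list the ids in the same order. A sketch demonstrating this with a hypothetical reversed copy of df2 (df2_reversed is an illustration-only name), and a key-aligned variant that avoids the dependency:

```python
import pandas as pd

df1 = pd.DataFrame({'a': ['id1', 'id2', 'id3'], 'b': ['W', 'W', 'W'], 'c1': [1, 2, 3]})
df2 = pd.DataFrame({'a': ['id1', 'id2', 'id3'], 'b': ['W', 'W', 'W'], 'c2': [4, 5, 6]})
df3 = pd.DataFrame({'a': ['id1', 'id4', 'id5'], 'b': ['Q', 'Q', 'Q'], 'c1': [7, 8, 9]})

# The answer's approach on the question's data
df = df1.combine_first(df2)
out = pd.merge(df, df3, on=['a', 'b', 'c1'], how='outer')
print(out)

# Hypothetical: the same rows as df2, listed in reverse order
df2_reversed = df2.iloc[::-1].reset_index(drop=True)

# Positional alignment: row 0 (id1) now receives id3's c2 value
by_position = df1.combine_first(df2_reversed)
print(by_position)

# Aligning on (a, b) first makes the result independent of row order
by_key = (df1.set_index(['a', 'b'])
             .combine_first(df2_reversed.set_index(['a', 'b']))
             .reset_index())
print(by_key)
```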