There are multiple Pandas dataframes with one column each and having different column names.
df1 = pd.DataFrame({'ID1':['a1','a2']}) df1: ID1 0 a1 1 a2 df2 = pd.DataFrame({'ID2':['a1','b1']}) df2: ID2 0 a1 1 b1 df3 = pd.DataFrame({'ID3':['a2','b1','b2']}) df3: ID3 0 a2 1 b1 2 b2
I want to combine these dataframes into one dataframe as below.
ID1 ID2 ID3 0 a1 a1 NaN 1 a2 NaN a2 2 NaN b1 b1 3 NaN NaN b2
pd.merge() can be used if there is only two dataframes. But I want to do this for many dataframes. And also I want a separate column for each dataframe with it's column name. Is there a way of doing this? Thank you!
CodePudding user response:
You can try set the ID
column as index and concat them on columns
df = pd.concat([df.set_index(f'ID{i 1}').assign(**{f'ID{i 1}': 1}) for i, df in enumerate([df1, df2, df3])], axis=1)
df = df.apply(lambda col: col.mask(col.eq(1), df.index)).reset_index(drop=True)
print(df)
ID1 ID2 ID3
0 a1 a1 NaN
1 a2 NaN a2
2 NaN b1 b1
3 NaN NaN b2