Home > front end >  Combine multiple dataframes which have different column names into a new dataframe while adding new
Combine multiple dataframes which have different column names into a new dataframe while adding new

Time:05-13

There are multiple Pandas dataframes with one column each and having different column names.

df1 = pd.DataFrame({'ID1':['a1','a2']})
df1:
    ID1
0   a1
1   a2    

df2 = pd.DataFrame({'ID2':['a1','b1']})
df2:
    ID2
0   a1
1   b1

df3 = pd.DataFrame({'ID3':['a2','b1','b2']})
df3:
    ID3
0   a2
1   b1
2   b2

I want to combine these dataframes into one dataframe as below.

    ID1   ID2   ID3
0   a1    a1    NaN
1   a2    NaN   a2
2   NaN   b1    b1
3   NaN   NaN   b2

pd.merge() can be used if there is only two dataframes. But I want to do this for many dataframes. And also I want a separate column for each dataframe with it's column name. Is there a way of doing this? Thank you!

CodePudding user response:

You can try set the ID column as index and concat them on columns

df = pd.concat([df.set_index(f'ID{i 1}').assign(**{f'ID{i 1}': 1}) for i, df in enumerate([df1, df2, df3])], axis=1)
df = df.apply(lambda col: col.mask(col.eq(1), df.index)).reset_index(drop=True)
print(df)

   ID1  ID2  ID3
0   a1   a1  NaN
1   a2  NaN   a2
2  NaN   b1   b1
3  NaN  NaN   b2
  • Related