Home > Enterprise >  Concatenate pandas DataFrames on columns, similar to outer merge
Concatenate pandas DataFrames on columns, similar to outer merge

Time:04-14

I have 3 dataframes with dates on the first column of each. I would like to concat these dataframes but concating related with the row value of each. If the values match, add on the same row, otherwise, I would expect to have a NaN.

import numpy as np
import pandas as pd

# Create the pandas DataFrame
df1 = pd.DataFrame(['2018-12-31','2019-09-30','2022-01-31'], columns = ['Date1'])
df2 = pd.DataFrame(['2019-09-30','2022-02-28'], columns = ['Date2'])
df3 = pd.DataFrame(['2019-09-30','2021-06-30','2021-11-30','2022-03-31'], columns = ['Date3'])

display(df1)
display(df2)
display(df3)

data = {'Date1': ['2018-12-31','2019-09-30',np.nan,np.nan,'2022-01-31',np.nan,np.nan],
        'Date2': [np.nan,'2019-09-30',np.nan,np.nan,np.nan,'2022-02-28',np.nan],
        'Date3': [np.nan,'2019-09-30','2021-06-30','2021-11-30',np.nan,np.nan,'2022-01-31']}

desired_df = pd.DataFrame(data)
desired_df

This is what I am trying to achieve.

Date1 Date2 Date3
0 2018-12-31 NaN NaN
1 2019-09-30 2019-09-30 2019-09-30
2 NaN NaN 2021-06-30
3 NaN NaN 2021-11-30
4 2022-01-31 NaN NaN
5 NaN 2022-02-28 NaN
6 NaN NaN 2022-01-31

My original idea was to used something like:

pd.concat([df1,df2,df3], axis=1, join="outer")

However, above will produce something like:

Date1 Date2 Date3
2018-12-31 2019-09-30 2019-09-30
2019-09-30 2022-02-28 2021-06-30
2022-01-31 NaN 2021-11-30
NaN NaN 2022-03-31

CodePudding user response:

We could set_index with the Dates (by setting the drop parameter to False, we don't lose the column), then concat horizontally:

out = (pd.concat([df.set_index(f'Date{i 1}', drop=False) 
                 for i, df in enumerate([df1, df2, df3])], axis=1)
       .sort_index().reset_index(drop=True))

Output:

        Date1       Date2       Date3
0  2018-12-31         NaN         NaN
1  2019-09-30  2019-09-30  2019-09-30
2         NaN         NaN  2021-06-30
3         NaN         NaN  2021-11-30
4  2022-01-31         NaN         NaN
5         NaN  2022-02-28         NaN
6         NaN         NaN  2022-03-31
  • Related