Home > Software design >  Python: Join two DataFrames with same column prefix on same indices
Python: Join two DataFrames with same column prefix on same indices

Time:12-22

I have two dataframes that look like this:

df1 = pd.DataFrame(
{
    "A_price": [10, 12],
    "B_price": [20, 21],
},
index = ['01-01-2020', '01-02-2021']
)
df1:
            A_price B_price
01-01-2020  10      20
01-02-2021  12      21

df2 = pd.DataFrame(
{
    "A_weight": [0.1, 0.12],
    "B_weight": [0.2, 0.21],
},
index = ['01-01-2020', '01-02-2021']
)
df2:
            A_weight B_weight
01-01-2020  0.1      0.2
01-02-2021  0.12     0.21

How can I join the two dataframes on the same indices and then have the columns in a hierarchy? i.e. I want the following:

df:
            A              B
            price weight   price weight      
01-01-2020  10    0.1      20    0.2
01-02-2021  12    0.12     21    0.21

CodePudding user response:

Use join (or merge) and explode your column names.

# out = pd.merge(df1, df2, left_index=True, right_index=True)
out = out.join(df2)
out.columns = out.columns.str.split('_', expand=True)
out = out.sort_index(axis=1)
print(out)

# Output:
               A            B       
           price weight price weight
01-01-2020    10   0.10    20   0.20
01-02-2021    12   0.12    21   0.21

CodePudding user response:

Simply concat horizontally with pd.concat with axis=1, and split the columns by _ with .columns.str.split (which, with expand=True, returns a MultiIndex):

new_df = pd.concat([df1, df2], axis=1)
new_df.columns = new_df.columns.str.split('_', expand=True)

Output:

>>> new_df
               A     B      A      B
           price price weight weight
01-01-2020    10    20   0.10   0.20
01-02-2021    12    21   0.12   0.21

CodePudding user response:

This should work.

pd.concat((df1.T,df2.T), keys=["A", "B"]).T

CodePudding user response:

You can use pd.concat using the keys parameter with sort_index() to get them in the right structure. Then rename the columns to remove the prefix in the inner level of the multiindex:

df = pd.concat([df1, df2], keys=['A','B'],axis=1).sort_index(level=1, axis=1)
df.rename(columns=lambda x: x.split('_')[1], level=1)

               A      B     A      B
           price weight price weight
01-01-2020    10   0.10    20   0.20
01-02-2021    12   0.12    21   0.21
  • Related