Home > Enterprise >  How to merge dataframes together with matching columns side by side?
How to merge dataframes together with matching columns side by side?

Time:12-22

I have two dataframes with matching keys. I would like to merge them together based on their keys and have the corresponding columns line up side by side. I am not sure how to achieve this as the pd.merge displays all columns for the first dataframe and then all columns for the second data frame:

df1 = pd.DataFrame(data={'key': ['a', 'b'], 'col1': [1, 2], 'col2': [3, 4]})
df2 = pd.DataFrame(data={'key': ['a', 'b'], 'col1': [5, 6], 'col2': [7, 8]})

print(pd.merge(df1, df2, on=['key']))

  key  col1_x  col2_x  col1_y  col2_y
0   a       1       3       5       7
1   b       2       4       6       8

I am looking for a way to do the same merge and have the columns displays side by side as such:

 key  col1_x  col1_y  col2_x  col2_y
0   a       1       5       3       7
1   b       2       6       4       8

Any help achieving this would be greatly appreciated!

CodePudding user response:

You can use the df1.join() method to achieve this. This method will merge the two dataframes based on their indices, which should match if you set the index to be the 'key' column in both dataframes. Here's an example of how you can do this:

# Set the index of both dataframes to be the 'key' column
df1.set_index('key', inplace=True)
df2.set_index('key', inplace=True)

# Use the join method to merge the dataframes based on their indices
result = df1.join(df2, lsuffix='_x', rsuffix='_y')

# Reset the index if you want the resulting dataframe to have a normal (integer- 
based) index
result.reset_index(inplace=True)

print(result)

CodePudding user response:

One way to reorder the columns of the merged dataframe is to use the DataFrame.reindex method. You can specify the order of the columns using a list of column names.

You can also introduce a suffixes parameter to specify the suffixes to use for overlapping column names. By setting suffixes=('_x', '_y') to specify that the suffix _x should be used for columns from the first dataframe and _y should be used for columns from the second dataframe.

Example:

df1 = pd.DataFrame(data={'key': ['a', 'b'], 'col1': [1, 2], 'col2': [3, 4]})
df2 = pd.DataFrame(data={'key': ['a', 'b'], 'col1': [5, 6], 'col2': [7, 8]})

result = pd.merge(df1, df2, on=['key'], suffixes=('_x', '_y'))
result = result.reindex(columns=['key', 'col1_x', 'col1_y', 'col2_x', 'col2_y'])
print(result)

Output:

  key  col1_x  col1_y  col2_x  col2_y
0   a       1       5       3       7
1   b       2       6       4       8
  • Related