I have two dataframes with matching keys. I would like to merge them together based on their keys and have the corresponding columns line up side by side. I am not sure how to achieve this as the pd.merge
displays all columns for the first dataframe and then all columns for the second data frame:
df1 = pd.DataFrame(data={'key': ['a', 'b'], 'col1': [1, 2], 'col2': [3, 4]})
df2 = pd.DataFrame(data={'key': ['a', 'b'], 'col1': [5, 6], 'col2': [7, 8]})
print(pd.merge(df1, df2, on=['key']))
key col1_x col2_x col1_y col2_y
0 a 1 3 5 7
1 b 2 4 6 8
I am looking for a way to do the same merge and have the columns displays side by side as such:
key col1_x col1_y col2_x col2_y
0 a 1 5 3 7
1 b 2 6 4 8
Any help achieving this would be greatly appreciated!
CodePudding user response:
You can use the df1.join() method to achieve this. This method will merge the two dataframes based on their indices, which should match if you set the index to be the 'key' column in both dataframes. Here's an example of how you can do this:
# Set the index of both dataframes to be the 'key' column
df1.set_index('key', inplace=True)
df2.set_index('key', inplace=True)
# Use the join method to merge the dataframes based on their indices
result = df1.join(df2, lsuffix='_x', rsuffix='_y')
# Reset the index if you want the resulting dataframe to have a normal (integer-
based) index
result.reset_index(inplace=True)
print(result)
CodePudding user response:
One way to reorder the columns of the merged dataframe is to use the DataFrame.reindex
method. You can specify the order of the columns using a list of column names.
You can also introduce a suffixes
parameter to specify the suffixes to use for overlapping column names. By setting suffixes=('_x', '_y')
to specify that the suffix _x
should be used for columns from the first dataframe and _y
should be used for columns from the second dataframe.
Example:
df1 = pd.DataFrame(data={'key': ['a', 'b'], 'col1': [1, 2], 'col2': [3, 4]})
df2 = pd.DataFrame(data={'key': ['a', 'b'], 'col1': [5, 6], 'col2': [7, 8]})
result = pd.merge(df1, df2, on=['key'], suffixes=('_x', '_y'))
result = result.reindex(columns=['key', 'col1_x', 'col1_y', 'col2_x', 'col2_y'])
print(result)
Output:
key col1_x col1_y col2_x col2_y
0 a 1 5 3 7
1 b 2 6 4 8