I have two dataframe of similar coloumn and want ot merge them.
dataset for example,
df1 = pd.DataFrame({'A': [1, 2], 'B': ["abc", "abc"], 'C': [0, 1]})
df2 = pd.DataFrame({'A': [1, 6], 'B': ["bcd", "abc"], 'C': [5, 5]})
I used the code..
pd.concat([df1, df2], axis=1)
which returns
A B C A B C
0 1 abc 0 1 bcd 5
1 2 abc 1 6 abc 5
Expected format
A A B B C C
0 1 1 abc bcd 0 5
1 2 6 abc abc 1 5
Is there any way to this? Thanks
CodePudding user response:
you can sort the columns (with a stable sort so that df1's columns come first):
>>> pd.concat([df1, df2], axis=1).sort_index(axis="columns", kind="stable")
A A B B C C
0 1 1 abc bcd 0 5
1 2 6 abc abc 1 5
CodePudding user response:
You can sort_index
after the concat
:
pd.concat([df1, df2], axis=1).sort_index(axis=1, kind='stable')
Or, if you don't want to sort, concat vertically adding an extra level and unstack
:
pd.concat(dict(enumerate([df1, df2]))).unstack(-1).droplevel(axis=1, level=1)
Output:
A A B B C C
0 1 1 abc bcd 0 5
1 2 6 abc abc 1 5
CodePudding user response:
You can use the pd.concat()
function with the keys parameter to specify the column names for each dataframe and the axis=1
parameter to concatenate the dataframes side by side.
result = pd.concat([df1, df2], axis=1, keys=['df1', 'df2'])
result = result.reindex(sorted(result.columns), axis=1)
This will give the expected format with columns labeled as 'df1_A', 'df2_A', 'df1_B', 'df2_B', 'df1_C', 'df2_C'.
Alternatively, you can use merge function with left join
result = df1.merge(df2, how='left', left_on='A', right_on='A', suffixes=('_df1', '_df2'))
This will give you the expected format with columns labeled as 'A', 'B_df1', 'C_df1', 'B_df2', 'C_df2'