I have two data frames as shown below:
A | B | C | D |
---|---|---|---|
Red | 36 | 1 | type-1 |
Blue | 78 | 2 | type-1 |
Green | 59 | 3 | type-1 |
A | B | C | D |
---|---|---|---|
Orange | 78 | 5 | type-2 |
Purple | 59 | 7 | type-2 |
Brown | 36 | 9 | type-2 |
I want to merge the above two data frames on the basis of column B and after merge I want to keep the same columns as shown below:
A | B | C | D | A | B | C | D |
---|---|---|---|---|---|---|---|
Red | 36 | 1 | type-1 | Brown | 36 | 9 | type-2 |
Blue | 78 | 2 | type-1 | Orange | 78 | 5 | type-2 |
Green | 59 | 3 | type-1 | Purple | 59 | 7 | type-2 |
Is it possible to do this using pandas or any other python function?
I have tried using pd.merge function but I needed to change the column names. There exists another function called pd.concat but can I provide the column name (column 'B') in it for merging?
Thanks a lot in advance!
CodePudding user response:
apply rename
to jezrael's anwer and you will get desired output
out = (df1.merge(df2, left_on=df1.B, right_on=df2.B).drop('key_0', axis=1)
.rename(columns=lambda x: x.split('_')[0]))
out
A B C D A B C D
0 Red 36 1 type-1 Brown 36 9 type-2
1 Blue 78 2 type-1 Orange 78 5 type-2
2 Green 59 3 type-1 Purple 59 7 type-2
CodePudding user response:
You can pass to parameters left_on
and right_on
columns from both DataFrames, so is created helper column key_0
, which is removed after join by DataFrame.merge
:
Notice: Pandas has problem with duplicated columns names, it is reason why merge
rename them by suffix _x
and _y
df = df1.merge(df2, left_on=df1.B, right_on=df2.B).drop('key_0', axis=1)
print (df)
A_x B_x C_x D_x A_y B_y C_y D_y
0 Red 36 1 type-1 Brown 36 9 type-2
1 Blue 78 2 type-1 Orange 78 5 type-2
2 Green 59 3 type-1 Purple 59 7 type-2
What is problem with same columns names:
If need select column first A
expected ouput is get Series.
print (df.A_x)
0 Red
1 Blue
2 Green
Name: A_x, dtype: object
But if duplicated names get all columns in DataFrame, DONT DO IT:
df = df.rename(columns=lambda x: x.split('_')[0])
# print (df)
print (df.A)
A A
0 Red Brown
1 Blue Orange
2 Green Purple