I have a Pandas dataframe like this :
# FIRST DF
ROW_A ROW_B
a 52
a 52
b 45
b 45
b 45
c 69
In a second dataframe :
# SECOND DF
ROW_A ROW_B ROW_C ROW_D
c 69 x y
a 52 toto tata
a 52 titi tutu
b 45 hey hi
b 45 hola yo
b 45 slt bjr
I want to put the second dataframe information on the 1st dataframe in the correct order, just like this :
# FIRST DF
ROW_A ROW_B ROW_C ROW_D
a 52 toto tata
a 52 titi tutu
b 45 hey hi
b 45 hola yo
b 45 slt bjr
c 69 x y
I tried this algorithm:
for i in range(len(first_df)):
row_a = first_df.iloc[i, 0]
for y in range(len(second_df)):
rw_a = second_df.iloc[y, 0]
row_c = second_df.iloc[y, 2]
row_d = second_df.iloc[y, 3]
if row_a == rw_a:
first_df[i, "ROW_C"] = row_c
first_df[i, "ROW_D"] = row_d
This algo always put the last row information :
# FIRST DF
ROW_A ROW_B ROW_C ROW_D
a 52 titi tutu
a 52 titi tutu
b 45 slt bjr
b 45 slt bjr
b 45 slt bjr
c 69 x y
The objective is not to add columns but to add information according to each line. I don't know how to deal with it.
CodePudding user response:
Use df.merge
with df.drop_duplicates
:
In [2113]: output = df1.merge(df2).drop_duplicates()
In [2114]: output
Out[2114]:
ROW_A ROW_B ROW_C ROW_D
0 a 52 toto tata
1 a 52 titi tutu
4 b 45 hey hi
5 b 45 hola yo
6 b 45 slt bjr
13 c 69 x y
CodePudding user response:
An alternative, less sexy that that of @Mayank but more efficient if many duplicates:
cols = ['ROW_A', 'ROW_B']
(df1
.assign(x=df1.groupby(cols).cumcount())
.merge(df2.assign(x=df2.groupby(cols).cumcount()),
on=cols ['x']
)
.drop('x', axis=1)
)
output:
ROW_A ROW_B ROW_C ROW_D
0 a 52 toto tata
1 a 52 titi tutu
2 b 45 hey hi
3 b 45 hola yo
4 b 45 slt bjr
5 c 69 x y