Home > Net >  Add to a Pandas DafaFrame, information from another DataFrame & in the right order
Add to a Pandas DafaFrame, information from another DataFrame & in the right order

Time:03-08

I have a Pandas dataframe like this :

# FIRST DF
ROW_A    ROW_B
a        52
a        52
b        45
b        45
b        45
c        69

In a second dataframe :

# SECOND DF
ROW_A    ROW_B    ROW_C    ROW_D
c        69       x        y
a        52       toto     tata
a        52       titi     tutu
b        45       hey      hi
b        45       hola     yo
b        45       slt      bjr

I want to put the second dataframe information on the 1st dataframe in the correct order, just like this :

# FIRST DF
ROW_A    ROW_B    ROW_C    ROW_D
a        52       toto     tata
a        52       titi     tutu
b        45       hey      hi
b        45       hola     yo
b        45       slt      bjr
c        69       x        y

I tried this algorithm:

for i in range(len(first_df)):
    row_a = first_df.iloc[i, 0]
    for y in range(len(second_df)):
        rw_a = second_df.iloc[y, 0]
        row_c = second_df.iloc[y, 2]
        row_d = second_df.iloc[y, 3]
        if row_a == rw_a:
            first_df[i, "ROW_C"] = row_c
            first_df[i, "ROW_D"] = row_d

This algo always put the last row information :

# FIRST DF
ROW_A    ROW_B    ROW_C    ROW_D
a        52       titi     tutu
a        52       titi     tutu
b        45       slt      bjr
b        45       slt      bjr
b        45       slt      bjr
c        69       x        y

The objective is not to add columns but to add information according to each line. I don't know how to deal with it.

CodePudding user response:

Use df.merge with df.drop_duplicates:

In [2113]: output = df1.merge(df2).drop_duplicates()

In [2114]: output
Out[2114]: 
   ROW_A  ROW_B ROW_C ROW_D
0      a     52  toto  tata
1      a     52  titi  tutu
4      b     45   hey    hi
5      b     45  hola    yo
6      b     45   slt   bjr
13     c     69     x     y

CodePudding user response:

An alternative, less sexy that that of @Mayank but more efficient if many duplicates:

cols = ['ROW_A', 'ROW_B']
(df1
 .assign(x=df1.groupby(cols).cumcount())
 .merge(df2.assign(x=df2.groupby(cols).cumcount()),
        on=cols ['x']
       )
 .drop('x', axis=1)
)

output:

  ROW_A  ROW_B ROW_C ROW_D
0     a     52  toto  tata
1     a     52  titi  tutu
2     b     45   hey    hi
3     b     45  hola    yo
4     b     45   slt   bjr
5     c     69     x     y
  • Related