Home > Blockchain >  Pandas 2 data frames
Pandas 2 data frames

Time:03-28

I have 2 data frames as below:

Data Frame 1:

Col_1   Col_2   Col_3   Col_4   Col_5   Col_6   Col_7
SAN      1        X       2      1       a      100
BAN      1       YN       5     6        b      NULL
QAN      1       JH        5    6        c      NULL
LAN      1        QK       5    6        d      200
MKL      1        LM       5    6        e      300

Data Frame 2:

Col_1   Col_2
X   300.1
Y   400.33
Z   50.66
XM  12.88
YN  111.09
ZK  300.88

As you can see in the Data Frame 1 I have some NULL values.

Now if any element of the column 7 of data frame 1 is NULL AND if an eleemnt in the the column 3 of data frame 1 is the same as any element in column 1 data frame 2 then I want to replace the element of column 7 of data frame 1 (the NULL element) with the element of column 2 of data frame 2. That is, I want to edit the data frame 1 as below (in the above case):

Data Frame 1:

Col_1   Col_2   Col_3   Col_4   Col_5   Col_6   Col_7
SAN      1        X       2      1       a      100
BAN      1       YN       5     6        b      **111.09**
QAN      1       JH        5    6        c      NULL
LAN      1        QK       5    6        d      200
MKL      1        LM       5    6        e      300

What is the ebst way to do this through pandas?

CodePudding user response:

You can merge to craft a Series, then use it to combine_first on Col_7:

s = df1[['Col_3']].merge(df2[['Col_1', 'Col_2']], left_on='Col_3', right_on='Col_1')['Col_2']

df1['Col_7'] = df1['Col_7'].combine_first(s)

output:

  Col_1  Col_2 Col_3  Col_4  Col_5 Col_6   Col_7
0   SAN      1     X      2      1     a  100.00
1   BAN      1    YN      5      6     b  111.09
2   QAN      1    JH      5      6     c     NaN
3   LAN      1    QK      5      6     d  200.00
4   MKL      1    LM      5      6     e  300.00

CodePudding user response:

You can use df2 as dict mapping:

df1['Col_7'] = df1['Col_7'].fillna(df1['Col_3'].map(df2.set_index('Col_1')['Col_2']))
print(df1)

# Output
  Col_1  Col_2 Col_3  Col_4  Col_5 Col_6   Col_7
0   SAN      1     X      2      1     a  100.00
1   BAN      1    YN      5      6     b  111.09
2   QAN      1    JH      5      6     c     NaN
3   LAN      1    QK      5      6     d  200.00
4   MKL      1    LM      5      6     e  300.00
  • Related