Home > Software engineering >  Pandas If duplicate on same row, lookup value from another dataframe
Pandas If duplicate on same row, lookup value from another dataframe

Time:03-11

I have this scenario in mind but I don't know how to go about it.

When a value in column changeme is equal to the one from the lookforme column on the same row in df1, I would like to lookup that value on the matchme column of df2, grab the corresponding grabme value and overwrite the one in changeme on df1.

In case of multiple occurrences of lookforme found on df2, match just the first one.

I tried a few methods but couldn't get anywhere, had complaints about the DataFrames being uneven when I tried factorize, etc.

          changeme       name   qty  lookforme
0          1296477  ItemName2  3.16    1296477
1            62328  ItemName1  7.94      62328
2           233201  ItemName3  5.56     233201
3      66074000105  ItemName4  6.88     233694

           matchme       grabme
0            62328  35422028768
1           162592  62673001137
2           163200  15422029895
3           164123  45492029154
4           166596  35422031737
5           232264  75372043938
6           232267  95462043655
7           233201  35422044775
8           233201  24613628530
9           233226  35412044812
10          233694  66074000105
11          235466  35422042863
12          272021  85422052947
13          272691  35422053340
14          272872  27536196938
15          273729  39456023149
16          284848  27536197195
17          284861  62672008054
18         1296477  35421853202
19         1296477  75841596436
20         2296518  97630194508

Expected result:

          changeme       name   qty  lookforme
0      35421853202  ItemName2  3.16    1296477
1      35422028768  ItemName1  7.94      62328
2      35422044775  ItemName3  5.56     233201
3      66074000105  ItemName4  6.88     233694

CodePudding user response:

Let's try the following:

Filter "lookforme" values that match "changeif=look4m" in df1 and find those values among "matchme-ascend" values in df2 using set_index loc. Since some of the "lookforme" values occur multiple times in df2, drop the duplicates and keep the first.

Then update "changeif=look4m" values in df1 using mask:

df1 = df1.set_index('changeif=look4m')
cond = df1['lookforme'] == df1.index
s = df2.set_index('matchme-ascend')['grabme'].loc[df1.index[cond]]
df1.index = df1.index.to_series().mask(cond, s[~s.index.duplicated()])
df1 = df1.reset_index()

Output:

   changeif=look4m       name   qty  lookforme
0      35421853202  ItemName2  3.16    1296477
1      35422028768  ItemName1  7.94      62328
2      35422044775  ItemName3  5.56     233201
3      66074000105  ItemName4  6.88     233694

CodePudding user response:

you can also try this:

res = pd.merge(df1, df2,
               left_on='lookforme', 
               right_on='matchme-ascend').drop_duplicates('lookforme').reset_index(drop=True)
# you have duplicates in df2 so we should-^^^^^^^^^^^^^^^

df1['changeif=look4m'] = res.apply(lambda x: x['grabme'] 
                                   if x['changeif=look4m']==x['lookforme'] 
                                   else x['changeif=look4m'], axis=1)
print(df1)
'''
   changeif=look4m       name   qty  lookforme
0      35421853202  ItemName2  3.16    1296477
1      35422028768  ItemName1  7.94      62328
2      35422044775  ItemName3  5.56     233201
3      66074000105  ItemName4  6.88     233694
  • Related