How to look up between two dataframes and update the value in another df

Time:11-03

I have a dataframe df1 like below -

| email_id | date |
| [email protected] | ['2022-04-09'] |
| [email protected] | [nan] |
| [email protected] | ['2022-09-21', '2022-03-09'] |
| [email protected] | [nan, '2022-03-29'] |
| [email protected] | [nan] |
| [email protected] | [nan, '2022-09-01'] |

Another df df2 -

|email_id| status |
|[email protected] | 0 |
|[email protected] | 0 |
|[email protected] | 0 |
|[email protected] | 3 |
|[email protected] | 2 |
|[email protected] | 1 |

How can I look up each email_id from df2 in df1 and update the status in df2? If only date values are present in df1's date column for that email_id, the status should be 0; if any nan value is present, the status should be 1. If an email_id from df2 has no match in df1, its status should stay the same.

Expected output of df2 -

|email_id| status |
|[email protected] | 1 |
|[email protected] | 0 |
|[email protected] | 1 |
|[email protected] | 3 |
|[email protected] | 2 |
|[email protected] | 1 |

Please help me out. Thanks in advance!

CodePudding user response:

First use DataFrame.explode to turn the lists in the date column into rows. Then test for missing values and aggregate with max per lowercased email_id to build a mapping Series. Finally, use Series.map on df2['email_id'] and fill the non-matched values with the original column df2['status']:

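# one row per list element of the date column
df = df1.explode('date')
# 1 if any date of the (lowercased) email_id is missing, else 0
s = df['date'].isna().astype(int).groupby(df['email_id'].str.lower()).max()
print(s)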
email_id
[email protected]    1
[email protected]    0
[email protected]    1
[email protected]    1
[email protected]    1
Name: date, dtype: int32

# map the flags onto df2; non-matched email_id values keep their original status
df2['status'] = df2['email_id'].str.lower().map(s).fillna(df2['status']).astype(int)
print(df2)
        email_id  status
0  [email protected]       1
1  [email protected]       0
2  [email protected]       1
3  [email protected]       3
4  [email protected]       2
5  [email protected]       1
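
For reference, here is a self-contained sketch of the same approach that can be run as-is. The addresses and starting statuses are made-up placeholders (the ones above are obscured), and update_status is only a hypothetical helper name. It assumes the date column holds real Python lists, not their string representation.

```python
import numpy as np
import pandas as pd

# Placeholder data: these addresses and statuses are NOT the ones from the question.
df1 = pd.DataFrame({
    "email_id": ["a@example.com", "B@example.com", "c@example.com", "d@example.com"],
    "date": [
        ["2022-04-09"],
        [np.nan],
        ["2022-09-21", "2022-03-09"],
        [np.nan, "2022-03-29"],
    ],
})
df2 = pd.DataFrame({
    "email_id": ["a@example.com", "b@example.com", "d@example.com", "x@example.com"],
    "status": [5, 0, 0, 3],
})


def update_status(df1, df2):
    """Return a copy of df2 with status set to 1 if any date is missing, else 0.

    email_id values are compared case-insensitively; rows of df2 without a
    match in df1 keep their original status.
    """
    # Lowercased key column, then one row per list element of the date column.
    exploded = df1.assign(key=df1["email_id"].str.lower()).explode("date")
    # Per key: True if at least one date is missing, converted to 1/0.
    flags = (
        exploded.assign(missing=exploded["date"].isna())
        .groupby("key")["missing"]
        .any()
        .astype(int)
    )
    out = df2.copy()
    mapped = out["email_id"].str.lower().map(flags)  # NaN where there is no match
    out["status"] = mapped.fillna(out["status"]).astype(int)
    return out


result = update_status(df1, df2)
print(result)  # status column becomes [0, 1, 1, 3]
```

Working on a copy leaves the original df2 untouched, so the result can be compared against it before overwriting anything.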