I have a dataframe df1 like below:
|email_id| date |
|[email protected] | ['2022-04-09'] |
|[email protected] | [nan] |
|[email protected] | ['2022-09-21','2022-03-09'] |
|[email protected] | [nan, '2022-03-29'] |
|[email protected] | [nan] |
|[email protected] | [nan,'2022-09-01'] |
And another dataframe df2:
|email_id| status |
|[email protected] | 0 |
|[email protected] | 0 |
|[email protected] | 0 |
|[email protected] | 3 |
|[email protected] | 2 |
|[email protected] | 1 |
How can I look up each email_id from df2 in df1 and update the status in df2? If all the date values for that email_id in the df1 date column are present, the status should be 0; if any nan value is present, the status should be 1. If an email_id from df2 has no match in df1, its status should stay the same.
Expected output of df2 -
|email_id| status |
|[email protected] | 1 |
|[email protected] | 0 |
|[email protected] | 1 |
|[email protected] | 3 |
|[email protected] | 2 |
|[email protected] | 1 |
Please help me out. Thanks in advance!
CodePudding user response:
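First use DataFrame.explode to turn the lists in the date column into one row per value. Then test for missing values and aggregate with max per lowercased email_id, which gives a mapping Series holding 1 if any value is missing and 0 otherwise. Finally use Series.map on df2['email_id'] and fill the non-matched values with the original df2['status'] column: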
df = df1.explode('date')
s = df['date'].isna().astype(int).groupby(df['email_id'].str.lower()).max()
print(s)
email_id
[email protected] 1
[email protected] 0
[email protected] 1
[email protected] 1
[email protected] 1
Name: date, dtype: int32
df2['status'] = df2['email_id'].str.lower().map(s).fillna(df2['status']).astype(int)
print(df2)
email_id status
0 [email protected] 1
1 [email protected] 0
2 [email protected] 1
3 [email protected] 3
4 [email protected] 2
5 [email protected] 1
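For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same explode / groupby / map steps. The addresses and the small frames are made up for illustration (the real ones are obscured in the post), and ignore_index=True is my own addition so that the grouping key and the exploded values share a clean, unique index.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the addresses are placeholders, not the asker's.
df1 = pd.DataFrame({
    "email_id": ["a@example.com", "b@example.com", "A@example.com", "c@example.com"],
    "date": [["2022-04-09"], ["2022-09-21", "2022-03-09"], [np.nan], [np.nan, "2022-03-29"]],
})
df2 = pd.DataFrame({
    "email_id": ["a@example.com", "b@example.com", "c@example.com", "d@example.com"],
    "status": [0, 0, 0, 3],
})

# One row per list element; ignore_index avoids duplicated index labels.
exploded = df1.explode("date", ignore_index=True)

# Case-insensitive key, then 1 if any date for that address is missing, else 0.
key = exploded["email_id"].str.lower()
has_nan = exploded["date"].isna().groupby(key).max().astype(int)

# Map onto df2; addresses absent from df1 keep their original status.
df2["status"] = (
    df2["email_id"].str.lower().map(has_nan).fillna(df2["status"]).astype(int)
)
print(df2)
```

Here a@example.com ends up with status 1 because its second (differently cased) row contains nan, while d@example.com is not in df1 and keeps its status of 3.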