I want to update values on a pandas df row based on other rows.
I have a dataframe with patients, all the vaccines they need to get, a column that indicates if they have already applied it or not, and a "status" column, that is "Ok" if they applied the vaccine, or "Missing" if they don't.
The problem is that there are vaccines that are equivalent. In the example below, vaccines B and C are equivalent. So if the patient applied vaccine B, he shouldn't apply vaccine C. Therefore, the correct status in this case is "Ok" for vaccine B and "Applied equivalent vaccine" for vaccine C.
The dataframe is over 2 million rows, so I need an effient way to build the "desired_status" column.
Any ideas how to do that? Thanks!
patient | vaccine | applied? | status | desired_status |
---|---|---|---|---|
1 | A | 1 | Ok | Ok |
1 | B | 1 | Ok | Ok |
1 | C | 0 | Missing | Applied equivalent vaccine |
2 | A | 0 | Missing | Missing |
2 | B | 0 | Missing | Applied equivalent vaccine |
2 | C | 1 | Ok | Ok |
3 | A | 1 | Ok | Ok |
3 | B | 0 | Missing | Missing |
3 | C | 0 | Missing | Missing |
PS: Equivalent vaccines are not necessarily in adjacent rows.
CodePudding user response:
You can use a dictionary to map equivalents, then use groupby
agg
and numpy.select
:
eq_vaccines = {'B': 'BC', 'C': 'BC'}
eq = df['vaccine'].map(eq_vaccines).combine_first(df['vaccine'])
applied_any = df.groupby(['patient', eq])['applied?'].transform('max').eq(1)
import numpy as np
df['status'] = np.select([df['applied?'].eq(1), applied_any],
['Ok', 'Applied equivalent vaccine'], 'Missing')
output:
patient vaccine applied? status
0 1 A 1 Ok
1 1 B 1 Ok
2 1 C 0 Applied equivalent vaccine
3 2 A 0 Missing
4 2 B 0 Applied equivalent vaccine
5 2 C 1 Ok
6 3 A 1 Ok
7 3 B 0 Missing
8 3 C 0 Missing
table with intermediates for clarity
patient vaccine applied? status eq applied_any
0 1 A 1 Ok A 1
1 1 B 1 Ok BC 1
2 1 C 0 Applied equivalent vaccine BC 1
3 2 A 0 Missing A 0
4 2 B 0 Applied equivalent vaccine BC 1
5 2 C 1 Ok BC 1
6 3 A 1 Ok A 1
7 3 B 0 Missing BC 0
8 3 C 0 Missing BC 0