Home > database >  How to update a row based on other rows in pandas in an efficient way
How to update a row based on other rows in pandas in an efficient way

Time:11-17

I want to update values on a pandas df row based on other rows.

I have a dataframe with patients, all the vaccines they need to get, a column that indicates if they have already applied it or not, and a "status" column, that is "Ok" if they applied the vaccine, or "Missing" if they don't.

The problem is that there are vaccines that are equivalent. In the example below, vaccines B and C are equivalent. So if the patient applied vaccine B, he shouldn't apply vaccine C. Therefore, the correct status in this case is "Ok" for vaccine B and "Applied equivalent vaccine" for vaccine C.

The dataframe is over 2 million rows, so I need an effient way to build the "desired_status" column.

Any ideas how to do that? Thanks!

patient vaccine applied? status desired_status
1 A 1 Ok Ok
1 B 1 Ok Ok
1 C 0 Missing Applied equivalent vaccine
2 A 0 Missing Missing
2 B 0 Missing Applied equivalent vaccine
2 C 1 Ok Ok
3 A 1 Ok Ok
3 B 0 Missing Missing
3 C 0 Missing Missing

PS: Equivalent vaccines are not necessarily in adjacent rows.

CodePudding user response:

You can use a dictionary to map equivalents, then use groupby agg and numpy.select:

eq_vaccines = {'B': 'BC', 'C': 'BC'}

eq = df['vaccine'].map(eq_vaccines).combine_first(df['vaccine'])
applied_any = df.groupby(['patient', eq])['applied?'].transform('max').eq(1)

import numpy as np
df['status'] = np.select([df['applied?'].eq(1), applied_any],
                         ['Ok', 'Applied equivalent vaccine'], 'Missing')

output:

   patient vaccine  applied?                      status
0        1       A         1                          Ok
1        1       B         1                          Ok
2        1       C         0  Applied equivalent vaccine
3        2       A         0                     Missing
4        2       B         0  Applied equivalent vaccine
5        2       C         1                          Ok
6        3       A         1                          Ok
7        3       B         0                     Missing
8        3       C         0                     Missing
table with intermediates for clarity
   patient vaccine  applied?                      status  eq  applied_any
0        1       A         1                          Ok   A            1
1        1       B         1                          Ok  BC            1
2        1       C         0  Applied equivalent vaccine  BC            1
3        2       A         0                     Missing   A            0
4        2       B         0  Applied equivalent vaccine  BC            1
5        2       C         1                          Ok  BC            1
6        3       A         1                          Ok   A            1
7        3       B         0                     Missing  BC            0
8        3       C         0                     Missing  BC            0
  • Related