Python: Lambda Function to Return Values Based on Multiple Conditions-CodePudding

I am trying to apply a lambda function to a DataFrame where I check multiple conditions.

The relevant columns in the DataFrame:

shooter_id	shot_made	player_id_1	total_points	free_throw_id	free_throw_made	time
NaN	NaN	42	NaN	NaN	NaN	1
42	True	42	2	NaN	NaN	2
30	True	42	2	NaN	NaN	3
NaN	NaN	42	NaN	42	True	3
NaN	NaN	42	NaN	42	False	4
42	True	42	5	NaN	NaN	5

I want to add a column to the DataFrame that has the most recent total_points values, while also adding 1 to the rows where the free_throw_made = True, as the total_points does not reflect these...

shooter_id	shot_made	player_id_1	total_points	free_throw_id	free_throw_made	time	player_id_1_total_points
NaN	NaN	42	NaN	NaN	NaN	1	0
42	True	42	2	NaN	NaN	2	2
30	True	42	2	NaN	NaN	3	2
NaN	NaN	42	NaN	42	True	3	3
NaN	NaN	42	NaN	42	False	4	3
42	True	42	5	NaN	NaN	5	5

I've tried a few different bits of code, but I can't work out the proper logic or syntax.

For example, I ran this:

def points(x):
        if x == df['shooter_id'] & df['shot_made']:
            return df['total_points']
        elif x == df['free_throw_id'] & df['free_throw_made']:
            df['total_points']  = 1 
        else:
            return df['total_points']

df['player_id_1_total_points'] = df['player_id_1'].apply(lambda x: points(x))

Not only did this return an error (unsupported operand types float and bool), as I'm writing this I'm also realizing it would not return the most recent total_points value...

Any guidance would be extremely appreciated!

CodePudding user response：

First group the dataframe by player_id_1 then for each group forward fill the values in total_points then add with free_throw_made, finally take the cumulative max to calculate most recent total points:

def func(g):
    return (
        g['total_points'].ffill()
        .add(g['free_throw_made'], fill_value=0)
        .fillna(0).cummax().to_frame()
    )


df['player_id_1_total_points'] = df.groupby('player_id_1').apply(func)

Result

   shooter_id shot_made  player_id_1  total_points  free_throw_id free_throw_made  time  player_id_1_total_points
0         NaN       NaN           42           NaN            NaN             NaN     1                       0.0
1        42.0      True           42           2.0            NaN             NaN     2                       2.0
2        30.0      True           42           2.0            NaN             NaN     3                       2.0
3         NaN       NaN           42           NaN           42.0            True     3                       3.0
4         NaN       NaN           42           NaN           42.0           False     4                       3.0
5        42.0      True           42           5.0            NaN             NaN     5                       5.0

CodePudding user response：

You don't need a function, you can directly use vectorial code:

g = df.assign(free_throw_made=df['free_throw_made'].eq(True)).groupby('player_id_1')
df['player_id_1_total_points'] = (g['total_points'].ffill().fillna(0, downcast='infer')
                                   g['free_throw_made'].cumsum()
                                 )

output:

   shooter_id shot_made  player_id_1  total_points  free_throw_id  \
0         NaN       NaN           42           NaN            NaN   
1        42.0      True           42           2.0            NaN   
2        30.0      True           42           2.0            NaN   
3         NaN       NaN           42           NaN           42.0   
4         NaN       NaN           42           NaN           42.0   
5        42.0      True           42           5.0            NaN   

  free_throw_made  time  player_id_1_total_points  
0             NaN     1                         0  
1             NaN     2                         2  
2             NaN     3                         2  
3            True     3                         3  
4           False     4                         3  
5             NaN     5                         6