I am trying to apply a lambda function to a DataFrame where I check multiple conditions.
The relevant columns in the DataFrame:
shooter_id | shot_made | player_id_1 | total_points | free_throw_id | free_throw_made | time |
---|---|---|---|---|---|---|
NaN | NaN | 42 | NaN | NaN | NaN | 1 |
42 | True | 42 | 2 | NaN | NaN | 2 |
30 | True | 42 | 2 | NaN | NaN | 3 |
NaN | NaN | 42 | NaN | 42 | True | 3 |
NaN | NaN | 42 | NaN | 42 | False | 4 |
42 | True | 42 | 5 | NaN | NaN | 5 |
I want to add a column to the DataFrame that has the most recent total_points values, while also adding 1 to the rows where the free_throw_made = True, as the total_points does not reflect these...
shooter_id | shot_made | player_id_1 | total_points | free_throw_id | free_throw_made | time | player_id_1_total_points |
---|---|---|---|---|---|---|---|
NaN | NaN | 42 | NaN | NaN | NaN | 1 | 0 |
42 | True | 42 | 2 | NaN | NaN | 2 | 2 |
30 | True | 42 | 2 | NaN | NaN | 3 | 2 |
NaN | NaN | 42 | NaN | 42 | True | 3 | 3 |
NaN | NaN | 42 | NaN | 42 | False | 4 | 3 |
42 | True | 42 | 5 | NaN | NaN | 5 | 5 |
I've tried a few different bits of code, but I can't work out the proper logic or syntax.
For example, I ran this:
def points(x):
if x == df['shooter_id'] & df['shot_made']:
return df['total_points']
elif x == df['free_throw_id'] & df['free_throw_made']:
df['total_points'] = 1
else:
return df['total_points']
df['player_id_1_total_points'] = df['player_id_1'].apply(lambda x: points(x))
Not only did this return an error (unsupported operand types float and bool), as I'm writing this I'm also realizing it would not return the most recent total_points value...
Any guidance would be extremely appreciated!
CodePudding user response:
First group the dataframe by player_id_1
then for each group forward fill the values in total_points
then add with free_throw_made
, finally take the cumulative max to calculate most recent total points:
def func(g):
return (
g['total_points'].ffill()
.add(g['free_throw_made'], fill_value=0)
.fillna(0).cummax().to_frame()
)
df['player_id_1_total_points'] = df.groupby('player_id_1').apply(func)
Result
shooter_id shot_made player_id_1 total_points free_throw_id free_throw_made time player_id_1_total_points
0 NaN NaN 42 NaN NaN NaN 1 0.0
1 42.0 True 42 2.0 NaN NaN 2 2.0
2 30.0 True 42 2.0 NaN NaN 3 2.0
3 NaN NaN 42 NaN 42.0 True 3 3.0
4 NaN NaN 42 NaN 42.0 False 4 3.0
5 42.0 True 42 5.0 NaN NaN 5 5.0
CodePudding user response:
You don't need a function, you can directly use vectorial code:
g = df.assign(free_throw_made=df['free_throw_made'].eq(True)).groupby('player_id_1')
df['player_id_1_total_points'] = (g['total_points'].ffill().fillna(0, downcast='infer')
g['free_throw_made'].cumsum()
)
output:
shooter_id shot_made player_id_1 total_points free_throw_id \
0 NaN NaN 42 NaN NaN
1 42.0 True 42 2.0 NaN
2 30.0 True 42 2.0 NaN
3 NaN NaN 42 NaN 42.0
4 NaN NaN 42 NaN 42.0
5 42.0 True 42 5.0 NaN
free_throw_made time player_id_1_total_points
0 NaN 1 0
1 NaN 2 2
2 NaN 3 2
3 True 3 3
4 False 4 3
5 NaN 5 6