I have a df with time stamps in a basketball game (this is a sample; my actual df is much larger):
       year  period  p_sec_rem
    0  2015       1        556
    1  2013       1        455
    2  2001       2         67
    3  2008       3        177
    4  2017       1        172
period is the period of the game and p_sec_rem is the seconds remaining in the period. I would like to calculate how many seconds have elapsed since the beginning of the game (time) and have some logic that does this. It is a bit complicated, but explaining it is beside the point, as I am sure it is correct:
    df['time'] = None

    def secondsPlayed(df):
        if df.year >= 2006:
            if df.period == 1:
                df['time'] = 600 - df.p_sec_rem
            elif df.period > 1 & df.period < 5:
                df['time'] = ((df.period * 10) * 60) - df.p_sec_rem
            elif df.period == 5:
                df['time'] = (((4 * 10) * 60) + 300) - df.p_sec_rem
            elif df.period == 6:
                df['time'] = (((4 * 10) * 60) + 600) - df.p_sec_rem
            elif df.period == 7:
                df['time'] = (((4 * 10) * 60) + 900) - df.p_sec_rem
        elif df.year <= 2005:
            if df.period == 1:
                df['time'] = 1200 - df.p_sec_rem
            elif df.period == 2:
                df['time'] = 2400 - df.p_sec_rem
            elif df.period == 3:
                df['time'] = (((2 * 20) * 60) + 300) - df.p_sec_rem
            elif df.period == 4:
                df['time'] = (((2 * 20) * 60) + 600) - df.p_sec_rem
            elif df.period == 5:
                df['time'] = (((2 * 20) * 60) + 900) - df.p_sec_rem
            elif df.period == 6:
                df['time'] = (((2 * 20) * 60) + 1200) - df.p_sec_rem
            elif df.period == 7:
                df['time'] = (((2 * 20) * 60) + 1500) - df.p_sec_rem
I would like to apply this logic to each row of my df, so I figured using .apply() would do the trick, but alas:

    df.apply(secondsPlayed, axis=1)

    0    None
    1    None
    2    None
    3    None
    4    None

All None are returned, when my desired output is:

    0      44
    1     145
    2    2333
    3    1623
    4     428
I came across this question, but I think it is slightly different from my example, since I need to pass an entire df to the function: I use several columns in the conditions that determine the output df.time value. I suspect I am close, but I have been playing around with this for a while to no avail.
CodePudding user response:
You just need to return df (which is a single row inside the function) at the end of your function:
    def secondsPlayed(df):
        # With axis=1, df here is a single row of the DataFrame.
        if df.year >= 2006:
            # Four 10-minute periods, then 5-minute overtimes.
            if df.period == 1:
                df['time'] = 600 - df.p_sec_rem
            elif 1 < df.period < 5:  # "> 1 & ... < 5" would evaluate 1 & df.period first
                df['time'] = ((df.period * 10) * 60) - df.p_sec_rem
            elif df.period == 5:
                df['time'] = (((4 * 10) * 60) + 300) - df.p_sec_rem
            elif df.period == 6:
                df['time'] = (((4 * 10) * 60) + 600) - df.p_sec_rem
            elif df.period == 7:
                df['time'] = (((4 * 10) * 60) + 900) - df.p_sec_rem
        elif df.year <= 2005:
            # Two 20-minute halves, then 5-minute overtimes.
            if df.period == 1:
                df['time'] = 1200 - df.p_sec_rem
            elif df.period == 2:
                df['time'] = 2400 - df.p_sec_rem
            elif df.period == 3:
                df['time'] = (((2 * 20) * 60) + 300) - df.p_sec_rem
            elif df.period == 4:
                df['time'] = (((2 * 20) * 60) + 600) - df.p_sec_rem
            elif df.period == 5:
                df['time'] = (((2 * 20) * 60) + 900) - df.p_sec_rem
            elif df.period == 6:
                df['time'] = (((2 * 20) * 60) + 1200) - df.p_sec_rem
            elif df.period == 7:
                df['time'] = (((2 * 20) * 60) + 1500) - df.p_sec_rem
        return df
output:
       year  period  p_sec_rem  time
    0  2015       1        556    44
    1  2013       1        455   145
    2  2001       2         67  2333
    3  2008       3        177  1623
    4  2017       1        172   428
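For context on why the original call printed only None: a Python function that finishes without a return statement returns None, and apply simply collects whatever the function returns for each row. Below is a minimal sketch on a toy frame; the names toy, no_return, and with_return are made up for illustration and are not part of the question's code.

```python
import pandas as pd

# Hypothetical two-column frame, just to isolate the behaviour.
toy = pd.DataFrame({"a": [1, 2], "b": [10, 20]})

def no_return(row):
    # The value is stored on the row, but nothing is handed back to apply.
    row["total"] = row["a"] + row["b"]

def with_return(row):
    row["total"] = row["a"] + row["b"]
    return row  # apply assembles the returned rows into a DataFrame

nothing = toy.apply(no_return, axis=1)
fixed = toy.apply(with_return, axis=1)

print(nothing.isna().all())     # True: every entry is None
print(fixed["total"].tolist())  # [11, 22]
```

Note that apply does not modify the original frame, so the result has to be assigned to a name (here fixed) to keep it.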
CodePudding user response:
As you seem to want to use this function and apply, I'll answer this part.
Your function should take a row as input and return a value, which it doesn't at the moment:
    def secondsPlayed(row):
        if row.year >= 2006:
            # Four 10-minute periods, then 5-minute overtimes.
            if row.period == 1:
                return 600 - row.p_sec_rem
            elif 1 < row.period < 5:  # "> 1 & ... < 5" would evaluate 1 & row.period first
                return ((row.period * 10) * 60) - row.p_sec_rem
            elif row.period == 5:
                return (((4 * 10) * 60) + 300) - row.p_sec_rem
            elif row.period == 6:
                return (((4 * 10) * 60) + 600) - row.p_sec_rem
            elif row.period == 7:
                return (((4 * 10) * 60) + 900) - row.p_sec_rem
        elif row.year <= 2005:
            # Two 20-minute halves, then 5-minute overtimes.
            if row.period == 1:
                return 1200 - row.p_sec_rem
            elif row.period == 2:
                return 2400 - row.p_sec_rem
            elif row.period == 3:
                return (((2 * 20) * 60) + 300) - row.p_sec_rem
            elif row.period == 4:
                return (((2 * 20) * 60) + 600) - row.p_sec_rem
            elif row.period == 5:
                return (((2 * 20) * 60) + 900) - row.p_sec_rem
            elif row.period == 6:
                return (((2 * 20) * 60) + 1200) - row.p_sec_rem
            elif row.period == 7:
                return (((2 * 20) * 60) + 1500) - row.p_sec_rem
Then use:

    df['time'] = df.apply(secondsPlayed, axis=1)

output:

       year  period  p_sec_rem  time
    0  2015       1        556    44
    1  2013       1        455   145
    2  2001       2         67  2333
    3  2008       3        177  1623
    4  2017       1        172   428
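A side note on the condition period > 1 & period < 5 as written in the question: in Python, & binds more tightly than the comparison operators, so the expression is read as period > (1 & period) < 5. The sample rows are not affected, but an overtime period would be. The sketch below uses a plain integer (period = 5 is just an illustrative value) to show the difference from a chained comparison.

```python
period = 5  # illustrative value: first overtime under the 2006+ rules

# Parsed as: period > (1 & period) < 5
as_written = period > 1 & period < 5

# Chained comparison, which is what the condition is meant to express
intended = 1 < period < 5

print(as_written)  # True: 1 & 5 is 1, and 5 > 1 < 5 holds
print(intended)    # False: 5 is not below 5
```

With the condition as written, period 5 would be caught by the "periods 2 to 4" branch before the period == 5 branch is ever reached.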
CodePudding user response:
It seems you could use vectorized operations instead of apply on axis=1. The idea is that up to 2005, the calculation of time depends on whether it's the first period or not. From 2006 on, it depends on whether it is before the 5th period or not:
    import numpy as np

    df['time'] = (np.where(df['year'] <= 2005,
                           np.where(df['period'] == 1,
                                    1200, 1800 + 300*df['period']),
                           np.where(df['period'] < 5,
                                    df['period']*600, 1200 + 300*df['period']))
                  - df['p_sec_rem'])
Output:
       year  period  p_sec_rem  time
    0  2015       1        556    44
    1  2013       1        455   145
    2  2001       2         67  2333
    3  2008       3        177  1623
    4  2017       1        172   428
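To check the vectorized expression end to end, one option is to rebuild the sample frame and compare against the desired output from the question. The frame construction below is an assumption reconstructed from the five sample rows, and period_end is a helper name introduced here for readability.

```python
import numpy as np
import pandas as pd

# Sample rows as shown in the question.
df = pd.DataFrame({
    "year": [2015, 2013, 2001, 2008, 2017],
    "period": [1, 1, 2, 3, 1],
    "p_sec_rem": [556, 455, 67, 177, 172],
})

# Seconds elapsed at the END of each row's period, picked per rule set.
period_end = np.where(
    df["year"] <= 2005,
    np.where(df["period"] == 1, 1200, 1800 + 300 * df["period"]),
    np.where(df["period"] < 5, df["period"] * 600, 1200 + 300 * df["period"]),
)

# Elapsed time is the end-of-period mark minus what is left on the clock.
df["time"] = period_end - df["p_sec_rem"]

print(df["time"].tolist())  # [44, 145, 2333, 1623, 428]
```

Because np.where works on whole columns at once, this avoids calling a Python function for every row, which tends to matter on a large df.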