Create new pandas column with apply based on conditions of multiple other columns

Time:04-05

I have a df with timestamps from a basketball game (this is a sample; my actual df is much larger):

    year    period  p_sec_rem
0   2015    1   556
1   2013    1   455
2   2001    2   67
3   2008    3   177
4   2017    1   172

period is the period of the game and p_sec_rem is the number of seconds remaining in that period. I would like to calculate how many seconds have elapsed since the beginning of the game (time), and I have some logic that does this. It is a bit complicated, but explaining it is beside the point, as I am sure it is correct:

df['time'] = None

def secondsPlayed(df):
  if df.year >= 2006:
    if df.period == 1:
      df['time'] = 600 - df.p_sec_rem
    elif 1 < df.period < 5:
      df['time'] = ((df.period * 10) * 60) - df.p_sec_rem
    elif df.period == 5:
      df['time'] = (((4 * 10) * 60) + 300) - df.p_sec_rem
    elif df.period == 6:
      df['time'] = (((4 * 10) * 60) + 600) - df.p_sec_rem
    elif df.period == 7:
      df['time'] = (((4 * 10) * 60) + 900) - df.p_sec_rem
  elif df.year <= 2005:
    if df.period == 1:
      df['time'] = 1200 - df.p_sec_rem
    elif df.period == 2:
      df['time'] = 2400 - df.p_sec_rem
    elif df.period == 3:
      df['time'] = (((2 * 20) * 60) + 300) - df.p_sec_rem
    elif df.period == 4:
      df['time'] = (((2 * 20) * 60) + 600) - df.p_sec_rem
    elif df.period == 5:
      df['time'] = (((2 * 20) * 60) + 900) - df.p_sec_rem
    elif df.period == 6:
      df['time'] = (((2 * 20) * 60) + 1200) - df.p_sec_rem
    elif df.period == 7:
      df['time'] = (((2 * 20) * 60) + 1500) - df.p_sec_rem
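For reference, the branch chain above collapses into one rule: find the game-clock second at which the current period ends, then subtract p_sec_rem. A plain-Python sketch follows; elapsed_seconds is a hypothetical helper name, and it assumes every overtime lasts 300 seconds (as in the branches above) and extends that pattern past period 7, where the original branches stop.

```python
def elapsed_seconds(year, period, p_sec_rem):
    """Seconds elapsed since tip-off (hypothetical helper mirroring the branches above)."""
    if year >= 2006:
        # Four 10-minute quarters, then 5-minute overtimes (assumed).
        period_end = 600 * period if period <= 4 else 2400 + 300 * (period - 4)
    else:
        # Two 20-minute halves, then 5-minute overtimes (assumed).
        period_end = 1200 * period if period <= 2 else 2400 + 300 * (period - 2)
    return period_end - p_sec_rem


sample = [(2015, 1, 556), (2013, 1, 455), (2001, 2, 67), (2008, 3, 177), (2017, 1, 172)]
print([elapsed_seconds(*r) for r in sample])  # [44, 145, 2333, 1623, 428]
```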

I would like to apply this logic to each row of my df, so I figured using .apply() would do the trick, but alas:

df.apply(secondsPlayed,axis=1)

0    None
1    None
2    None
3    None
4    None

Every row comes back as None, whereas my desired output is:

0    44
1    145
2    2333
3    1623
4    428

I came across this question, but I think it is slightly different from my example, since the function has to see whole rows: I use several different columns in the conditions that determine the output df.time value. I suspect I am close, but I have been playing around with this for a while to no avail.
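Separately from the apply problem, one condition in the logic is worth checking. In Python, & binds more tightly than > and <, so a test written as df.period > 1 & df.period < 5 is parsed as the chained comparison df.period > (1 & df.period) < 5, which is true for every period from 2 up. Because period == 1 is checked first, an overtime row from 2006 on (period 5, 6 or 7) would then take the period * 600 branch, for example 3000 - p_sec_rem instead of 2700 - p_sec_rem in period 5. None of the sample rows is in overtime, so the expected output above is unaffected. A small plain-Python check (the function names are illustrative only):

```python
def middle_period_as_written(period):
    # & binds tighter than > and <, so this evaluates as
    # (period > (1 & period)) and ((1 & period) < 5).
    return period > 1 & period < 5


def middle_period_intended(period):
    # Strictly between the first period and the first overtime.
    return 1 < period < 5


for p in range(1, 8):
    print(p, middle_period_as_written(p), middle_period_intended(p))
```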

Answer:

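You just need to return the row (named df in your function) at the end. apply then assembles the returned rows into a new DataFrame, so assign it back with df = df.apply(secondsPlayed, axis=1):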

def secondsPlayed(df):
    if df.year >= 2006:
        if df.period == 1:
            df['time'] = 600 - df.p_sec_rem
        elif 1 < df.period < 5:
            df['time'] = ((df.period * 10) * 60) - df.p_sec_rem
        elif df.period == 5:
            df['time'] = (((4 * 10) * 60) + 300) - df.p_sec_rem
        elif df.period == 6:
            df['time'] = (((4 * 10) * 60) + 600) - df.p_sec_rem
        elif df.period == 7:
            df['time'] = (((4 * 10) * 60) + 900) - df.p_sec_rem
    elif df.year <= 2005:
        if df.period == 1:
            df['time'] = 1200 - df.p_sec_rem
        elif df.period == 2:
            df['time'] = 2400 - df.p_sec_rem
        elif df.period == 3:
            df['time'] = (((2 * 20) * 60) + 300) - df.p_sec_rem
        elif df.period == 4:
            df['time'] = (((2 * 20) * 60) + 600) - df.p_sec_rem
        elif df.period == 5:
            df['time'] = (((2 * 20) * 60) + 900) - df.p_sec_rem
        elif df.period == 6:
            df['time'] = (((2 * 20) * 60) + 1200) - df.p_sec_rem
        elif df.period == 7:
            df['time'] = (((2 * 20) * 60) + 1500) - df.p_sec_rem
    return df

output:

    year    period  p_sec_rem   time
0   2015         1      556     44
1   2013         1      455     145
2   2001         2       67     2333
3   2008         3      177     1623
4   2017         1      172     428
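A variant of this approach that leaves the row pandas hands to the function untouched: the pandas documentation for DataFrame.apply notes that functions which mutate the passed object are not supported and can behave unexpectedly, so this sketch copies the row before adding time. It also swaps the if/elif chain for a lookup table built from the same branch values. PERIOD_END and with_time are hypothetical names, and a period outside 1 to 7 raises KeyError here instead of silently producing None.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2015, 2013, 2001, 2008, 2017],
    "period": [1, 1, 2, 3, 1],
    "p_sec_rem": [556, 455, 67, 177, 172],
})

# Hypothetical lookup: game-clock second at which each period ends, taken from
# the question's branch values ("quarters" = 2006 on, "halves" = up to 2005).
PERIOD_END = {
    "quarters": {1: 600, 2: 1200, 3: 1800, 4: 2400, 5: 2700, 6: 3000, 7: 3300},
    "halves": {1: 1200, 2: 2400, 3: 2700, 4: 3000, 5: 3300, 6: 3600, 7: 3900},
}


def with_time(row):
    """Return a copy of row with a 'time' entry; the row passed in is not modified."""
    era = "quarters" if row.year >= 2006 else "halves"
    out = row.copy()
    out["time"] = PERIOD_END[era][row.period] - row.p_sec_rem
    return out


# apply expands each returned Series into a row of a new DataFrame.
df = df.apply(with_time, axis=1)
print(df)
```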

Answer:

Since you seem to want to use this function with apply, I'll answer that part.

Your function should take a row as input and return a value. At the moment it only assigns to the row it receives and never returns anything, so it returns None for every row:

def secondsPlayed(row):
  if row.year >= 2006:
    if row.period == 1:
      return 600 - row.p_sec_rem
    elif 1 < row.period < 5:
      return ((row.period * 10) * 60) - row.p_sec_rem
    elif row.period == 5:
      return (((4 * 10) * 60) + 300) - row.p_sec_rem
    elif row.period == 6:
      return (((4 * 10) * 60) + 600) - row.p_sec_rem
    elif row.period == 7:
      return (((4 * 10) * 60) + 900) - row.p_sec_rem
  elif row.year <= 2005:
    if row.period == 1:
      return 1200 - row.p_sec_rem
    elif row.period == 2:
      return 2400 - row.p_sec_rem
    elif row.period == 3:
      return (((2 * 20) * 60) + 300) - row.p_sec_rem
    elif row.period == 4:
      return (((2 * 20) * 60) + 600) - row.p_sec_rem
    elif row.period == 5:
      return (((2 * 20) * 60) + 900) - row.p_sec_rem
    elif row.period == 6:
      return (((2 * 20) * 60) + 1200) - row.p_sec_rem
    elif row.period == 7:
      return (((2 * 20) * 60) + 1500) - row.p_sec_rem

Then use:

df['time'] = df.apply(secondsPlayed, axis=1)

output:

   year  period  p_sec_rem  time
0  2015       1        556    44
1  2013       1        455   145
2  2001       2         67  2333
3  2008       3        177  1623
4  2017       1        172   428

Answer:

It seems you could use vectorized operations instead of apply with axis=1. The idea is that up to 2005, the calculation of time depends on whether it is the first period or not; from 2006 on, it depends on whether it is before the 5th period or not:

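import numpy as np
df['time'] = (np.where(df['year']<=2005,
                      # up to 2005: 20-minute halves, then 5-minute overtimes
                      np.where(df['period']==1,
                               1200, 1800 + 300*df['period']),
                      # from 2006: 10-minute quarters, then 5-minute overtimes
                      np.where(df['period']<5,
                               df['period']*600, 1200 + 300*df['period']))
              - df['p_sec_rem'])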

Output:

   year  period  p_sec_rem  time
0  2015       1        556    44
1  2013       1        455   145
2  2001       2         67  2333
3  2008       3        177  1623
4  2017       1        172   428
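The same vectorized idea can also be written with np.select, which keeps each era/period case as a separate, readable condition, much like the original if/elif chain. This is a sketch assuming, as in the question's branches, 10-minute quarters from 2006 on, 20-minute halves up to 2005, and 300-second overtimes. np.select uses the first condition that matches for each row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "year": [2015, 2013, 2001, 2008, 2017],
    "period": [1, 1, 2, 3, 1],
    "p_sec_rem": [556, 455, 67, 177, 172],
})

quarters = df["year"] >= 2006  # True: 10-minute quarters; False: 20-minute halves
period = df["period"]

# Game-clock second at which each row's period ends (overtime assumed 300 s).
conditions = [
    quarters & (period <= 4),   # regulation quarter
    quarters & (period > 4),    # overtime after four quarters
    ~quarters & (period <= 2),  # regulation half
    ~quarters & (period > 2),   # overtime after two halves
]
choices = [
    600 * period,
    2400 + 300 * (period - 4),
    1200 * period,
    2400 + 300 * (period - 2),
]
period_end = np.select(conditions, choices)

df["time"] = period_end - df["p_sec_rem"]
print(df)
```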