lowest value subtract second lowest value within a group (panda, python,groupby)-CodePudding

i have a df which i'm able to get laptime difference between the lowest laptime substract
and the remaining laptime within the groupby with

df['diff'] = df['lap_time'] - df.groupby('lap_ref')['lap_time '].transform('min')

original df

╔═════════╦═════════╦══════════╦══════╗
║ lap_ref ║ swimmer ║ lap_time ║ diff ║
╠═════════╬═════════╬══════════╬══════╣
║     151 ║ steve   ║ 82.64    ║ 0    ║
║     151 ║ timoy   ║ 82.77    ║ 0.13 ║
║     151 ║ audi    ║ 82.83    ║ 0.19 ║
║     151 ║ gimmi   ║ 82.98    ║ 0.34 ║
║     151 ║ pete    ║ 83.08    ║ 0.44 ║
║     151 ║ manson  ║ 83.24    ║ 0.6  ║
║     151 ║ fuller  ║ 83.4     ║ 0.76 ║
║     151 ║ ron     ║ 83.56    ║ 0.92 ║
║     151 ║ limin   ║ 83.62    ║ 0.98 ║
║     151 ║ octank  ║ 83.92    ║ 1.28 ║
║     151 ║ frank   ║ 83.94    ║ 1.3  ║
║     151 ║ mose    ║ 84.15    ║ 1.51 ║
║     151 ║ preta   ║ 84.2     ║ 1.56 ║
║     151 ║ landra  ║ 92.2     ║ 9.56 ║
╚═════════╩═════════╩══════════╩══════╝

i need to fill the value 0 with the lowest - second lowest value within the group. however, i can't find a solution.

result trying to get

╔═════════╦═════════╦══════════╦══════╗
║ lap_ref ║ swimmer ║ lap_time ║ diff ║
╠═════════╬═════════╬══════════╬══════╣
║     151 ║ steve   ║ 82.64    ║-0.13 ║
║     151 ║ timoy   ║ 82.77    ║ 0.13 ║
║     151 ║ audi    ║ 82.83    ║ 0.19 ║
║     151 ║ gimmi   ║ 82.98    ║ 0.34 ║
║     151 ║ pete    ║ 83.08    ║ 0.44 ║
║     151 ║ manson  ║ 83.24    ║ 0.6  ║
║     151 ║ fuller  ║ 83.4     ║ 0.76 ║
║     151 ║ ron     ║ 83.56    ║ 0.92 ║
║     151 ║ limin   ║ 83.62    ║ 0.98 ║
║     151 ║ octank  ║ 83.92    ║ 1.28 ║
║     151 ║ frank   ║ 83.94    ║ 1.3  ║
║     151 ║ mose    ║ 84.15    ║ 1.51 ║
║     151 ║ preta   ║ 84.2     ║ 1.56 ║
║     151 ║ landra  ║ 92.2     ║ 9.56 ║
╚═════════╩═════════╩══════════╩══════╝

data = {'lap_ref':[151,151,151,151,151,151,151,151,151,151,151,151,151,151],
        'swimmer':['steve','timoy','audi','gimmi','pete','manson','fuller',
                    'ron','limin','octank','frank','mose','preta','landra'],
        'lap_time':[82.64,82.77,82.83,82.98,83.08,83.24,83.4,83.56,83.62,83.92,83.94,84.15,84.2,92.2]}

kindly advise. thank you

CodePudding user response：

We can do with pd.Series.nsmallest then map the value back

s = df.groupby('lap_ref ')['lap_time'].apply(pd.Series.nsmallest,n=2).groupby(level=0).agg(np.ptp).reindex(df['lap_ref ']).values
df['new'] = np.where(df['diff'].eq(0),-s,df['diff'])

CodePudding user response：

You can use nsmallest method. First find the 2 smallest values using .nsmallest method and find the difference between the two using .diff method. This produces the difference between the second smallest and the smallest for each lap_ref. Since we want the difference between the second smallest and the smallest, we take the max (since the other value is NaN). Then convert this to a dictionary and using the dictionary, map it to df['lap_ref'] and using np.where, assign this value to diff where df['diff']==0:

g = df.groupby('lap_ref')['lap_time']
df['diff'] = df['lap_time'] - g.transform('min')

mapper = g.nsmallest(2).diff().droplevel(1).groupby('lap_ref').max().to_dict()
df['diff'] = np.where(df['diff']==0, -df['lap_ref'].map(mapper), df['diff'])

Output:

    lap_ref swimmer  lap_time  diff
0       151   steve     82.64 -0.13
1       151   timoy     82.77  0.13
2       151    audi     82.83  0.19
3       151   gimmi     82.98  0.34
4       151    pete     83.08  0.44
5       151  manson     83.24  0.60
6       151  fuller     83.40  0.76
7       151     ron     83.56  0.92
8       151   limin     83.62  0.98
9       151  octank     83.92  1.28
10      151   frank     83.94  1.30
11      151    mose     84.15  1.51
12      151   preta     84.20  1.56
13      151  landra     92.20  9.56

CodePudding user response：

df['diff'] = df['lap_time'].diff().cumsum().fillna(-df['lap_time'].diff()[1])

Output:

>>> df
0    -0.13
1     0.13
2     0.19
3     0.34
4     0.44
5     0.60
6     0.76
7     0.92
8     0.98
9     1.28
10    1.30
11    1.51
12    1.56
13    9.56
Name: lap_time, dtype: float64

CodePudding user response：

Assuming the data is already sorted in descending order (as is currently in the shared data):

 grp = df.groupby('lap_ref').lap_time
 difference = grp.nth(0) - grp.nth(1)
(df.assign(lap_time = df.lap_time.sub(grp.transform('min')), 
           diff = lambda df: np.where(df.lap_time.eq(0), 
                                      difference.item(), 
                                      df.lap_time)
           )
)
    lap_ref swimmer  lap_time  diff
0       151   steve      0.00 -0.13
1       151   timoy      0.13  0.13
2       151    audi      0.19  0.19
3       151   gimmi      0.34  0.34
4       151    pete      0.44  0.44
5       151  manson      0.60  0.60
6       151  fuller      0.76  0.76
7       151     ron      0.92  0.92
8       151   limin      0.98  0.98
9       151  octank      1.28  1.28
10      151   frank      1.30  1.30
11      151    mose      1.51  1.51
12      151   preta      1.56  1.56
13      151  landra      9.56  9.56