Home > Enterprise >  How can I find the difference between each row and the top row in each group?[python]
How can I find the difference between each row and the top row in each group?[python]

Time:09-16

This is a sample of my real dataset.

data = {'CarNo': [888, 81, 16, 62, 31, 23, 777, 23, 777, 31, 888, 62, 81, 16, 888, 31, 81, 62, 16, 23],
        'LapTime': [119628, 117170, 119846, 117178, 119516, 117915, 116964, 116683, 117169, 118919, 119227, 117789, 117674, 116914, 116333, 120027, 116403, 117351, 116557, 117107],
       'LAPS': [147.0, 147.0, 147.0, 147.0, 147.0, 146.0, 121.0, 56.0, 55.0, 55.0, 55.0, 54.0, 54.0, 54.0, 123.0, 123.0, 123.0, 123.0, 123.0, 122.0],
       'UpdateTime': [162047732, 162047732, 162047732, 162047732, 162047732, 162047732, 162047732, 125626988, 125626988, 125626988, 125626988, 125626988, 125626988, 125626988, 151851068, 151851068, 151851068, 151851068, 151851068, 151851068],
        'rank': [6.0, 2.0, 7.0, 3.0, 5.0, 4.0, 1.0, 1.0, 3.0, 6.0, 7.0, 5.0, 4.0, 2.0, 1.0, 7.0, 2.0, 5.0, 3.0, 4.0]
       }
df = pd.DataFrame(data)
df

I want to groupby according to the "UpdateTime" then find the gap between each row and Rank one in each group. From other questions here, I did something like bellow. but it is not very clever and I am also loosing my first row. Please help.

def diff_to_top(df,colA):
    pos=sub.columns.get_loc(colA)
    gap = sub.iloc[1:, pos] - sub.iat[0, pos]
    return gap

sub = df.set_index(['UpdateTime','rank','LAPS'])
sub.sort_index(inplace=True, ascending=True)

diff_to_top(sub,'LapTime')

CodePudding user response:

EDIT:

This one works without loop and works by applying a one-index-level subtrahend on a two-level-index dataframe. That way, the subtract() method applies the subtrahend on every row with matching "UpdateTime" index:

df2 = (
    df
    .reset_index()
    .set_index(["UpdateTime", "index"]) # Make a two-level index
    .sort_values(["UpdateTime", "rank"]) # Sort it
)

subtrahend = (
    df2
    .loc[df2["rank"] == 1, 'LapTime']
    .reset_index(level=1, drop=True) # Remaining index level is "UpdateTime"
)

print(subtrahend)
df2["Laptime_Diff"] = df2["LapTime"].subtract(subtrahend)
df2

Subtrahend looks like this (Index is UpdateTime, values from LapTime):

UpdateTime
162047732    116964
125626988    116683
151851068    116333

This yields:

                  CarNo  LapTime   LAPS  rank  Laptime_Diff
UpdateTime index                                           
125626988  7         23   116683   56.0   1.0             0
           13        16   116914   54.0   2.0           231
           8        777   117169   55.0   3.0           486
           12        81   117674   54.0   4.0           991
           11        62   117789   54.0   5.0          1106
           9         31   118919   55.0   6.0          2236
           10       888   119227   55.0   7.0          2544
151851068  14       888   116333  123.0   1.0             0
           16        81   116403  123.0   2.0            70
           18        16   116557  123.0   3.0           224
           19        23   117107  122.0   4.0           774
...

Old solution (using a groupby loop):

dfDict = {}
for name, g in df.groupby("UpdateTime"):
    subtrahend = g["LapTime"].loc[g["rank"] == 1].squeeze()
    dfDict[name] = (
        g["LapTime"]
        .subtract(subtrahend)
        .reset_index().set_index("level_0")
    )

laptimediff = pd.concat(dfDict)

df2 = df.reset_index().set_index(["UpdateTime", "level_0"])
df2["LapTime_diff"] = laptimediff["LapTime"]
print(df2)

Old solutions yields:

                    CarNo  LapTime   LAPS  rank  LapTime_diff
UpdateTime level_0                                           
162047732  0          888   119628  147.0   6.0          2664
           1           81   117170  147.0   2.0           206
           2           16   119846  147.0   7.0          2882
           3           62   117178  147.0   3.0           214
           4           31   119516  147.0   5.0          2552
           5           23   117915  146.0   4.0           951
           6          777   116964  121.0   1.0             0
125626988  7           23   116683   56.0   1.0             0
           8          777   117169   55.0   3.0           486
           9           31   118919   55.0   6.0          2236
  • Related