This is a sample of my real dataset.
data = {'CarNo': [888, 81, 16, 62, 31, 23, 777, 23, 777, 31, 888, 62, 81, 16, 888, 31, 81, 62, 16, 23],
'LapTime': [119628, 117170, 119846, 117178, 119516, 117915, 116964, 116683, 117169, 118919, 119227, 117789, 117674, 116914, 116333, 120027, 116403, 117351, 116557, 117107],
'LAPS': [147.0, 147.0, 147.0, 147.0, 147.0, 146.0, 121.0, 56.0, 55.0, 55.0, 55.0, 54.0, 54.0, 54.0, 123.0, 123.0, 123.0, 123.0, 123.0, 122.0],
'UpdateTime': [162047732, 162047732, 162047732, 162047732, 162047732, 162047732, 162047732, 125626988, 125626988, 125626988, 125626988, 125626988, 125626988, 125626988, 151851068, 151851068, 151851068, 151851068, 151851068, 151851068],
'rank': [6.0, 2.0, 7.0, 3.0, 5.0, 4.0, 1.0, 1.0, 3.0, 6.0, 7.0, 5.0, 4.0, 2.0, 1.0, 7.0, 2.0, 5.0, 3.0, 4.0]
}
df = pd.DataFrame(data)
df
I want to groupby according to the "UpdateTime" then find the gap between each row and Rank one in each group. From other questions here, I did something like bellow. but it is not very clever and I am also loosing my first row. Please help.
def diff_to_top(df,colA):
pos=sub.columns.get_loc(colA)
gap = sub.iloc[1:, pos] - sub.iat[0, pos]
return gap
sub = df.set_index(['UpdateTime','rank','LAPS'])
sub.sort_index(inplace=True, ascending=True)
diff_to_top(sub,'LapTime')
CodePudding user response:
EDIT:
This one works without loop and works by applying a one-index-level subtrahend on a two-level-index dataframe. That way, the subtract() method applies the subtrahend on every row with matching "UpdateTime" index:
df2 = (
df
.reset_index()
.set_index(["UpdateTime", "index"]) # Make a two-level index
.sort_values(["UpdateTime", "rank"]) # Sort it
)
subtrahend = (
df2
.loc[df2["rank"] == 1, 'LapTime']
.reset_index(level=1, drop=True) # Remaining index level is "UpdateTime"
)
print(subtrahend)
df2["Laptime_Diff"] = df2["LapTime"].subtract(subtrahend)
df2
Subtrahend looks like this (Index is UpdateTime, values from LapTime):
UpdateTime
162047732 116964
125626988 116683
151851068 116333
This yields:
CarNo LapTime LAPS rank Laptime_Diff
UpdateTime index
125626988 7 23 116683 56.0 1.0 0
13 16 116914 54.0 2.0 231
8 777 117169 55.0 3.0 486
12 81 117674 54.0 4.0 991
11 62 117789 54.0 5.0 1106
9 31 118919 55.0 6.0 2236
10 888 119227 55.0 7.0 2544
151851068 14 888 116333 123.0 1.0 0
16 81 116403 123.0 2.0 70
18 16 116557 123.0 3.0 224
19 23 117107 122.0 4.0 774
...
Old solution (using a groupby loop):
dfDict = {}
for name, g in df.groupby("UpdateTime"):
subtrahend = g["LapTime"].loc[g["rank"] == 1].squeeze()
dfDict[name] = (
g["LapTime"]
.subtract(subtrahend)
.reset_index().set_index("level_0")
)
laptimediff = pd.concat(dfDict)
df2 = df.reset_index().set_index(["UpdateTime", "level_0"])
df2["LapTime_diff"] = laptimediff["LapTime"]
print(df2)
Old solutions yields:
CarNo LapTime LAPS rank LapTime_diff
UpdateTime level_0
162047732 0 888 119628 147.0 6.0 2664
1 81 117170 147.0 2.0 206
2 16 119846 147.0 7.0 2882
3 62 117178 147.0 3.0 214
4 31 119516 147.0 5.0 2552
5 23 117915 146.0 4.0 951
6 777 116964 121.0 1.0 0
125626988 7 23 116683 56.0 1.0 0
8 777 117169 55.0 3.0 486
9 31 118919 55.0 6.0 2236