I have a pandas Dataframe with a Name column, for each Name, I would like to calculate the difference between the first value and each of the other values.
DataFrame
Name First Run Second Run Third Run Time
0 Jakub 315 279 276 9:00
1 Jakub 322 260 270 9:00
2 Jakub 248 260 213 15:00
3 Jakub 246 288 294 15:00
4 Jakub 276 274 263 15:00
5 Bene 222 268 246 9:00
6 Bene 236 302 263 9:00
7 Bene 208 245 224 15:00
8 Bene 283 245 245 15:00
9 Bene 278 257 249 15:00
first_values = df.groupby(['Name']).first()[['First Run','Second Run','Third Run']]
First Run Second Run Third Run Time
Name
Jakub 315 279 276 9:00
Bene 222 268 246 9:00
Output_df
Name First Run Second Run Third Run Time
0 Jakub 0 0 0 9:00
1 Jakub 7 -19 -6 9:00
2 Jakub -67 -19 -63 15:00
3 Jakub -69 9 18 15:00
4 Jakub -39 -5 -13 15:00
5 Bene 0 0 0 9:00
6 Bene 14 34 17 9:00
7 Bene -14 -23 22 15:00
8 Bene 61 -23 -1 15:00
9 Bene 56 -9 3 15:00
I can figure out the right function to get the desired output. I tried the below code but I am getting a TypeError.
df.Name.map(first_values.index).sub(first_values)
CodePudding user response:
You can use groupby
apply
and subtract the first row per group:
df.filter(like=' Run').groupby(df['Name']).apply(lambda g: g-g.iloc[0])
or from a manual list of columns:
cols = ['First Run','Second Run','Third Run']
df[cols].groupby(df['Name']).apply(lambda g: g-g.iloc[0])
full code to have a new dataframe:
df_output = df.copy()
df_output.update(df.filter(like=' Run')
.groupby(df['Name'])
.apply(lambda g: g-g.iloc[0])
)
output:
Name First Run Second Run Third Run Time
0 Jakub 0 0 0 9:00
1 Jakub 7 -19 -6 9:00
2 Jakub -67 -19 -63 15:00
3 Jakub -69 9 18 15:00
4 Jakub -39 -5 -13 15:00
5 Bene 0 0 0 9:00
6 Bene 14 34 17 9:00
7 Bene -14 -23 -22 15:00
8 Bene 61 -23 -1 15:00
9 Bene 56 -11 3 15:00
CodePudding user response:
You can check diff
with cumsum
df.update(df.groupby('Name')[['First Run','Second Run','Third Run']].apply(lambda x : x.diff().cumsum()).fillna(0))
Out[30]:
FirstRun SecondRun ThirdRun
0 0.0 0.0 0.0
1 7.0 -19.0 -6.0
2 -67.0 -19.0 -63.0
3 -69.0 9.0 18.0
4 -39.0 -5.0 -13.0
5 0.0 0.0 0.0
6 14.0 34.0 17.0
7 -14.0 -23.0 -22.0
8 61.0 -23.0 -1.0
9 56.0 -11.0 3.0
CodePudding user response:
Use GroupBy.transform
for repeat first values, so subtract working well:
cols = ['First Run','Second Run','Third Run']
df[cols] = df[cols].sub(df.groupby('Name')[cols].transform('first'))
print (df)
Name First Run Second Run Third Run Time
0 Jakub 0 0 0 9:00
1 Jakub 7 -19 -6 9:00
2 Jakub -67 -19 -63 15:00
3 Jakub -69 9 18 15:00
4 Jakub -39 -5 -13 15:00
5 Bene 0 0 0 9:00
6 Bene 14 34 17 9:00
7 Bene -14 -23 -22 15:00
8 Bene 61 -23 -1 15:00
9 Bene 56 -11 3 15:00