For unique names, get the first value and subract from the rest value pandas dataframe-CodePudding

I have a pandas Dataframe with a Name column, for each Name, I would like to calculate the difference between the first value and each of the other values.

DataFrame

        Name   First Run    Second Run  Third Run   Time
0      Jakub    315            279        276       9:00
1      Jakub    322            260        270       9:00
2      Jakub    248            260        213       15:00
3      Jakub    246            288        294       15:00
4      Jakub    276            274        263       15:00
5      Bene     222            268        246       9:00
6      Bene     236            302        263       9:00
7      Bene     208            245        224       15:00
8      Bene     283            245        245       15:00
9      Bene     278            257        249       15:00

first_values = df.groupby(['Name']).first()[['First Run','Second Run','Third Run']]

         First Run        Second Run    Third Run   Time
Name            
Jakub       315             279          276        9:00
Bene        222             268          246        9:00

Output_df

        Name   First Run    Second Run  Third Run   Time
0      Jakub    0              0           0        9:00
1      Jakub    7              -19        -6        9:00
2      Jakub    -67            -19        -63       15:00
3      Jakub    -69            9          18        15:00
4      Jakub    -39            -5         -13       15:00
5      Bene     0              0          0         9:00
6      Bene     14             34         17        9:00
7      Bene     -14            -23        22        15:00
8      Bene     61             -23        -1        15:00
9      Bene     56             -9         3         15:00

I can figure out the right function to get the desired output. I tried the below code but I am getting a TypeError.

df.Name.map(first_values.index).sub(first_values)

CodePudding user response：

You can use groupby apply and subtract the first row per group:

df.filter(like=' Run').groupby(df['Name']).apply(lambda g: g-g.iloc[0])

or from a manual list of columns:

cols = ['First Run','Second Run','Third Run']
df[cols].groupby(df['Name']).apply(lambda g: g-g.iloc[0])

full code to have a new dataframe:

df_output = df.copy()
df_output.update(df.filter(like=' Run')
                   .groupby(df['Name'])
                   .apply(lambda g: g-g.iloc[0])
                 )

output:

    Name  First Run  Second Run  Third Run   Time
0  Jakub          0           0          0   9:00
1  Jakub          7         -19         -6   9:00
2  Jakub        -67         -19        -63  15:00
3  Jakub        -69           9         18  15:00
4  Jakub        -39          -5        -13  15:00
5   Bene          0           0          0   9:00
6   Bene         14          34         17   9:00
7   Bene        -14         -23        -22  15:00
8   Bene         61         -23         -1  15:00
9   Bene         56         -11          3  15:00

CodePudding user response：

You can check diff with cumsum

df.update(df.groupby('Name')[['First Run','Second Run','Third Run']].apply(lambda x : x.diff().cumsum()).fillna(0))
Out[30]: 
   FirstRun  SecondRun  ThirdRun
0       0.0        0.0       0.0
1       7.0      -19.0      -6.0
2     -67.0      -19.0     -63.0
3     -69.0        9.0      18.0
4     -39.0       -5.0     -13.0
5       0.0        0.0       0.0
6      14.0       34.0      17.0
7     -14.0      -23.0     -22.0
8      61.0      -23.0      -1.0
9      56.0      -11.0       3.0

CodePudding user response：

Use GroupBy.transform for repeat first values, so subtract working well:

cols = ['First Run','Second Run','Third Run']
df[cols] = df[cols].sub(df.groupby('Name')[cols].transform('first'))
print (df)
    Name  First Run  Second Run  Third Run   Time
0  Jakub          0           0          0   9:00
1  Jakub          7         -19         -6   9:00
2  Jakub        -67         -19        -63  15:00
3  Jakub        -69           9         18  15:00
4  Jakub        -39          -5        -13  15:00
5   Bene          0           0          0   9:00
6   Bene         14          34         17   9:00
7   Bene        -14         -23        -22  15:00
8   Bene         61         -23         -1  15:00
9   Bene         56         -11          3  15:00