Home > database >  For unique names, get the first value and subract from the rest value pandas dataframe
For unique names, get the first value and subract from the rest value pandas dataframe

Time:03-14

I have a pandas Dataframe with a Name column, for each Name, I would like to calculate the difference between the first value and each of the other values.

DataFrame

        Name   First Run    Second Run  Third Run   Time
0      Jakub    315            279        276       9:00
1      Jakub    322            260        270       9:00
2      Jakub    248            260        213       15:00
3      Jakub    246            288        294       15:00
4      Jakub    276            274        263       15:00
5      Bene     222            268        246       9:00
6      Bene     236            302        263       9:00
7      Bene     208            245        224       15:00
8      Bene     283            245        245       15:00
9      Bene     278            257        249       15:00

first_values = df.groupby(['Name']).first()[['First Run','Second Run','Third Run']]

         First Run        Second Run    Third Run   Time
Name            
Jakub       315             279          276        9:00
Bene        222             268          246        9:00

Output_df

        Name   First Run    Second Run  Third Run   Time
0      Jakub    0              0           0        9:00
1      Jakub    7              -19        -6        9:00
2      Jakub    -67            -19        -63       15:00
3      Jakub    -69            9          18        15:00
4      Jakub    -39            -5         -13       15:00
5      Bene     0              0          0         9:00
6      Bene     14             34         17        9:00
7      Bene     -14            -23        22        15:00
8      Bene     61             -23        -1        15:00
9      Bene     56             -9         3         15:00

I can figure out the right function to get the desired output. I tried the below code but I am getting a TypeError.

df.Name.map(first_values.index).sub(first_values)

CodePudding user response:

You can use groupby apply and subtract the first row per group:

df.filter(like=' Run').groupby(df['Name']).apply(lambda g: g-g.iloc[0])

or from a manual list of columns:

cols = ['First Run','Second Run','Third Run']
df[cols].groupby(df['Name']).apply(lambda g: g-g.iloc[0])

full code to have a new dataframe:

df_output = df.copy()
df_output.update(df.filter(like=' Run')
                   .groupby(df['Name'])
                   .apply(lambda g: g-g.iloc[0])
                 )

output:

    Name  First Run  Second Run  Third Run   Time
0  Jakub          0           0          0   9:00
1  Jakub          7         -19         -6   9:00
2  Jakub        -67         -19        -63  15:00
3  Jakub        -69           9         18  15:00
4  Jakub        -39          -5        -13  15:00
5   Bene          0           0          0   9:00
6   Bene         14          34         17   9:00
7   Bene        -14         -23        -22  15:00
8   Bene         61         -23         -1  15:00
9   Bene         56         -11          3  15:00

CodePudding user response:

You can check diff with cumsum

df.update(df.groupby('Name')[['First Run','Second Run','Third Run']].apply(lambda x : x.diff().cumsum()).fillna(0))
Out[30]: 
   FirstRun  SecondRun  ThirdRun
0       0.0        0.0       0.0
1       7.0      -19.0      -6.0
2     -67.0      -19.0     -63.0
3     -69.0        9.0      18.0
4     -39.0       -5.0     -13.0
5       0.0        0.0       0.0
6      14.0       34.0      17.0
7     -14.0      -23.0     -22.0
8      61.0      -23.0      -1.0
9      56.0      -11.0       3.0

CodePudding user response:

Use GroupBy.transform for repeat first values, so subtract working well:

cols = ['First Run','Second Run','Third Run']
df[cols] = df[cols].sub(df.groupby('Name')[cols].transform('first'))
print (df)
    Name  First Run  Second Run  Third Run   Time
0  Jakub          0           0          0   9:00
1  Jakub          7         -19         -6   9:00
2  Jakub        -67         -19        -63  15:00
3  Jakub        -69           9         18  15:00
4  Jakub        -39          -5        -13  15:00
5   Bene          0           0          0   9:00
6   Bene         14          34         17   9:00
7   Bene        -14         -23        -22  15:00
8   Bene         61         -23         -1  15:00
9   Bene         56         -11          3  15:00
  • Related