Home > Software engineering >  Using a two column dataframe to have a time counter, but reset on a certain condition
Using a two column dataframe to have a time counter, but reset on a certain condition

Time:07-15

I currently have two columns in a dataframe, one called Total Time, and one called cycle. Total time is the time between each instance in the dataframe occuring, and cycle indicates what cycle that the time belongs to. I want to create a time column, Cycle Time, that shows the acccumulation of total time during each cycle. I have code that almost works, but with one exception - it adds the time on between each cycle, which I don't want (when the cycle changes, I want the counter to completely reset). Here is my current code, to better understand what I'm aiming to achieve:


import pandas as pd

df = pd.DataFrame({"Cycle": [1,1,1,2,2,2,2,3,3,3,4,4,4,5,5,5,5],
                   "Total Time": [0,0.2,0.2,0.4,0.4,0.7,0.7,1.0,1.0,1.2,1.3,1.3,1.5,1.6,1.6,1.8,1.8]})


df['Cycle Time'] = df['Total Time'].diff().fillna(0).groupby(df['Cycle']).cumsum()


print(df['Cycle Time'])

0     0.0
1     0.2
2     0.2
3     0.2
4     0.2
5     0.5
6     0.5
7     0.3
8     0.3
9     0.5
10    0.1
11    0.1
12    0.3
13    0.1
14    0.1
15    0.3
16    0.3

As there is time between each new cycle, the cycle resets so there is no time difference between the first two instances of the new cycle (except in the first cycle). This also occurs at certain stages in the total time, where the time remains the same. Ideally, my output would look like this:


0     0.0
1     0.2
2     0.2
3     0.0
4     0.0
5     0.3
6     0.3
7     0.0
8     0.0
9     0.2
10    0.0
11    0.0
12    0.2
13    0.0
14    0.0
15    0.2
16    0.2

Basically, I'd like to create a counter that adds up all the time of each cycle, but resets to zero at the first instance of the new cycle in the dataframe.

CodePudding user response:

What you describe is:

df['Cycle Time'] = (df.groupby('Cycle')['Total Time']
                      .apply(lambda s: s.diff().fillna(0).cumsum())
                    )

But this is not so efficient, as you get the diff to then take the cumsum.

What you want is equivalent to just subtracting the initial value per group:

df['Cycle Time'] = df['Total Time'].sub(
                    df.groupby('Cycle')['Total Time'].transform('first')
                    )

output:

    Cycle  Total Time  Cycle Time
0       1         0.0         0.0
1       1         0.2         0.2
2       1         0.3         0.3
3       2         0.4         0.0
4       2         0.4         0.0
5       2         0.7         0.3
6       2         0.9         0.5
7       3         1.0         0.0
8       3         1.0         0.0
9       3         1.2         0.2
10      4         1.3         0.0
11      4         1.3         0.0
12      4         1.5         0.2
13      5         1.6         0.0
14      5         1.6         0.0
15      5         1.8         0.2
16      5         2.1         0.5
  • Related