Get TimeDelta between rows, where rows not zero

Time:11-17

I have a DataFrame

import pandas as pd

dates = pd.date_range(start='05.01.2021 00:00:00', end='05.01.2021 00:00:20', freq='S')

values = [1000,343,122.34,342.6,76.45,202,264.32,9454.3,1000,1000,0,0,0,0,0,0,232,2323,5562,3545, 123]

df = pd.DataFrame([dates, values], index=['dates', 'values']).T

I want to get the time delta spanned by each run of consecutive rows where values is not 0, computed separately for every such run.

In this example that is 9 seconds and 4 seconds. The time delta should appear in a new column, on the last row of the corresponding time range.

Any hints on how to do that?

Thx

EDIT:

The desired DataFrame should look like this:

dates = pd.date_range(start='05.01.2021 00:00:00', end='05.01.2021 00:00:20', freq='S')
values = [1000,343,122.34,342.6,76.45,202,264.32,9454.3,1000,1000,0,0,0,0,0,0,232,2323,5562,3545,123]
delta = [None,None,None,None,None,None,None,None,None,'0 days 00:00:09',None,None,None,None,None,None,None,None,None,None,'0 days 00:00:04']
df = pd.DataFrame([dates, values, delta], index=['dates', 'values', 'timedelta']).T

The values in the timedelta column should be of dtype timedelta64[ns].

CodePudding user response:

import pandas as pd

dates = pd.date_range(start='05.01.2021 00:00:00', end='05.01.2021 00:00:20', freq='S')
values = [1000,343,122.34,342.6,76.45,202,264.32,9454.3,1000,1000,0,0,0,0,0,0,232,2323,5562,3545, 123]
df = pd.DataFrame([dates, values], index=['dates', 'values']).T

# Label each run of consecutive zero / non-zero rows, then broadcast the
# run's duration (last timestamp minus first) to every row in that run.
df['timedelta'] = df.groupby(
    (df['values'] == 0).diff().cumsum().fillna(0)
)['dates'].transform(lambda x: x.iloc[-1] - x.iloc[0])

Output:

                  dates   values        timedelta
0   2021-05-01 00:00:00     1000  0 days 00:00:09
1   2021-05-01 00:00:01      343  0 days 00:00:09
2   2021-05-01 00:00:02   122.34  0 days 00:00:09
3   2021-05-01 00:00:03    342.6  0 days 00:00:09
4   2021-05-01 00:00:04    76.45  0 days 00:00:09
5   2021-05-01 00:00:05      202  0 days 00:00:09
6   2021-05-01 00:00:06   264.32  0 days 00:00:09
7   2021-05-01 00:00:07   9454.3  0 days 00:00:09
8   2021-05-01 00:00:08     1000  0 days 00:00:09
9   2021-05-01 00:00:09     1000  0 days 00:00:09
10  2021-05-01 00:00:10        0  0 days 00:00:05
11  2021-05-01 00:00:11        0  0 days 00:00:05
12  2021-05-01 00:00:12        0  0 days 00:00:05
13  2021-05-01 00:00:13        0  0 days 00:00:05
14  2021-05-01 00:00:14        0  0 days 00:00:05
15  2021-05-01 00:00:15        0  0 days 00:00:05
16  2021-05-01 00:00:16      232  0 days 00:00:04
17  2021-05-01 00:00:17     2323  0 days 00:00:04
18  2021-05-01 00:00:18     5562  0 days 00:00:04
19  2021-05-01 00:00:19     3545  0 days 00:00:04
20  2021-05-01 00:00:20      123  0 days 00:00:04

Explanation:

The series (df['values'] == 0).diff().cumsum().fillna(0) used in the groupby is

0     0.0
1     0.0
2     0.0
3     0.0
4     0.0
5     0.0
6     0.0
7     0.0
8     0.0
9     0.0
10    1.0
11    1.0
12    1.0
13    1.0
14    1.0
15    1.0
16    2.0
17    2.0
18    2.0
19    2.0
20    2.0

and it assigns one label to each run of consecutive rows in which values is either all 0 or all non-zero, so every row in the same run shares a group key.
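As an aside, the same kind of run labels can be built without diff(), using the common shift-and-compare idiom. The sketch below runs on a small made-up series rather than the data above, and the names is_zero and run_id are chosen here only for readability.

```python
import pandas as pd

# Small illustrative series (not the data from the question)
values = pd.Series([1000, 343, 0, 0, 12, 0])

# True where the value is exactly zero
is_zero = values.eq(0)

# A new run starts wherever the flag differs from the previous row;
# the cumulative sum of those change points gives each run its own label.
run_id = is_zero.ne(is_zero.shift()).cumsum()

print(run_id.tolist())
```

Here the labels start at 1 instead of 0, which makes no difference to groupby: only which rows share a label matters.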

Note that this fills timedelta on every row, including the rows where values is 0, which is more than you asked for. To get exactly the output you described, keep the value only on the last row of each non-zero run by adding

import numpy as np

df.loc[~((df['values'] == 0).diff().shift(-1).fillna(True) & (df['values'] != 0)), 'timedelta'] = np.nan

Output:

                  dates   values        timedelta
0   2021-05-01 00:00:00     1000              NaT
1   2021-05-01 00:00:01      343              NaT
2   2021-05-01 00:00:02   122.34              NaT
3   2021-05-01 00:00:03    342.6              NaT
4   2021-05-01 00:00:04    76.45              NaT
5   2021-05-01 00:00:05      202              NaT
6   2021-05-01 00:00:06   264.32              NaT
7   2021-05-01 00:00:07   9454.3              NaT
8   2021-05-01 00:00:08     1000              NaT
9   2021-05-01 00:00:09     1000  0 days 00:00:09
10  2021-05-01 00:00:10        0              NaT
11  2021-05-01 00:00:11        0              NaT
12  2021-05-01 00:00:12        0              NaT
13  2021-05-01 00:00:13        0              NaT
14  2021-05-01 00:00:14        0              NaT
15  2021-05-01 00:00:15        0              NaT
16  2021-05-01 00:00:16      232              NaT
17  2021-05-01 00:00:17     2323              NaT
18  2021-05-01 00:00:18     5562              NaT
19  2021-05-01 00:00:19     3545              NaT
20  2021-05-01 00:00:20      123  0 days 00:00:04
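
For comparison, here is a sketch of a variant that writes the delta only on the last row of each non-zero run in a single pass. It assumes the frame is built from a dict so that dates keeps its datetime dtype (the .T construction typically leaves object-dtype columns), and is_zero, run_id, span and is_run_end are illustrative names rather than anything from the code above.

```python
import pandas as pd

dates = pd.date_range(start='2021-05-01 00:00:00', periods=21, freq='s')
values = [1000, 343, 122.34, 342.6, 76.45, 202, 264.32, 9454.3, 1000, 1000,
          0, 0, 0, 0, 0, 0, 232, 2323, 5562, 3545, 123]

# Building from a dict keeps 'dates' as a datetime column
df = pd.DataFrame({'dates': dates, 'values': values})

# Label runs of consecutive zero / non-zero rows
is_zero = df['values'].eq(0)
run_id = is_zero.ne(is_zero.shift()).cumsum()

# Duration of each run (last timestamp minus first), broadcast to its rows
grouped = df.groupby(run_id)['dates']
span = grouped.transform('last') - grouped.transform('first')

# A row closes its run when the next row carries a different label
is_run_end = run_id.ne(run_id.shift(-1))

# Keep the duration only on the closing row of non-zero runs; NaT elsewhere
df['timedelta'] = span.where(is_run_end & ~is_zero)

print(df)
```

With this data, row 9 gets 9 seconds, row 20 gets 4 seconds, and every other row is NaT. Because the subtraction is done on datetime columns, timedelta comes out with a timedelta dtype without any later conversion.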