I have a DataFrame:
import pandas as pd
dates = pd.date_range(start='05.01.2021 00:00:00', end='05.01.2021 00:00:20', freq='S')
values = [1000,343,122.34,342.6,76.45,202,264.32,9454.3,1000,1000,0,0,0,0,0,0,232,2323,5562,3545, 123]
df = pd.DataFrame([dates, values], index=['dates', 'values']).T
I want to get the Timedelta between the first and last row of each consecutive range in which values is not 0, separately for every such range.
Here that would be 9 seconds and 4 seconds. The Timedelta should appear in a new column, on the last row of the corresponding time range.
Any hints on how to do that?
Thx
EDIT:
The desired DataFrame should look like this:
dates = pd.date_range(start='05.01.2021 00:00:00', end='05.01.2021 00:00:20', freq='S')
values = [1000,343,122.34,342.6,76.45,202,264.32,9454.3,1000,1000,0,0,0,0,0,0,232,2323,5562,3545,123]
delta = [None,None,None,None,None,None,None,None,None,'0 days 00:00:09',None,None,None,None,None,None,None,None,None,None,'0 days 00:00:04']
df = pd.DataFrame([dates, values, delta], index=['dates', 'values', 'timedelta']).T
where the values in the column timedelta are of dtype timedelta64[ns].
CodePudding user response:
dates = pd.date_range(start='05.01.2021 00:00:00', end='05.01.2021 00:00:20', freq='S')
values = [1000,343,122.34,342.6,76.45,202,264.32,9454.3,1000,1000,0,0,0,0,0,0,232,2323,5562,3545, 123]
df = pd.DataFrame([dates, values], index=['dates', 'values']).T
df['timedelta'] = df.groupby(
(df['values'] == 0).diff().cumsum().fillna(0)
)['dates'].transform(lambda x: x.iloc[-1] - x.iloc[0])
Output:
dates values timedelta
0 2021-05-01 00:00:00 1000 0 days 00:00:09
1 2021-05-01 00:00:01 343 0 days 00:00:09
2 2021-05-01 00:00:02 122.34 0 days 00:00:09
3 2021-05-01 00:00:03 342.6 0 days 00:00:09
4 2021-05-01 00:00:04 76.45 0 days 00:00:09
5 2021-05-01 00:00:05 202 0 days 00:00:09
6 2021-05-01 00:00:06 264.32 0 days 00:00:09
7 2021-05-01 00:00:07 9454.3 0 days 00:00:09
8 2021-05-01 00:00:08 1000 0 days 00:00:09
9 2021-05-01 00:00:09 1000 0 days 00:00:09
10 2021-05-01 00:00:10 0 0 days 00:00:05
11 2021-05-01 00:00:11 0 0 days 00:00:05
12 2021-05-01 00:00:12 0 0 days 00:00:05
13 2021-05-01 00:00:13 0 0 days 00:00:05
14 2021-05-01 00:00:14 0 0 days 00:00:05
15 2021-05-01 00:00:15 0 0 days 00:00:05
16 2021-05-01 00:00:16 232 0 days 00:00:04
17 2021-05-01 00:00:17 2323 0 days 00:00:04
18 2021-05-01 00:00:18 5562 0 days 00:00:04
19 2021-05-01 00:00:19 3545 0 days 00:00:04
20 2021-05-01 00:00:20 123 0 days 00:00:04
Explanation:
The series (df['values'] == 0).diff().cumsum().fillna(0) used in the groupby is:
0 0.0
1 0.0
2 0.0
3 0.0
4 0.0
5 0.0
6 0.0
7 0.0
8 0.0
9 0.0
10 1.0
11 1.0
12 1.0
13 1.0
14 1.0
15 1.0
16 2.0
17 2.0
18 2.0
19 2.0
20 2.0
It labels each group of consecutive rows in which the column values is either always 0 or always non-zero, so every row in the same run gets the same group id.
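To see the run labelling in isolation, here is a minimal sketch on a small made-up series (the sample numbers and the names is_zero and run_id are only for illustration, they are not from the question). It uses the ne/shift variant of the same idea rather than diff(), which keeps the labels as plain integers; treat it as one possible way to write it, not the only one.

```python
import pandas as pd

# Hypothetical sample data, just to show how runs are labelled.
values = pd.Series([1000, 343, 0, 0, 232, 0, 5])

# True where the value is 0, False otherwise.
is_zero = values.eq(0)

# A new run starts wherever the flag differs from the previous row;
# the cumulative sum of those "change" markers gives one id per run.
run_id = is_zero.ne(is_zero.shift()).cumsum()

print(run_id.tolist())
```

Rows sharing a run_id belong to the same consecutive block, so run_id can be passed straight to groupby in the same way as the series above.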
Note that this way the column timedelta contains more than you want: every row of a run carries the run's duration, including the runs of zeros. If you want the output to be exactly as you described, you can add
import numpy as np
df.loc[~((df['values'] == 0).diff().shift(-1).fillna(True) & (df['values'] != 0)), 'timedelta'] = np.nan
Output:
dates values timedelta
0 2021-05-01 00:00:00 1000 NaT
1 2021-05-01 00:00:01 343 NaT
2 2021-05-01 00:00:02 122.34 NaT
3 2021-05-01 00:00:03 342.6 NaT
4 2021-05-01 00:00:04 76.45 NaT
5 2021-05-01 00:00:05 202 NaT
6 2021-05-01 00:00:06 264.32 NaT
7 2021-05-01 00:00:07 9454.3 NaT
8 2021-05-01 00:00:08 1000 NaT
9 2021-05-01 00:00:09 1000 0 days 00:00:09
10 2021-05-01 00:00:10 0 NaT
11 2021-05-01 00:00:11 0 NaT
12 2021-05-01 00:00:12 0 NaT
13 2021-05-01 00:00:13 0 NaT
14 2021-05-01 00:00:14 0 NaT
15 2021-05-01 00:00:15 0 NaT
16 2021-05-01 00:00:16 232 NaT
17 2021-05-01 00:00:17 2323 NaT
18 2021-05-01 00:00:18 5562 NaT
19 2021-05-01 00:00:19 3545 NaT
20 2021-05-01 00:00:20 123 0 days 00:00:04
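For reference, both steps can also be combined into one pass. The following is a sketch under a few assumptions of mine: the frame is built from a dict so that dates keeps a real datetime dtype (instead of the object columns produced by the transpose), and the helper names nonzero, run_id, span and is_run_end are made up for this example.

```python
import pandas as pd

dates = pd.date_range(start="2021-05-01 00:00:00", periods=21, freq="s")
values = [1000, 343, 122.34, 342.6, 76.45, 202, 264.32, 9454.3, 1000, 1000,
          0, 0, 0, 0, 0, 0, 232, 2323, 5562, 3545, 123]
# Building from a dict keeps proper dtypes for both columns.
df = pd.DataFrame({"dates": dates, "values": values})

# Flag non-zero rows and give every consecutive run its own id.
nonzero = df["values"].ne(0)
run_id = nonzero.ne(nonzero.shift()).cumsum()

# Duration of each run: last timestamp minus first timestamp.
grouped = df.groupby(run_id)["dates"]
span = grouped.transform("max") - grouped.transform("min")

# Keep the duration only on the last row of each non-zero run;
# every other row becomes NaT.
is_run_end = run_id.ne(run_id.shift(-1))
df["timedelta"] = span.where(nonzero & is_run_end)

print(df[df["timedelta"].notna()])
```

Because span is computed by subtracting two datetime columns, the new column should come out with a timedelta dtype without any extra conversion, and where() fills the masked rows with NaT.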