Pandas compute time delta on filtered datasets-CodePudding

If I make up some time series data:

import pandas as pd
import numpy as np
from numpy.random import seed

# seed random number generator
seed(1)

time = pd.date_range('6/28/2021', periods=100, freq='1min')
df = pd.DataFrame(np.random.randint(100, size=100), index=time,columns=['data'])

df.plot(figsize=(25,8))

This will plot:

And then filter the data for when data is above 50: df = df.loc[df['data'] > 50]

How do I compute the time delta for when the data is above the value of 50? ONLY above the value of 50. For example if I do this:

# Compute delta time 
df['time_delta'] = df.index.to_series().diff().astype('timedelta64[m]')

df.time_delta.sum()

I dont think the sum is correct as this will include a time delta for when the data was below the value of 50, hope fully that makes sense to ONLY retrieve a time delta for when the value was above 50.

CodePudding user response：

IIUC, you want:

df["timedelta"] = df.index.to_series().diff().where(df["data"].gt(50))

>>> df["timedelta"].sum()
Timedelta('0 days 00:44:00')

Which should be correct because there are exactly 44 rows where "data" is above 50 and each of these corresponds to a 1 minute time difference:

>>> df["data"].gt(50).sum()
44

CodePudding user response：

You can mask the data after computing the diff:

df.index.to_series().diff().mask(df['data'].le(50)).sum()

output: Timedelta('0 days 00:44:00')