This is an example of a larger dataframe.
df = pd.DataFrame({"DateTime":["2020-04-02 06:06:22","2020-04-02 06:12:22","2020-04-02 06:35:34","2020-04-02 07:06:09",
"2020-04-02 07:10:34","2020-04-02 07:41:44"],
"Data":[23, 31, 10, 23, 56, 81]})
# column DateTime type must be datetime64[ns]
df["DateTime"] = df["DateTime"].astype("datetime64[ns]")
df
Out[8]:
DateTime Data
0 2020-04-02 06:06:22 23
1 2020-04-02 06:12:22 31
2 2020-04-02 06:35:34 10
3 2020-04-02 07:06:09 23
4 2020-04-02 07:10:34 56
5 2020-04-02 07:41:44 81
I would like to drop a row if the date in df["DateTime"] in the row below occurs faster than 15 minutes when compared to the actual row. And the result should be like this:
DateTime Data
0 2020-04-02 06:06:22 23
2 2020-04-02 06:35:34 10
3 2020-04-02 07:06:09 23
5 2020-04-02 07:41:44 81
Does anyone know any condition I could use to set these intervals (every 15 min) and drop these rows?
Thanks
CodePudding user response:
Use boolean indexing with diff
to generate a Timedelta
:
m = df['DateTime'].diff().lt('15min')
out = df[~m]
Output:
DateTime Data
0 2020-04-02 06:06:22 23
2 2020-04-02 06:35:34 10
3 2020-04-02 07:06:09 23
5 2020-04-02 07:41:44 81
Intermediate ~m
:
0 True
1 False
2 True
3 True
4 False
5 True
Name: DateTime, dtype: bool
CodePudding user response:
Create a Timedelta
column:
df['diff'] = df['DateTime'] - df['DateTime'].shift(1)
Drop if the Timedelta
column is less than 15 minutes.
>>> df[df['diff'].fillna(pd.Timedelta.max) > '15 minutes']
DateTime Data diff
0 2020-04-02 06:06:22 23 NaT
2 2020-04-02 06:35:34 10 0 days 00:23:12
3 2020-04-02 07:06:09 23 0 days 00:30:35
5 2020-04-02 07:41:44 81 0 days 00:31:10