Home > Software engineering >  How to drop rows based on datetime (every 15 min)?
How to drop rows based on datetime (every 15 min)?

Time:08-24

This is an example of a larger dataframe.

df = pd.DataFrame({"DateTime":["2020-04-02 06:06:22","2020-04-02 06:12:22","2020-04-02 06:35:34","2020-04-02 07:06:09",
                               "2020-04-02 07:10:34","2020-04-02 07:41:44"],
                   "Data":[23, 31, 10, 23, 56, 81]})

# column DateTime type must be datetime64[ns]
df["DateTime"] = df["DateTime"].astype("datetime64[ns]")

df
Out[8]: 
             DateTime  Data
0 2020-04-02 06:06:22    23
1 2020-04-02 06:12:22    31
2 2020-04-02 06:35:34    10
3 2020-04-02 07:06:09    23
4 2020-04-02 07:10:34    56
5 2020-04-02 07:41:44    81

I would like to drop a row if the date in df["DateTime"] in the row below occurs faster than 15 minutes when compared to the actual row. And the result should be like this:

              DateTime  Data
0 2020-04-02 06:06:22    23
2 2020-04-02 06:35:34    10
3 2020-04-02 07:06:09    23
5 2020-04-02 07:41:44    81

Does anyone know any condition I could use to set these intervals (every 15 min) and drop these rows?

Thanks

CodePudding user response:

Use boolean indexing with diff to generate a Timedelta:

m = df['DateTime'].diff().lt('15min')

out = df[~m]

Output:

             DateTime  Data
0 2020-04-02 06:06:22    23
2 2020-04-02 06:35:34    10
3 2020-04-02 07:06:09    23
5 2020-04-02 07:41:44    81

Intermediate ~m:

0     True
1    False
2     True
3     True
4    False
5     True
Name: DateTime, dtype: bool

CodePudding user response:

Create a Timedelta column:

df['diff'] = df['DateTime'] - df['DateTime'].shift(1)

Drop if the Timedelta column is less than 15 minutes.

>>> df[df['diff'].fillna(pd.Timedelta.max) > '15 minutes']
             DateTime  Data            diff
0 2020-04-02 06:06:22    23             NaT
2 2020-04-02 06:35:34    10 0 days 00:23:12
3 2020-04-02 07:06:09    23 0 days 00:30:35
5 2020-04-02 07:41:44    81 0 days 00:31:10
  • Related