Remove row if it is within a minute of the next

I have this data frame:

import numpy as np
import pandas as pd

data = {'Date': [np.datetime64('2005-02-25 01:30:10'), np.datetime64('2005-02-25 01:31:10'),np.datetime64('2005-02-25 02:36:10'),
                 np.datetime64('2005-02-25 02:45:10'), np.datetime64('2005-02-25 02:45:50'),np.datetime64('2005-02-25 03:54:20'),
                 np.datetime64('2005-02-25 03:55:10'),np.datetime64('2005-02-25 05:30:10'), np.datetime64('2005-02-25 06:30:10'),
                 np.datetime64('2005-02-25 06:30:30')],
        'Value':[1,4,6,7,3,6,7,8,3,2]}

df = pd.DataFrame(data)

    Date                Value
0   2005-02-25 01:30:10 1
1   2005-02-25 01:31:10 4
2   2005-02-25 02:36:10 6
3   2005-02-25 02:45:10 7
4   2005-02-25 02:45:50 3
5   2005-02-25 03:54:20 6
6   2005-02-25 03:55:10 7
7   2005-02-25 05:30:10 8
8   2005-02-25 06:30:10 3
9   2005-02-25 06:30:30 2

How can I remove a row whenever the next row falls within one minute of it, without doing it manually?

So my expected output would be:

    Date                Value
1   2005-02-25 01:31:10 4
2   2005-02-25 02:36:10 6
4   2005-02-25 02:45:50 3
6   2005-02-25 03:55:10 7
7   2005-02-25 05:30:10 8
9   2005-02-25 06:30:30 2

Please let me know how to do this

CodePudding user response：

Use Series.shift with Series.sub to get the gap to the next row, convert it to seconds with Series.dt.total_seconds, and divide by 60 with Series.div to get minutes. Then, in boolean indexing, keep the rows where the gap is greater than 1 minute or is missing (the missing value keeps the last row):

s = df['Date'].shift(-1).sub(df['Date']).dt.total_seconds().div(60)

df = df[s.isna() | s.gt(1)]
print (df)
                 Date  Value
1 2005-02-25 01:31:10      4
2 2005-02-25 02:36:10      6
4 2005-02-25 02:45:50      3
6 2005-02-25 03:55:10      7
7 2005-02-25 05:30:10      8
9 2005-02-25 06:30:30      2
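
The same filter can also be written without converting to minutes, by comparing the gap to a pd.Timedelta directly. This is a minimal sketch, assuming the frame is already sorted by Date; the helper name drop_if_next_within and its parameters are hypothetical, introduced only for illustration.

```python
import pandas as pd


def drop_if_next_within(df, column="Date", threshold=pd.Timedelta(minutes=1)):
    """Drop every row whose successor follows within `threshold` (hypothetical helper)."""
    # Gap from each row to the next one; the last row has no successor and gets NaT.
    gap = df[column].shift(-1) - df[column]
    # Keep rows without a successor, or whose gap is strictly larger than the threshold.
    return df[gap.isna() | (gap > threshold)]


# Same sample data as in the question.
dates = pd.to_datetime([
    "2005-02-25 01:30:10", "2005-02-25 01:31:10", "2005-02-25 02:36:10",
    "2005-02-25 02:45:10", "2005-02-25 02:45:50", "2005-02-25 03:54:20",
    "2005-02-25 03:55:10", "2005-02-25 05:30:10", "2005-02-25 06:30:10",
    "2005-02-25 06:30:30",
])
df = pd.DataFrame({"Date": dates, "Value": [1, 4, 6, 7, 3, 6, 7, 8, 3, 2]})

result = drop_if_next_within(df)
print(result)
```

Comparing Timedelta values keeps the threshold exact, so a gap of exactly 60 seconds counts as "within a minute" while a gap of 61 seconds does not.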

CodePudding user response：

Check this solution, hope it helps

dates = df['Date'].tolist()
# True = keep the row; the last row has no successor, so it is always kept
df['flag'] = [(dates[i+1] - dates[i]).total_seconds()/60 > 1 for i in range(len(dates)-1)] + [True]
df = df[df['flag']].drop('flag',axis=1)