I have this data frame:
import numpy as np
import pandas as pd

data = {'Date': [np.datetime64('2005-02-25 01:30:10'), np.datetime64('2005-02-25 01:31:10'),np.datetime64('2005-02-25 02:36:10'),
np.datetime64('2005-02-25 02:45:10'), np.datetime64('2005-02-25 02:45:50'),np.datetime64('2005-02-25 03:54:20'),
np.datetime64('2005-02-25 03:55:10'),np.datetime64('2005-02-25 05:30:10'), np.datetime64('2005-02-25 06:30:10'),
np.datetime64('2005-02-25 06:30:30')],
'Value':[1,4,6,7,3,6,7,8,3,2]}
df = pd.DataFrame(data)
                 Date  Value
0 2005-02-25 01:30:10      1
1 2005-02-25 01:31:10      4
2 2005-02-25 02:36:10      6
3 2005-02-25 02:45:10      7
4 2005-02-25 02:45:50      3
5 2005-02-25 03:54:20      6
6 2005-02-25 03:55:10      7
7 2005-02-25 05:30:10      8
8 2005-02-25 06:30:10      3
9 2005-02-25 06:30:30      2
How can I remove a row whenever the next row falls within one minute of it, without doing it manually?
So my expected output would be:
                 Date  Value
1 2005-02-25 01:31:10      4
2 2005-02-25 02:36:10      6
4 2005-02-25 02:45:50      3
6 2005-02-25 03:55:10      7
7 2005-02-25 05:30:10      8
9 2005-02-25 06:30:30      2
Please let me know how to do this
CodePudding user response:
Use Series.shift with Series.sub to get the gap between each row and the next one, convert it to seconds with Series.dt.total_seconds, and then use boolean indexing to keep the rows where the gap is greater than 60 seconds or missing (the last row has no next row):
s = df['Date'].shift(-1).sub(df['Date']).dt.total_seconds()
df = df[s.isna() | s.gt(60)]
print(df)
                 Date  Value
1 2005-02-25 01:31:10      4
2 2005-02-25 02:36:10      6
4 2005-02-25 02:45:50      3
6 2005-02-25 03:55:10      7
7 2005-02-25 05:30:10      8
9 2005-02-25 06:30:30      2
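The gap can also be compared directly against a pd.Timedelta, which skips the conversion to seconds. Below is a minimal sketch of that idea, assuming the frame is already sorted by Date. The helper name drop_if_next_within and its threshold parameter are made up for illustration and are not part of pandas.

```python
import pandas as pd

def drop_if_next_within(df, col="Date", threshold=pd.Timedelta(minutes=1)):
    """Drop each row whose following row is at most `threshold` later.

    Hypothetical helper for illustration; assumes `df` is sorted by `col`.
    """
    # Gap from each row to the next one; the last row has no successor, so it gets NaT.
    gap = df[col].shift(-1) - df[col]
    # Keep the last row and every row whose gap exceeds the threshold.
    return df[gap.isna() | (gap > threshold)]

df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2005-02-25 01:30:10", "2005-02-25 01:31:10", "2005-02-25 02:36:10",
        "2005-02-25 02:45:10", "2005-02-25 02:45:50", "2005-02-25 03:54:20",
        "2005-02-25 03:55:10", "2005-02-25 05:30:10", "2005-02-25 06:30:10",
        "2005-02-25 06:30:30",
    ]),
    "Value": [1, 4, 6, 7, 3, 6, 7, 8, 3, 2],
})

result = drop_if_next_within(df)
print(result)
```

With the data from the question this keeps rows 1, 2, 4, 6, 7 and 9. Passing a different threshold, for example pd.Timedelta(seconds=30), changes the cutoff without touching the rest of the logic.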
CodePudding user response:
Check this solution, hope it helps
# Flag each row True (keep) unless the next row is within one minute; the last row is always kept
df['flag'] = [(df['Date'].iloc[i + 1] - df['Date'].iloc[i]).total_seconds() / 60 > 1 for i in range(len(df) - 1)] + [True]
df = df[df['flag']].drop('flag', axis=1)
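The row-by-row comparison above can probably be written without a Python loop by using np.diff on the underlying datetime array. This is only a sketch on the first five rows of the question's data; the variable names gaps, keep and filtered are chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2005-02-25 01:30:10", "2005-02-25 01:31:10", "2005-02-25 02:36:10",
        "2005-02-25 02:45:10", "2005-02-25 02:45:50",
    ]),
    "Value": [1, 4, 6, 7, 3],
})

# Gaps between consecutive timestamps as timedelta64 values (length n - 1).
gaps = np.diff(df["Date"].to_numpy())
# True where the next row is more than one minute later; the last row is always kept.
keep = np.append(gaps > np.timedelta64(1, "m"), True)
filtered = df[keep]
print(filtered)
```

Like the loop version, this assumes the rows are already in chronological order.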