how do i remove rows from numpy array based on date?-CodePudding

i have a number of arrays with the following format:

ic2 2021-05-06 09:10:30.582455  3.556000789 0
ic2 2021-05-06 09:15:30.999870  3.556000789 0
ic2 2021-05-06 09:20:31.784262  3.556000789 0
ic2 2021-05-06 09:25:32.538884  3.556000789 0
ic2 2021-05-06 09:30:32.767391  3.556000789 0
ic2 2021-05-06 09:35:33.293743  3.556000789 0
ic2 2021-05-06 09:40:33.454079  3.556000789 0
ic2 2021-05-06 09:45:33.853965  3.556000789 0
ic2 2021-05-06 09:50:34.421639  3.556000789 0
ic2 2021-05-06 09:55:34.653818  3.556000789 0
ic2 2021-05-06 10:00:35.134874  3.556000789 0

how do i remove the rows where the datetime > 2021-05-06 09:20 and < 2021-05-06 09:40 ?

I have tried with np.delete:

 raindata[10]=np.delete(raindata[10],[(raindata[10]>2021-11-07 16:42) & (raindata[10]< 2022-11-15 04:07)],axis=0)

and np.where:

raindata[10][:,1]=np.where(raindata[10][:,1]>= 2022-11-07 16:42 and <= 2022-11-15 04:07, Nan, raindata[10][:,1])

but always get the error:

SyntaxError: leading zeros in decimal integer literals are not permitted; use an 0o prefix for octal integers

Edit in response to comments:

for ease I have created the datetime objects with

xx=datetime.datetime(2021,5,6, hour=9, minute=20)
yy=datetime.datetime(2021,5,6, hour=9, minute=40)

and incorporated this into Ben Grossman's solution:

newraindata=np.delete(raindata[10],xx < raindata[10] < yy, axis=0)

which gives a type error '>' not supported between instances of 'str' and 'datetime.datetime'. I understand this as column 0 is a string, so I tried

newraindata=np.delete(raindata[10], (raindata[10][:,1] > xx) & (raindata[10][:,1] < yy), axis=0)

which just replicates raindata[10].

Ben also asked for code which replicates the array, I'm sorry but the array is not generated by code, it is data imported from a PGSQL table, which is why i copied the first 11 rows as an example.

CodePudding user response：

It turns out that this entire problem can be nicely solved using the pandas module.

import pandas as pd

s = '''ic2 2021-05-06 09:10:30.582455  3.556000789 0
ic2 2021-05-06 09:15:30.999870  3.556000789 0
ic2 2021-05-06 09:20:31.784262  3.556000789 0
ic2 2021-05-06 09:25:32.538884  3.556000789 0
ic2 2021-05-06 09:30:32.767391  3.556000789 0
ic2 2021-05-06 09:35:33.293743  3.556000789 0
ic2 2021-05-06 09:40:33.454079  3.556000789 0
ic2 2021-05-06 09:45:33.853965  3.556000789 0
ic2 2021-05-06 09:50:34.421639  3.556000789 0
ic2 2021-05-06 09:55:34.653818  3.556000789 0
ic2 2021-05-06 10:00:35.134874  3.556000789 0'''

data = [line.split() for line in s.splitlines()]
data = [[line[0],line[1] ' ' line[2],float(line[3]),float(line[4])] for line in data]

df = pd.DataFrame(data)
df.loc[:,1] = pd.to_datetime(df.loc[:,1])
df_bool = (df[1] > pd.to_datetime('2021-05-06 09:20')) & (df[1] < pd.to_datetime('2021-05-06 09:40'))
print(df[~df_bool])

The result:

      0                          1         2    3
0   ic2 2021-05-06 09:10:30.582455  3.556001  0.0
1   ic2 2021-05-06 09:15:30.999870  3.556001  0.0
6   ic2 2021-05-06 09:40:33.454079  3.556001  0.0
7   ic2 2021-05-06 09:45:33.853965  3.556001  0.0
8   ic2 2021-05-06 09:50:34.421639  3.556001  0.0
9   ic2 2021-05-06 09:55:34.653818  3.556001  0.0
10  ic2 2021-05-06 10:00:35.134874  3.556001  0.0

CodePudding user response：

This has been solved using numpy:-

set the date range:

xx=datetime(2021,11,7, hour=16, minute=42)
yy=datetime(2021,11,15, hour=4, minute=8)

create a mask:

mask2=(rainmm[:,1] > xx) & (rainmm[:,1] < yy)

create the new array:

rainmm=rainmm[~mask2].copy()

job done :)