i have a number of arrays with the following format:
ic2 2021-05-06 09:10:30.582455 3.556000789 0
ic2 2021-05-06 09:15:30.999870 3.556000789 0
ic2 2021-05-06 09:20:31.784262 3.556000789 0
ic2 2021-05-06 09:25:32.538884 3.556000789 0
ic2 2021-05-06 09:30:32.767391 3.556000789 0
ic2 2021-05-06 09:35:33.293743 3.556000789 0
ic2 2021-05-06 09:40:33.454079 3.556000789 0
ic2 2021-05-06 09:45:33.853965 3.556000789 0
ic2 2021-05-06 09:50:34.421639 3.556000789 0
ic2 2021-05-06 09:55:34.653818 3.556000789 0
ic2 2021-05-06 10:00:35.134874 3.556000789 0
how do i remove the rows where the datetime > 2021-05-06 09:20 and < 2021-05-06 09:40 ?
I have tried with np.delete:
raindata[10]=np.delete(raindata[10],[(raindata[10]>2021-11-07 16:42) & (raindata[10]< 2022-11-15 04:07)],axis=0)
and np.where:
raindata[10][:,1]=np.where(raindata[10][:,1]>= 2022-11-07 16:42 and <= 2022-11-15 04:07, Nan, raindata[10][:,1])
but always get the error:
SyntaxError: leading zeros in decimal integer literals are not permitted; use an 0o prefix for octal integers
Edit in response to comments:
for ease I have created the datetime objects with
xx=datetime.datetime(2021,5,6, hour=9, minute=20)
yy=datetime.datetime(2021,5,6, hour=9, minute=40)
and incorporated this into Ben Grossman's solution:
newraindata=np.delete(raindata[10],xx < raindata[10] < yy, axis=0)
which gives a type error '>' not supported between instances of 'str' and 'datetime.datetime'. I understand this as column 0 is a string, so I tried
newraindata=np.delete(raindata[10], (raindata[10][:,1] > xx) & (raindata[10][:,1] < yy), axis=0)
which just replicates raindata[10].
Ben also asked for code which replicates the array, I'm sorry but the array is not generated by code, it is data imported from a PGSQL table, which is why i copied the first 11 rows as an example.
CodePudding user response:
It turns out that this entire problem can be nicely solved using the pandas module.
import pandas as pd
s = '''ic2 2021-05-06 09:10:30.582455 3.556000789 0
ic2 2021-05-06 09:15:30.999870 3.556000789 0
ic2 2021-05-06 09:20:31.784262 3.556000789 0
ic2 2021-05-06 09:25:32.538884 3.556000789 0
ic2 2021-05-06 09:30:32.767391 3.556000789 0
ic2 2021-05-06 09:35:33.293743 3.556000789 0
ic2 2021-05-06 09:40:33.454079 3.556000789 0
ic2 2021-05-06 09:45:33.853965 3.556000789 0
ic2 2021-05-06 09:50:34.421639 3.556000789 0
ic2 2021-05-06 09:55:34.653818 3.556000789 0
ic2 2021-05-06 10:00:35.134874 3.556000789 0'''
data = [line.split() for line in s.splitlines()]
data = [[line[0],line[1] ' ' line[2],float(line[3]),float(line[4])] for line in data]
df = pd.DataFrame(data)
df.loc[:,1] = pd.to_datetime(df.loc[:,1])
df_bool = (df[1] > pd.to_datetime('2021-05-06 09:20')) & (df[1] < pd.to_datetime('2021-05-06 09:40'))
print(df[~df_bool])
The result:
0 1 2 3
0 ic2 2021-05-06 09:10:30.582455 3.556001 0.0
1 ic2 2021-05-06 09:15:30.999870 3.556001 0.0
6 ic2 2021-05-06 09:40:33.454079 3.556001 0.0
7 ic2 2021-05-06 09:45:33.853965 3.556001 0.0
8 ic2 2021-05-06 09:50:34.421639 3.556001 0.0
9 ic2 2021-05-06 09:55:34.653818 3.556001 0.0
10 ic2 2021-05-06 10:00:35.134874 3.556001 0.0
CodePudding user response:
This has been solved using numpy:-
set the date range:
xx=datetime(2021,11,7, hour=16, minute=42)
yy=datetime(2021,11,15, hour=4, minute=8)
create a mask:
mask2=(rainmm[:,1] > xx) & (rainmm[:,1] < yy)
create the new array:
rainmm=rainmm[~mask2].copy()
job done :)