I want to keep only the rows whose timestamp falls between July 4 and May 24 of the same year, so I'm using this code:
def fix_time(data):
    data['timestamp'] = pd.to_datetime(data['timestamp'], format="%d-%m-%Y %H:%M:%S")
    indexNames = data[ (data['timestamp'] < '24-05-2021 00:00:00') & (data['timestamp'] > '05-07-2021 00:00:00') ].index
    data.drop(indexNames , inplace=True)
    return data
But it doesn't work as I wanted: when I call data['timestamp'].max() I get 2021-09-30, which is not correct.
CodePudding user response:
Series.between works better for this:
import pandas as pd

def fix_time(data):
    data['timestamp'] = pd.to_datetime(data['timestamp'], format="%d-%m-%Y %H:%M:%S")
    # between() includes both endpoints by default
    return data[data['timestamp'].between('2021-05-07', '2021-05-24')]
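To see the filter in action, here is a minimal sketch on made-up data. The sample rows and the start/end parameters are assumptions added for illustration, not part of the original question; the bounds are converted with pd.Timestamp so the comparison is explicit.

```python
import pandas as pd


def fix_time(data, start="2021-05-07", end="2021-05-24"):
    """Keep only rows whose timestamp lies in [start, end] (both inclusive)."""
    # Work on a copy so the caller's frame is left untouched.
    data = data.copy()
    data["timestamp"] = pd.to_datetime(
        data["timestamp"], format="%d-%m-%Y %H:%M:%S"
    )
    # start/end are hypothetical parameters; ISO strings parse unambiguously.
    lower, upper = pd.Timestamp(start), pd.Timestamp(end)
    return data[data["timestamp"].between(lower, upper)]


# Hypothetical sample in the same dd-mm-yyyy layout as the question.
sample = pd.DataFrame(
    {
        "timestamp": [
            "01-05-2021 10:00:00",
            "10-05-2021 08:30:00",
            "24-05-2021 00:00:00",
            "30-09-2021 12:00:00",
        ]
    }
)

filtered = fix_time(sample)
print(filtered)
print(filtered["timestamp"].max())
```

Note that an upper bound of 2021-05-24 means midnight at the start of that day, so rows later on May 24 would be excluded; pass a later bound if the whole day should be kept.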
Also, note that you should write date strings in ISO format when comparing dates in pandas, i.e., 2021-05-24 00:00:00 (yyyy-mm-dd) instead of 24-05-2021 00:00:00 (dd-mm-yyyy). A day-first string such as 05-07-2021 is ambiguous, and pandas can read it month-first.
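The difference between the two formats can be checked directly. This is a small sketch, assuming the string from the question was meant as day-first (5 July); the variable names are made up for illustration.

```python
import pandas as pd

day_first = "05-07-2021 00:00:00"

# With an explicit format there is no guessing: day 05, month 07.
explicit = pd.to_datetime(day_first, format="%d-%m-%Y %H:%M:%S")

# The ISO spelling of the same instant needs no format at all.
iso = pd.Timestamp("2021-07-05 00:00:00")
print(explicit == iso)

# Without a format pandas has to infer the field order; with default
# settings such a string is typically read month-first, so inspect the result.
guessed = pd.Timestamp(day_first)
print(guessed)
```

If the input strings cannot be changed, parsing them once with an explicit format and comparing against Timestamp objects avoids the ambiguity altogether.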