I don't know why my code is not working when there are many, many examples of filtering a pandas dataframe datetime column by date, but I can't get them to work.
My dataframe has a datetime column that does NOT have an index. I have tried to use the to_datetime()
to make sure the Series is a datetime data type. That worked. But then I am trying to filter the 3 months out of about 9 months in my datetime column, but keep getting key errors, oddly enough. The code samples I try do not have that issue in their examples.
Here is what I have so far. As you can see some of the commented out code are lines I tried but they didn't work:
First attempt. This separates date from datetime, but then I can't filter those dates shown in the date_df
.
# Convert date_time column to the datetime data type, then pull only dates
date_df['date_time'] = pd.to_datetime(less_hot_df['date_time']).dt.date
date_df.head()
9294 2014-03-07
5221 2014-01-10
5079 2013-12-30
1682 2013-12-24
4994 2013-12-23
Another attempt.
# Convert to datetime type, then get all rows within some dates
date_df = pd.to_datetime(less_hot_df['date_time']).dt.date
#date_df.loc['2014-03-01':'2014-05-30']
date_df.head()
But this one returns a key error.
KeyError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2897 try:
-> 2898 return self._engine.get_loc(casted_key)
2899 except KeyError as err:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()
KeyError: 'date_time'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
3 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2898 return self._engine.get_loc(casted_key)
2899 except KeyError as err:
-> 2900 raise KeyError(key) from err
2901
2902 if tolerance is not None:
KeyError: 'date_time'
Yet another attempt: won't work because of type error "not supported between instances of 'str' and 'Timestamp'. But why they are considering my date_time
column to be a timestamp, when it was just converted to a datetime type is beyond me.
start_date = pd.to_datetime('3/1/2014 6:00', utc= True)
end_date = pd.to_datetime('5/30/2014 18:00', utc= True)
less_hot_df.loc[(less_hot_df['date_time'] > start_date) & (less_hot_df['date_time'] < end_date)]
less_hot_df.head()
TypeError Traceback (most recent call last)
<ipython-input-166-ae4cf597cacf> in <module>()
8 start_date = pd.to_datetime('3/1/2014 6:00', utc= True)
9 end_date = pd.to_datetime('5/30/2014 18:00', utc= True)
---> 10 less_hot_df.loc[(less_hot_df['date_time'] > start_date) & (less_hot_df['date_time'] < end_date)]
11
12 less_hot_df.head()
3 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/ops/array_ops.py in comp_method_OBJECT_ARRAY(op, x, y)
54 result = libops.vec_compare(x.ravel(), y.ravel(), op)
55 else:
---> 56 result = libops.scalar_compare(x.ravel(), y, op)
57 return result.reshape(x.shape)
58
pandas/_libs/ops.pyx in pandas._libs.ops.scalar_compare()
TypeError: '>' not supported between instances of 'str' and 'Timestamp'
CodePudding user response:
Question not clear but lets try
less_hot_df['date_time']=pd.to_datetime(less_hot_df['date_time'])
less_hot_df[less_hot_df['date_time'].between('2014-03-01','2014-05-30')]
CodePudding user response:
Finally found something that works:
less_hot_df = less_hot_df[less_hot_df['date_time'].dt.month.between(3,6)]
This returns only the rows between the months of March through May (the 6 is the stopping point, so it will stop at June). I guess I didn't have to convert my date_time
column to a datetime data type after all.
CodePudding user response:
Try adding this line before you do any sorting:
less_hot_df['date_time'] = pd.to_datetime(less_hot_df['date_time'])