Home > Software engineering >  Filter datetime by date, pandas python
Filter datetime by date, pandas python

Time:11-12

I don't know why my code is not working when there are many, many examples of filtering a pandas dataframe datetime column by date, but I can't get them to work.

My dataframe has a datetime column that does NOT have an index. I have tried to use the to_datetime() to make sure the Series is a datetime data type. That worked. But then I am trying to filter the 3 months out of about 9 months in my datetime column, but keep getting key errors, oddly enough. The code samples I try do not have that issue in their examples.

Here is what I have so far. As you can see some of the commented out code are lines I tried but they didn't work:

First attempt. This separates date from datetime, but then I can't filter those dates shown in the date_df.

# Convert date_time column to the datetime data type, then pull only dates
date_df['date_time'] = pd.to_datetime(less_hot_df['date_time']).dt.date
date_df.head()

9294    2014-03-07
5221    2014-01-10
5079    2013-12-30
1682    2013-12-24
4994    2013-12-23

Another attempt.

# Convert to datetime type, then get all rows within some dates
date_df = pd.to_datetime(less_hot_df['date_time']).dt.date
#date_df.loc['2014-03-01':'2014-05-30']
date_df.head()

But this one returns a key error.

KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2897             try:
-> 2898                 return self._engine.get_loc(casted_key)
   2899             except KeyError as err:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()

KeyError: 'date_time'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
3 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2898                 return self._engine.get_loc(casted_key)
   2899             except KeyError as err:
-> 2900                 raise KeyError(key) from err
   2901 
   2902         if tolerance is not None:

KeyError: 'date_time'

Yet another attempt: won't work because of type error "not supported between instances of 'str' and 'Timestamp'. But why they are considering my date_time column to be a timestamp, when it was just converted to a datetime type is beyond me.

start_date = pd.to_datetime('3/1/2014 6:00', utc= True)
end_date = pd.to_datetime('5/30/2014 18:00', utc= True)
less_hot_df.loc[(less_hot_df['date_time'] > start_date) & (less_hot_df['date_time'] < end_date)]

less_hot_df.head()

TypeError                                 Traceback (most recent call last)
<ipython-input-166-ae4cf597cacf> in <module>()
      8 start_date = pd.to_datetime('3/1/2014 6:00', utc= True)
      9 end_date = pd.to_datetime('5/30/2014 18:00', utc= True)
---> 10 less_hot_df.loc[(less_hot_df['date_time'] > start_date) & (less_hot_df['date_time'] < end_date)]
     11 
     12 less_hot_df.head()

3 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/ops/array_ops.py in comp_method_OBJECT_ARRAY(op, x, y)
     54         result = libops.vec_compare(x.ravel(), y.ravel(), op)
     55     else:
---> 56         result = libops.scalar_compare(x.ravel(), y, op)
     57     return result.reshape(x.shape)
     58 

pandas/_libs/ops.pyx in pandas._libs.ops.scalar_compare()

TypeError: '>' not supported between instances of 'str' and 'Timestamp'

CodePudding user response:

Question not clear but lets try

less_hot_df['date_time']=pd.to_datetime(less_hot_df['date_time'])
less_hot_df[less_hot_df['date_time'].between('2014-03-01','2014-05-30')]

CodePudding user response:

Finally found something that works:

less_hot_df = less_hot_df[less_hot_df['date_time'].dt.month.between(3,6)]

This returns only the rows between the months of March through May (the 6 is the stopping point, so it will stop at June). I guess I didn't have to convert my date_time column to a datetime data type after all.

CodePudding user response:

Try adding this line before you do any sorting:

less_hot_df['date_time'] = pd.to_datetime(less_hot_df['date_time'])
  • Related