I have two user defined variables that are created as strings in the format "%m%d%y" I need to convert these values to a datetime to filter out records in a pandas dataframe where the dataframe drops the date values prior to a start date and after an end date string value entered by the user, for example:
start_date = r'12/12/1984'
end_date = r'12/12/1986'
date1 = time.mktime(datetime.datetime.strptime(start_date, "%m/%d/%Y").timetuple())
date2 = time.mktime(datetime.datetime.strptime(end_date, "%m/%d/%Y").timetuple())
df = df[df['CreatedDate'] > date1 & df['CreatedDate'] <= date2 ]
This is the error I'm getting:
TypeError Traceback (most recent call last)
In [102]:
Line 4: df = df[df['CreatedDate'] > date1 & df['CreatedDate'] <= date2 ]
File C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\pandas\core\ops\common.py, in new_method:
Line 65: return method(self, other)
File C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\pandas\core\arraylike.py, in __rand__:
Line 63: return self._logical_method(other, roperator.rand_)
File C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\pandas\core\series.py, in _logical_method:
Line 4989: res_values = ops.logical_op(lvalues, rvalues, op)
File C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\pandas\core\ops\array_ops.py, in logical_op:
Line 340: res_values = op(lvalues, rvalues)
File C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\pandas\core\ops\roperator.py, in rand_:
Line 52: return operator.and_(right, left)
TypeError: unsupported operand type(s) for &: 'float' and 'DatetimeArray'
This isn't too unexpected as the conversion of the variables is 471686400.0
Is there an argument I can use so the variables can be used with a column having this datatype:
df.CreatedDate[0]
Timestamp('1968-07-02 07:00:00')
CodePudding user response:
The first issue has nothing to do with your datatypes, you've just been got by Python's order of operations.
# This:
df = df[df['CreatedDate'] > date1 & df['CreatedDate'] <= date2]
# Is NOT the same as:
df = df[(df['CreatedDate'] > date1) & (df['CreatedDate'] <= date2)]
# Which is why I suggest using:
df = df[df['CreatedDate'].gt(date1) & df['CreatedDate'].le(date2)]
# Because then you don't have to worry about () placement.
The second issue IS a datatype issue, you should just make all your time formats into pandas datetime format.
df['CreatedDate'] = pd.to_datetime(df['CreatedDate'])
Then your comparisons could be as simple as:
start_date = '1984-12-12'
end_date = '1986-12-12'
df = df[df['CreatedDate'].between(start_date, end_date)]