I am reading a csv file and would like to filter rows based on a column that is in date format (i..e, yyyy/mm/dd). However, I am receiving an error '<' not supported between instances of 'int' and 'datetime.date'
. It seems like the conversion that I apply does not go through.
import pandas as pd
import datetime
#Suppose I read the following from an Excel file.
x = pd.DataFrame([
"4/11/2022",
"6/16/2022",
"12/18/2021",
"11/13/2020",
"5/17/2023"
], columns = ["My Date"])
#I first convert the values into date format
x["My Date"] = pd.to_datetime(x["My Date"], errors = 'coerce').dt.date
#Then, I go ahead and try to extract values between two dates.
startdate = pd.to_datetime("2021-1-1").date()
enddate = pd.to_datetime("2022-1-1").date()
x["My Date"].loc[startdate:enddate]
My guess is that the conversion that I do with x["My Date"] = pd.to_datetime(x["My Date"], errors = 'coerce').dt.date
is not working properly and was wondering if someone can help me with that.
CodePudding user response:
There is no problem with regards to the conversion, the issue is loc[]
accesses index values, which are integers. In your example your index is [0,1,2,3,4]
which correctly is of type int, so when you pass datetime objects to loc[]
essentially it is comparing int and datetime giving you the error. Consider using:
x.set_index('My Date').loc[startdate:enddate]
However, if you'd like to slice between two dates, then I suggest you use:
x_filtered = x[(x['My Date'] > startdate) & (x['My Date'] < enddate)]