I want to get the subset of my dataframe where the date is before 2022-04-22. The original df looks like this:
df:
         date  hour  value
0  2022-04-21     0     10
1  2022-04-21     1     12
2  2022-04-21     2     14
3  2022-04-23     0     10
4  2022-04-23     1     12
5  2022-04-23     2     14
I checked the data types with df.dtypes, which reports the 'date' column as 'object'.
So I checked an individual cell with df['date'][0], and it is datetime.date(2022, 4, 21).
Also, df['date'][0] < datetime.date(2022, 4, 22) gives True.
However, when I tried to apply the same comparison to the whole dataframe with
df2 = df[df['date'] < datetime.date(2022, 4, 22)]
it raised TypeError: '<' not supported between instances of 'str' and 'datetime.date'
Why is this happening? Thanks in advance!
CodePudding user response:
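You most likely still have string dates in some of your rows. An 'object' column can hold a mix of Python types, so the first element may be a datetime.date while other rows hold strings; comparing the whole column with "<" then fails as soon as it reaches a string.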
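To check whether the column really mixes types, a quick sketch like the following counts the Python types stored in the column and lists the rows that hold strings. The small dataframe here is made up for illustration and only mimics the one in the question.

```python
import datetime
import pandas as pd

# Hypothetical data: one entry is a string, the others are datetime.date objects.
df = pd.DataFrame({
    "date": [datetime.date(2022, 4, 21), "2022-04-21", datetime.date(2022, 4, 23)],
    "hour": [0, 1, 0],
    "value": [10, 12, 10],
})

# Count how many entries of each Python type the object column contains.
type_counts = df["date"].map(type).value_counts()
print(type_counts)

# Select the rows whose 'date' entry is a string.
string_rows = df[df["date"].map(lambda d: isinstance(d, str))]
print(string_rows)
```

If more than one type shows up in the counts, those string rows are the ones that break the comparison.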
You can either use timegeb's suggestion from the comments:
df['date'] = pd.to_datetime(df['date'])
or convert the strings elementwise:
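import datetime
# Parse only the string entries; .date() keeps them comparable with the existing datetime.date values.
df['date'] = [datetime.datetime.strptime(d, '%Y-%m-%d').date() if isinstance(d, str) else d for d in df['date']]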
Both methods raise an error if any row contains a string that does not parse as a date. In that case you can use:
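import numpy as np

def convstr2date(d):
    # Parse strings; leave non-string values (e.g. datetime.date) unchanged.
    if isinstance(d, str):
        try:
            d = datetime.datetime.strptime(d, '%Y-%m-%d').date()
        except ValueError:
            # Unparsable strings become NaT instead of raising.
            d = np.datetime64('NaT')
    return d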
df['date'] = [convstr2date(d) for d in df['date']]
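As an alternative sketch (not part of the answer above), pandas can do the tolerant conversion itself: pd.to_datetime accepts errors='coerce', which turns unparsable entries into NaT. The example assumes the valid dates are all in YYYY-MM-DD form and uses made-up data; casting to str first is a choice made here so that one format applies to every entry. The comparison then uses a pd.Timestamp, since the column is now datetime64 rather than datetime.date objects.

```python
import datetime
import pandas as pd

# Hypothetical data: date objects, a valid date string, and an unparsable string.
df = pd.DataFrame({
    "date": [
        datetime.date(2022, 4, 21),
        "2022-04-21",
        "not a date",
        datetime.date(2022, 4, 23),
    ],
    "value": [10, 12, 14, 10],
})

# Turn every entry into a string so a single format applies,
# then parse; entries that do not match the format become NaT.
df["date"] = pd.to_datetime(
    df["date"].astype(str), format="%Y-%m-%d", errors="coerce"
)

# Compare against a Timestamp; NaT rows evaluate to False and are dropped.
df2 = df[df["date"] < pd.Timestamp("2022-04-22")]
print(df2)
```

Rows that could not be parsed can be inspected afterwards with df[df['date'].isna()].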