Pandas datetime filter

Time:05-03

I want to get the subset of my dataframe where the date is before 2022-04-22. The original df looks like this:

df:

    date       hour    value
0  2022-04-21  0       10   
1  2022-04-21  1       12   
2  2022-04-21  2       14 
3  2022-04-23  0       10   
4  2022-04-23  1       12   
5  2022-04-23  2       14   

I checked the data types with df.dtypes, and the 'date' column is reported as 'object'.

So I checked an individual cell with df['date'][0], and it is datetime.date(2022, 4, 21).

Also, df['date'][0] < datetime.date(2022, 4, 22) returns True.

However, when I applied this comparison to the whole dataframe with

df2 = df[df['date'] < datetime.date(2022, 4, 22)]

it raised TypeError: '<' not supported between instances of 'str' and 'datetime.date'

Why does this happen? Thanks in advance!

CodePudding user response:

Most likely some of your rows still hold the date as a string. The first element may be a datetime.date, but comparing the whole column with "<" fails as soon as it reaches a string.
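
One way to check this is to count the Python types stored in the column. The sketch below rebuilds a small frame with a single string row; the sample data is an assumption for illustration, not the asker's actual dataframe.

```python
import datetime
import pandas as pd

# Hypothetical reconstruction of the problem: one cell holds a string
# instead of a datetime.date object.
df = pd.DataFrame({
    "date": [datetime.date(2022, 4, 21), "2022-04-21", datetime.date(2022, 4, 23)],
    "hour": [0, 1, 0],
    "value": [10, 12, 10],
})

# Count how many cells of each Python type the column contains.
type_counts = df["date"].map(type).value_counts()
print(type_counts)

# Select the rows whose 'date' cell is a string.
string_rows = df[df["date"].map(lambda d: isinstance(d, str))]
print(string_rows)
```

If more than one type shows up in the counts, the column is mixed and needs to be converted before it can be compared as a whole.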

Either use the approach from timegeb's comment:

df['date'] = pd.to_datetime(df['date'])

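or convert them element-wise:

import datetime
# Parse strings into datetime.date so every cell has the same type as the existing date objects
df['date'] = [datetime.datetime.strptime(d, '%Y-%m-%d').date() if isinstance(d, str) else d for d in df['date']]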

Both methods raise an error if any row contains a string that is not a valid date. In that case you can use:

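import numpy as np

def convstr2date(d):
    # Parse strings into datetime.date; strings that do not match the format become NaT
    if isinstance(d, str):
        try:
            d = datetime.datetime.strptime(d, '%Y-%m-%d').date()
        except ValueError:
            d = np.datetime64('NaT')
    return d

df['date'] = [convstr2date(d) for d in df['date']]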
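
If it is acceptable for the column to become a pandas datetime column, a shorter alternative may be pd.to_datetime with errors='coerce', which turns unparseable values into NaT. The sketch below uses made-up sample data. It compares against a pd.Timestamp rather than a datetime.date, because comparing a datetime64 column with a plain date object may behave differently across pandas versions.

```python
import datetime
import pandas as pd

# Hypothetical sample: date objects, a valid date string, and a malformed string.
df = pd.DataFrame({
    "date": [datetime.date(2022, 4, 21), "2022-04-21", "not a date", datetime.date(2022, 4, 23)],
    "value": [10, 12, 14, 10],
})

# Convert the whole column; values that cannot be parsed become NaT.
df["date"] = pd.to_datetime(df["date"], errors="coerce")

# NaT never satisfies the comparison, so malformed rows are dropped by the filter.
df2 = df[df["date"] < pd.Timestamp("2022-04-22")]
print(df2)
```

Rows that were coerced to NaT can be inspected afterwards with df[df['date'].isna()] to see which inputs were malformed.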