TypeError: '>=' not supported between instances of 'datetime.date' and 'str'

My pandas df3 looks like this:

df3 = pd.DataFrame([['23.02.2012', '23.02.2012', 'aaa'], ['27.02.2014', '27.02.2014', 'bbb'], ['17.08.2018', '17.08.2018', 'ccc'], ['22.07.2019', '22.07.2019', 'ddd']], columns=['date', 'period', 'text'])

I want the period column to display the following periods when the dates fall within them. Because some date values were formatted with a timestamp and some were not, the period values were not created correctly (that is, without a timestamp). That is why I used df3['question_date'].dt.date

df3['date'] = pd.to_datetime(df3['date'], errors = 'coerce')
df3['question_date'] = df3['question_date']
df3['period'] = df3['date']
col_name = 'period'
strt_col = df3.pop(col_name)
df3.insert(5, col_name, strt_col)

date1 = pd.Timestamp('1990-10-14').date()
date2 = pd.Timestamp('1994-11-10').date()
date3 = pd.Timestamp('1999-10-1').date()
date4 = pd.Timestamp('2004-6-13').date()
date5 = pd.Timestamp('2009-8-30').date()
date6 = pd.Timestamp('2014-10-14').date()
date7 = pd.Timestamp('2019-11-26').date()
date8 = pd.Timestamp('2021-9-20').date()
mask1 = (df3['question_date'] >= 'date1') & (df3['question_date'] < 'date2')
mask2 = (df3['question_date'] >= 'date2') & (df3['question_date'] < 'date3')
mask3 = (df3['question_date'] >= 'date3') & (df3['question_date'] < 'date4')
mask4 = (df3['question_date'] >= 'date4') & (df3['question_date'] < 'date5')
mask5 = (df3['question_date'] >= 'date5') & (df3['question_date'] < 'date6')
mask6 = (df3['question_date'] >= 'date6') & (df3['question_date'] < 'date7')
mask7 = (df3['question_date'] >= 'date7') & (df3['question_date'] < 'date8')
df3.loc[mask1, 'leg_per'] = '1990-1994'
df3.loc[mask2, 'leg_per'] = '1994-1999'
df3.loc[mask3, 'leg_per'] = '1999-2004'
df3.loc[mask4, 'leg_per'] = '2004-2009'
df3.loc[mask5, 'leg_per'] = '2009-2014'
df3.loc[mask6, 'leg_per'] = '2014-2019'
df3.loc[mask7, 'leg_per'] = '2019-2021'
.
.
.

At mask1 it throws the error:

TypeError: '>=' not supported between instances of 'datetime.date' and 'str'

Original question: preventing timestamp creation in to_datetime() formatting in order to group by periods

CodePudding user response:

Here's a slightly modified version of your example:

import pandas as pd

df3 = pd.DataFrame([['23.02.2012', 'aaa'], ['27.02.2014', 'bbb'], 
                    ['17.08.2018', 'ccc'], ['22.07.2019', 'ddd']], 
                   columns=['date', 'text'])

# ensure datetime64[ns] type
df3['date'] = pd.to_datetime(df3['date'], dayfirst=True)

Note that for the date format in this example, you need to specify that the day comes first if you don't provide a format. Also, use errors='coerce' only if you know what you're doing; I always prefer errors to be raised first.
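To illustrate the two points above, here is a minimal sketch. The series names (raw, explicit, inferred, coerced) and the 'not a date' value are made up for illustration. It compares dayfirst=True with an explicit format string, and shows how errors='coerce' silently turns an unparsable value into NaT instead of raising.

```python
import pandas as pd

# Day-first strings, as in the example above.
raw = pd.Series(['23.02.2012', '01.03.2014'])

# Option 1: spell out the format explicitly.
explicit = pd.to_datetime(raw, format='%d.%m.%Y')

# Option 2: let pandas infer the format, but tell it the day comes first.
inferred = pd.to_datetime(raw, dayfirst=True)

# With errors='coerce', a bad value becomes NaT and no exception is raised,
# so problems in the input can go unnoticed.
bad = pd.Series(['23.02.2012', 'not a date'])
coerced = pd.to_datetime(bad, format='%d.%m.%Y', errors='coerce')

print(explicit.dt.month.tolist())
print(coerced.isna().tolist())
```

Without errors='coerce', the second call would raise on 'not a date', which is usually the safer default while you are still checking your data.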

Now let's create a mask for a certain date range. Note that pandas is clever enough to let you use dates as strings (see also Indexing; this applies to comparisons as well):

date1 = '2010-10-14'
date2 = '2014-11-10'
m = (df3['date'] >= date1) & (df3['date'] < date2)

df3.loc[m, 'leg_per'] = '2010-2014'

The dummy df then looks like

df3
        date text    leg_per
0 2012-02-23  aaa  2010-2014
1 2014-02-27  bbb  2010-2014
2 2018-08-17  ccc        NaN
3 2019-07-22  ddd        NaN
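The single mask extends to all of the periods from the question. One possible way to avoid writing seven masks by hand is pd.cut, sketched below. This is an alternative technique, not what the answer itself uses. The boundaries are taken from date1 to date8 in the question, and the names edges and labels are chosen here for illustration.

```python
import pandas as pd

df3 = pd.DataFrame({'date': ['23.02.2012', '27.02.2014', '17.08.2018', '22.07.2019'],
                    'text': ['aaa', 'bbb', 'ccc', 'ddd']})
df3['date'] = pd.to_datetime(df3['date'], format='%d.%m.%Y')

# Period boundaries from the question (date1 ... date8), kept as datetime64.
edges = pd.to_datetime(['1990-10-14', '1994-11-10', '1999-10-01', '2004-06-13',
                        '2009-08-30', '2014-10-14', '2019-11-26', '2021-09-20'])
labels = ['1990-1994', '1994-1999', '1999-2004', '2004-2009',
          '2009-2014', '2014-2019', '2019-2021']

# right=False makes every bin [start, end), matching ">= start" and "< end".
# Dates outside all bins get NaN.
df3['leg_per'] = pd.cut(df3['date'], bins=edges, labels=labels, right=False)

print(df3)
```

The result column is categorical. If you need plain strings, convert it with astype(str) after handling any NaN values.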

In general, if you work with dates and times in pandas and want to save yourself some trouble, use the datetime64 data type pandas has to offer. Avoid Python's datetime, date and time classes. They will cause issues like the one you're experiencing and reduce the functionality you have at hand.
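As a rough sketch of why this matters: .dt.date turns the column into an object column of Python date objects, and comparing those with a string (such as the quoted 'date1' in the question's masks) raises the TypeError. If the goal is only to drop the time of day, .dt.normalize() is one option that keeps the datetime64 type. The variable names and sample timestamps below are illustrative.

```python
import pandas as pd

stamps = pd.Series(pd.to_datetime(['2012-02-23 14:30:00', '2014-02-27 00:00:00']))

# .dt.date yields Python date objects, so the dtype falls back to object.
as_dates = stamps.dt.date

# Comparing date objects with a string fails, as in the question.
try:
    as_dates >= 'date1'
    raised = False
except TypeError:
    raised = True

# .dt.normalize() sets the time to midnight but keeps datetime64,
# so comparisons against date strings keep working.
normalized = stamps.dt.normalize()
mask = normalized >= '2012-02-23'

print(as_dates.dtype, normalized.dtype, raised)
```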