Confusion over datetime64, timestamp and pd.DateOffset()

Time:11-12

I have a column of dtype datetime64[ns]:

df.date_month
Output: 
0         2018-09-01
1         2018-09-01
2         2018-09-01
3         2018-09-01
4         2018-09-01
             ...    
Name: date_month, Length: 4839993, dtype: datetime64[ns]

If I loop over the Series and add pd.DateOffset, the code runs.

for i in df.date_month[0:10]:
    print(i + pd.DateOffset(months=12))

Output:
2019-09-01 00:00:00
2019-09-01 00:00:00
2019-09-01 00:00:00
2019-09-01 00:00:00

However, if I use unique(), the code breaks.

for i in df.date_month.unique():
    print(i + pd.DateOffset(months=12))

Output:
UFuncTypeError                            Traceback (most recent call last)
<command-3708796390803054> in <module>
      1 for i in df.date_month.unique():
----> 2   print(i + pd.DateOffset(months=12))

UFuncTypeError: ufunc 'add' cannot use operands with types dtype('<M8[ns]') and dtype('O')

Can someone help me understand why this happens? Does unique() transform the data in some way?

df.date_month.unique()
Output: 
array(['2018-09-01T00:00:00.000000000', '2018-04-01T00:00:00.000000000',
       '2018-12-01T00:00:00.000000000', '2018-11-01T00:00:00.000000000',
       '2018-07-01T00:00:00.000000000', '2018-05-01T00:00:00.000000000',
       '2018-06-01T00:00:00.000000000', '2018-10-01T00:00:00.000000000',
       '2018-08-01T00:00:00.000000000'], dtype='datetime64[ns]')

CodePudding user response:

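That's correct. As your output shows, unique() returns a NumPy array rather than a Series, so iterating over it yields numpy.datetime64 values instead of pandas Timestamp objects, and numpy.datetime64 does not support adding a pd.DateOffset. See --> https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.unique.html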
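To make the type difference concrete, here is a minimal sketch on a small made-up Series (the sample values and variable names are illustrative, not from the question). It uses np.asarray() to force a plain NumPy array, because newer pandas releases may return a pandas DatetimeArray from unique() instead of the ndarray shown in the question. Wrapping each element in pd.Timestamp is one way to get DateOffset arithmetic back.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for df.date_month (values are assumptions).
date_month = pd.Series(
    pd.to_datetime(["2018-09-01", "2018-09-01", "2018-04-01"]),
    name="date_month",
)

# Iterating a datetime64[ns] Series yields pandas Timestamp objects.
first_from_series = next(iter(date_month))
print(type(first_from_series))

# Forcing a NumPy array (what unique() returned in the question)
# yields numpy.datetime64 scalars when indexed or iterated.
raw_unique = np.asarray(date_month.unique())
print(type(raw_unique[0]))

# Converting each element back to a Timestamp makes the addition work.
shifted = [pd.Timestamp(value) + pd.DateOffset(months=12) for value in raw_unique]
print(shifted)
```

The conversion costs one Timestamp construction per unique value, which is usually negligible next to the size of the original column.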

You most likely want drop_duplicates() instead, which keeps the values in a pandas object, so iterating still gives Timestamps --> https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop_duplicates.html

i.e.

for i in df.drop_duplicates(subset=['date_month']).date_month:
    print(i + pd.DateOffset(months=12))
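If the loop is only there to shift every unique date, the offset can also be added to the whole column at once. The sketch below assumes a small made-up DataFrame in place of the real one. It shows two variants: Series.drop_duplicates(), and wrapping the result of unique() in pd.DatetimeIndex.

```python
import pandas as pd

# Illustrative stand-in for the question's DataFrame (values are assumptions).
df = pd.DataFrame(
    {"date_month": pd.to_datetime(
        ["2018-09-01", "2018-09-01", "2018-04-01", "2018-12-01"]
    )}
)

# Variant 1: stay in pandas and add the offset to the de-duplicated Series.
unique_months = df["date_month"].drop_duplicates()
shifted_series = unique_months + pd.DateOffset(months=12)
print(shifted_series.tolist())

# Variant 2: wrap the result of unique() in a DatetimeIndex first,
# which supports DateOffset arithmetic.
shifted_index = pd.DatetimeIndex(df["date_month"].unique()) + pd.DateOffset(months=12)
print(list(shifted_index))
```

Both variants keep the unique values in order of first appearance. The first also keeps the original row labels of the retained rows.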