Convert np datetime64 column to pandas DatetimeIndex with frequency attribute set correctly

Time:01-03

Reproducing the data I have:

import numpy as np
import pandas as pd
dts = ['2016-01-01', '2016-02-01', '2016-03-01', '2016-04-01',
               '2016-05-01', '2016-06-01', '2016-07-01', '2016-08-01',
               '2016-09-01', '2016-10-01', '2016-11-01', '2016-12-01',
               '2017-01-01', '2017-02-01', '2017-03-01', '2017-04-01']

my_df = pd.DataFrame({'col1': range(len(dts)), 'month_beginning': dts})
# Cast the date strings to a datetime64 column; the explicit [ns] unit avoids a unit-less cast
my_df['month_beginning'] = my_df['month_beginning'].astype('datetime64[ns]')

What I want is to set month_beginning as a DatetimeIndex, and specifically I need its freq attribute to be set correctly as monthly.

Here's what I've tried so far, and why each attempt has not worked:

First attempt

my_df = my_df.set_index('month_beginning')

...however after executing the above, my_df.index shows a DatetimeIndex but with freq=None.

Second attempt

dt_idx = pd.DatetimeIndex(my_df.month_beginning, freq='M')

...but that throws the following error:

ValueError: Inferred frequency MS from passed values does not conform to passed frequency M

...This is particularly confusing to me given that, as can be checked in my data above, my dts/month-beginning series is in fact monthly and is not missing any months...
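One way to see why the second attempt is rejected is to ask pandas which frequency alias the values match. The sketch below uses a shortened copy of the dates (the names `dates`, `inferred` and `idx` are illustrative, not from the original post). 'M' is the month-end alias (newer pandas releases spell it 'ME'), while first-of-month dates correspond to 'MS', month start:

```python
import pandas as pd

# Shortened copy of the month-beginning dates from the question
dates = pd.to_datetime(['2016-01-01', '2016-02-01', '2016-03-01', '2016-04-01'])

# infer_freq needs at least three values and returns the matching alias as a string
inferred = pd.infer_freq(dates)
print(inferred)

# Passing the inferred alias back to the constructor succeeds,
# because the values conform to it
idx = pd.DatetimeIndex(dates, freq=inferred)
print(idx.freqstr)
```

So the series is monthly, but it is monthly on month starts, which is a different alias from the one passed in the second attempt.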

CodePudding user response:

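The error arises because 'M' is the month-end alias, whereas your dates fall on the first of each month, for which the alias is 'MS' (month start). You could convert the time series to that frequency using asfreq: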

import pandas as pd

dts = ['2016-01-01', '2016-02-01', '2016-03-01', '2016-04-01',
       '2016-05-01', '2016-06-01', '2016-07-01', '2016-08-01',
       '2016-09-01', '2016-10-01', '2016-11-01', '2016-12-01',
       '2017-01-01', '2017-02-01', '2017-03-01', '2017-04-01']

df = pd.DataFrame({'col1': range(len(dts)), 'month_beginning': dts})
df['month_beginning'] = pd.to_datetime(df['month_beginning'])

df.index = df['month_beginning']
df = df.asfreq("MS")

df.index
DatetimeIndex(['2016-01-01', '2016-02-01', '2016-03-01', '2016-04-01',
               '2016-05-01', '2016-06-01', '2016-07-01', '2016-08-01',
               '2016-09-01', '2016-10-01', '2016-11-01', '2016-12-01',
               '2017-01-01', '2017-02-01', '2017-03-01', '2017-04-01'],
              dtype='datetime64[ns]', name='month_beginning', freq='MS')
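As an alternative sketch, assuming the dates are already complete and regular, the frequency can be passed straight to the DatetimeIndex constructor with the month-start alias. One difference worth knowing: asfreq reindexes, so a missing month becomes a NaN row, whereas the constructor raises a ValueError. The shortened data and the names `gappy`, `filled` and `constructor_failed` are illustrative only:

```python
import pandas as pd

dts = ['2016-01-01', '2016-02-01', '2016-03-01', '2016-04-01']
df = pd.DataFrame({'col1': range(len(dts)),
                   'month_beginning': pd.to_datetime(dts)})

# Build the index with freq='MS'; pandas validates the values against it
df.index = pd.DatetimeIndex(df['month_beginning'], freq='MS')
df = df.drop(columns='month_beginning')
print(df.index.freqstr)

# With a gap (March missing), asfreq inserts a NaN row for the missing month
gappy = pd.DataFrame(
    {'col1': [0, 1, 2]},
    index=pd.to_datetime(['2016-01-01', '2016-02-01', '2016-04-01']),
)
filled = gappy.asfreq('MS')
print(filled)

# The constructor refuses the same gappy values instead of filling them
try:
    pd.DatetimeIndex(gappy.index, freq='MS')
    constructor_failed = False
except ValueError:
    constructor_failed = True
print(constructor_failed)
```

Which behavior is preferable depends on whether silently added NaN rows are acceptable for the analysis that follows.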