How to set a pandas PeriodIndex with yearly frequency?

Time:10-27

I am able to create quarterly and monthly PeriodIndex like so:

import pandas as pd

idx = pd.PeriodIndex(year=[2000, 2001], quarter=[1, 2], freq="Q") # quarterly

idx = pd.PeriodIndex(year=[2000, 2001], month=[1,2], freq="M") # monthly

I would expect to be able to create a yearly PeriodIndex like so:

idx = pd.PeriodIndex(year=[2000, 2001], freq="Y")

Instead this throws the following error:

Traceback (most recent call last):
  File ".../script.py", line 3, in <module>
    idx = pd.PeriodIndex(year=[2000, 2001], freq="Y")
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexes/period.py", line 250, in __new__
    data, freq2 = PeriodArray._generate_range(None, None, None, freq, fields)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/arrays/period.py", line 316, in _generate_range
    subarr, freq = _range_from_fields(freq=freq, **fields)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/arrays/period.py", line 1160, in _range_from_fields
    ordinals.append(libperiod.period_ordinal(y, mth, d, h, mn, s, 0, 0, base))
  File "pandas/_libs/tslibs/period.pyx", line 1109, in pandas._libs.tslibs.period.period_ordinal
TypeError: an integer is required

This seems like it should be very easy to do, yet I cannot work out what is going wrong. Can anybody help?

CodePudding user response:

month and year are both required "fields" in the current implementation (through pandas 1.5.1 at least). Most other fields fall back to a default value, but neither month nor year is given one when it is omitted. In this case, month therefore remains None, which causes the error shown:

TypeError: an integer is required

The default values are defined in the relevant section of the source code (the _range_from_fields function in pandas/core/arrays/period.py, which appears in the traceback above). Omitting the month field results in [None, None] (in this case), which cannot be converted to a PeriodIndex.
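
To make the mechanism concrete, here is a simplified pure-Python model of the default handling described above. It is only an illustration and not the pandas source: the function name fill_field_defaults is hypothetical, and the fallback values (day=1, hour/minute/second=0) are stated as my understanding of the 1.5.x behaviour.

```python
def fill_field_defaults(year=None, month=None, day=None,
                        hour=None, minute=None, second=None):
    """Hypothetical, simplified model of how omitted fields are handled."""
    # day, hour, minute and second receive a fallback value when omitted.
    day = 1 if day is None else day
    hour = 0 if hour is None else hour
    minute = 0 if minute is None else minute
    second = 0 if second is None else second

    # year and month receive no fallback. Scalars (including None) are
    # repeated to match the length of the longest list that was passed.
    fields = [year, month, day, hour, minute, second]
    length = max(len(f) for f in fields if isinstance(f, list))
    return [f if isinstance(f, list) else [f] * length for f in fields]


fields = fill_field_defaults(year=[2000, 2001])
names = ["year", "month", "day", "hour", "minute", "second"]
for name, values in zip(names, fields):
    print(name, values)
# month prints as [None, None]; every other field holds integers
```

In this model the month column is the only one left holding None, which matches the "an integer is required" error raised when each row is converted to a period ordinal.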

A correct index can be built as follows.

idx = pd.PeriodIndex(year=[2000, 2001], month=[1, 1], freq='Y')

Resulting in:

PeriodIndex(['2000', '2001'], dtype='period[A-DEC]')

Depending on the number of years, it may also make sense to programmatically generate the list of months:

years = [2000, 2001]
idx = pd.PeriodIndex(year=years, month=[1] * len(years), freq='Y')
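
If building the month list feels awkward, one possible variation is to skip the field keywords and pass the years as strings. This is a minimal sketch, assuming the years are available as a plain list of integers; the variable names are only illustrative.

```python
import pandas as pd

years = [2000, 2001]

# Assumption: each year is passed as a string such as "2000", which pandas
# parses into an annual period when freq="Y" is given.
idx = pd.PeriodIndex([str(year) for year in years], freq="Y")

print(idx.year.tolist())  # [2000, 2001]
```

The dtype label printed for the index (for example period[A-DEC]) can differ between pandas versions, so the sketch checks the year values rather than the repr.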

As an alternative, it may be easier to chain to_datetime and to_period, creating the PeriodIndex from a DatetimeIndex instead (which is already in a compatible form):

pd.to_datetime([2000, 2001], format='%Y').to_period('Y')

Resulting in the same PeriodIndex:

PeriodIndex(['2000', '2001'], dtype='period[A-DEC]')

CodePudding user response:

idx = pd.period_range(start="1/1/2000", freq="Y", periods=2)
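
As a small sketch of the same idea, period_range can also be given an inclusive end instead of a count. The variable names by_count and by_bounds are only illustrative.

```python
import pandas as pd

# Two yearly periods starting at 2000, specified by count.
by_count = pd.period_range(start="2000", periods=2, freq="Y")

# The same range specified by its first and last year (end is inclusive).
by_bounds = pd.period_range(start="2000", end="2001", freq="Y")

print(by_count.equals(by_bounds))  # True
print(by_bounds.year.tolist())     # [2000, 2001]
```

Specifying the bounds may be more convenient when the first and last year are known but the number of years is not.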