I am able to create quarterly and monthly PeriodIndex like so:
idx = pd.PeriodIndex(year=[2000, 2001], quarter=[1,2], freq="Q") # quarterly
idx = pd.PeriodIndex(year=[2000, 2001], month=[1,2], freq="M") # monthly
I would expect to be able to create a yearly PeriodIndex like so:
idx = pd.PeriodIndex(year=[2000, 2001], freq="Y")
Instead this throws the following error:
Traceback (most recent call last):
File ".../script.py", line 3, in <module>
idx = pd.PeriodIndex(year=[2000, 2001], freq="Y")
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexes/period.py", line 250, in __new__
data, freq2 = PeriodArray._generate_range(None, None, None, freq, fields)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/arrays/period.py", line 316, in _generate_range
subarr, freq = _range_from_fields(freq=freq, **fields)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/arrays/period.py", line 1160, in _range_from_fields
ordinals.append(libperiod.period_ordinal(y, mth, d, h, mn, s, 0, 0, base))
File "pandas/_libs/tslibs/period.pyx", line 1109, in pandas._libs.tslibs.period.period_ordinal
TypeError: an integer is required
This seems like it should be very easy to do, yet I cannot understand what is going wrong. Can anybody help?
CodePudding user response:
month and year are both required "fields" in the current implementation (through pandas 1.5.1 at least). Most other fields are given a default value, but neither month nor year is defined if no value is provided. In this case, month therefore remains None, which causes the error shown:
TypeError: an integer is required
The relevant section of the source code, where the default values are defined, is _range_from_fields in pandas/core/arrays/period.py, the function that appears in the traceback above. Omitting the month field results in [None, None] (in this case), which cannot be converted to a PeriodIndex.
A correct index can be built as follows.
idx = pd.PeriodIndex(year=[2000, 2001], month=[1, 1], freq='Y')
Resulting in:
PeriodIndex(['2000', '2001'], dtype='period[A-DEC]')
Depending on the number of years, it may also make sense to programmatically generate the list of months:
years = [2000, 2001]
idx = pd.PeriodIndex(year=years, month=[1] * len(years), freq='Y')
As an alternative, it may be easier to use to_datetime followed by to_period, creating the PeriodIndex from a DatetimeIndex instead (since the data is already in a compatible form):
pd.to_datetime([2000, 2001], format='%Y').to_period('Y')
Resulting in the same PeriodIndex:
PeriodIndex(['2000', '2001'], dtype='period[A-DEC]')
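For an arbitrary (possibly non-contiguous) list of years, the to_datetime approach above could be wrapped in a small helper. This is only a sketch: yearly_period_index is a hypothetical name, not part of pandas, and the years are converted to strings first on the assumption that parsing text with format="%Y" is the least ambiguous input across pandas versions.

```python
import pandas as pd


def yearly_period_index(years):
    """Build a yearly PeriodIndex from integer years (hypothetical helper)."""
    # Convert each year to text so that the "%Y" format applies directly.
    as_text = [str(year) for year in years]
    # Parse to a DatetimeIndex, then convert each timestamp to a yearly period.
    return pd.to_datetime(as_text, format="%Y").to_period("Y")


idx = yearly_period_index([2000, 2001, 2005])
print(idx)
```

Because this goes through a DatetimeIndex, it should only be expected to work for four-digit years that pandas timestamps can represent.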
CodePudding user response:
idx = pd.period_range(start="1/1/2000", freq="Y", periods=2)
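Note that period_range only produces consecutive periods, so it fits when the years form an unbroken range. As a minimal sketch, the same index can be requested either by a count of periods or by an end year (the variable names here are illustrative only):

```python
import pandas as pd

# Two consecutive yearly periods, specified by count...
idx_by_count = pd.period_range(start="2000", periods=2, freq="Y")
# ...or by an inclusive end year.
idx_by_end = pd.period_range(start="2000", end="2001", freq="Y")

print(idx_by_count)
print(idx_by_count.equals(idx_by_end))
```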