I'm trying to remove Columbus Day from pandas.tseries.holiday.USFederalHolidayCalendar.
This seems to be possible, as a one-time operation, with
from pandas.tseries.holiday import USFederalHolidayCalendar
cal = USFederalHolidayCalendar()
cal = cal.rules.pop(6)
However, if this code is within a function that gets called repeatedly (in a loop) to generate several independent outputs, I get the following error:
IndexError: pop index out of range
It gives me the impression that the object remains in its initial loaded state and as the loop progresses it pops holidays at index 6 until they're gone and then throws an error.
I tried reloading via importlib.reload, to no avail.
Any idea what I'm doing wrong?
CodePudding user response:
The problem here is that rules is a class attribute (a list of objects). See the class definition, taken from the pandas source:
class USFederalHolidayCalendar(AbstractHolidayCalendar):
    """
    US Federal Government Holiday Calendar based on rules specified by:
    https://www.opm.gov/policy-data-oversight/
    snow-dismissal-procedures/federal-holidays/
    """
    rules = [
        Holiday("New Years Day", month=1, day=1, observance=nearest_workday),
        USMartinLutherKingJr,
        USPresidentsDay,
        USMemorialDay,
        Holiday("July 4th", month=7, day=4, observance=nearest_workday),
        USLaborDay,
        USColumbusDay,
        Holiday("Veterans Day", month=11, day=11, observance=nearest_workday),
        USThanksgivingDay,
        Holiday("Christmas", month=12, day=25, observance=nearest_workday),
    ]
Since the attribute is defined on the class, there is only one underlying list, and every instance refers to it. If operations on different instances of that class both modify the list, each instance sees the other's changes. Here is an example that shows what's going on:
>>> class A:
...     rules = [0, 1, 2]
...
>>> a1 = A()
>>> a2 = A()
>>> a1.rules.pop()
2
>>> a1.rules.pop()
1
>>> a2.rules.pop()
0
>>> a2.rules.pop()
IndexError: pop from empty list
>>> a3 = A()
>>> a3.rules
[]
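Also, each module in Python is imported only once per interpreter session, so repeating the import statement inside your function does not reset the list.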
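One way around the shared list, sketched with the same toy class A rather than the pandas calendar, is to give the instance its own copy before popping. Assigning to a1.rules creates an instance attribute that shadows the class attribute, so the class-level list stays intact. This is only an illustration of the mechanism, not necessarily how you would want to structure the real code.

```python
class A:
    rules = [0, 1, 2]  # class attribute: one list shared by every instance


a1 = A()
a2 = A()

# Both instances resolve `rules` to the very same list object.
shared = a1.rules is a2.rules

# Assigning on the instance creates an instance attribute that shadows the
# class attribute, so popping from the copy leaves A.rules untouched.
a1.rules = list(A.rules)
a1.rules.pop()

print(shared)    # True
print(a1.rules)  # [0, 1]
print(A.rules)   # [0, 1, 2]
print(a2.rules)  # [0, 1, 2]
```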
CodePudding user response:
# Import the calendar class
from pandas.tseries.holiday import USFederalHolidayCalendar
# Get the index of the Columbus Day entry in the 'rules' list
columbus_index = USFederalHolidayCalendar().rules.index([i for i in USFederalHolidayCalendar().rules if 'Columbus' in str(i)][0])
# Create your own class that inherits from 'USFederalHolidayCalendar'
class USFederalHolidayCalendar(USFederalHolidayCalendar):
    # Exclude the Columbus Day entry by building a new list from two slices
    rules = USFederalHolidayCalendar().rules[:columbus_index] + USFederalHolidayCalendar().rules[columbus_index + 1:]
# Create an object from your class
cal = USFederalHolidayCalendar()
print(cal.rules)
[Holiday: New Years Day (month=1, day=1, observance=<function nearest_workday at 0x7f6afad571f0>),
Holiday: Martin Luther King Jr. Day (month=1, day=1, offset=<DateOffset: weekday=MO(+3)>),
Holiday: Presidents Day (month=2, day=1, offset=<DateOffset: weekday=MO(+3)>),
Holiday: Memorial Day (month=5, day=31, offset=<DateOffset: weekday=MO(-1)>),
Holiday: July 4th (month=7, day=4, observance=<function nearest_workday at 0x7f6afad571f0>),
Holiday: Labor Day (month=9, day=1, offset=<DateOffset: weekday=MO(+1)>),
Holiday: Veterans Day (month=11, day=11, observance=<function nearest_workday at 0x7f6afad571f0>),
Holiday: Thanksgiving (month=11, day=1, offset=<DateOffset: weekday=TH(+4)>),
Holiday: Christmas (month=12, day=25, observance=<function nearest_workday at 0x7f6afad571f0>)]
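The same subclassing idea can be written a little more compactly by filtering on the holiday name instead of computing an index. This is a sketch, assuming the rule's name contains "Columbus" (as in the output above); the class name NoColumbusCalendar is made up for illustration. Filtering by name also avoids depending on the rule's position in the list, which may differ between pandas versions.

```python
from pandas.tseries.holiday import USFederalHolidayCalendar


class NoColumbusCalendar(USFederalHolidayCalendar):
    # Build a new list for the subclass; the parent's class-level list
    # is only read here, never modified.
    rules = [
        rule
        for rule in USFederalHolidayCalendar.rules
        if "Columbus" not in rule.name
    ]


cal = NoColumbusCalendar()
print([rule.name for rule in cal.rules])
```

Because the subclass gets a fresh list at class-definition time, creating NoColumbusCalendar() repeatedly in a loop does not shrink anything.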
CodePudding user response:
[T]his code is within a function that gets called repeatedly (in a loop) to generate several independent outputs ... I tried reloading via importlib.reload to no avail.
If you really want to import and pop inside the function, reload the holiday module like so:
from importlib import reload
def f():
    from pandas.tseries import holiday
    # reload `holiday` and pop Columbus Day
    holiday = reload(holiday)
    cal = holiday.USFederalHolidayCalendar()
    cal.rules.pop(6) # as HenryEcker noted, do not assign back to `cal`
    # just for demo, print the first 3 letters per remaining holiday
    print([rule.name[:3] for rule in cal.rules])
for _ in range(10):
    f()
Just to show there's no Columbus Day, the first 3 letters of each remaining holiday are printed:
['New', 'Mar', 'Pre', 'Mem', 'Jul', 'Lab', 'Vet', 'Tha', 'Chr']
['New', 'Mar', 'Pre', 'Mem', 'Jul', 'Lab', 'Vet', 'Tha', 'Chr']
['New', 'Mar', 'Pre', 'Mem', 'Jul', 'Lab', 'Vet', 'Tha', 'Chr']
['New', 'Mar', 'Pre', 'Mem', 'Jul', 'Lab', 'Vet', 'Tha', 'Chr']
['New', 'Mar', 'Pre', 'Mem', 'Jul', 'Lab', 'Vet', 'Tha', 'Chr']
['New', 'Mar', 'Pre', 'Mem', 'Jul', 'Lab', 'Vet', 'Tha', 'Chr']
['New', 'Mar', 'Pre', 'Mem', 'Jul', 'Lab', 'Vet', 'Tha', 'Chr']
['New', 'Mar', 'Pre', 'Mem', 'Jul', 'Lab', 'Vet', 'Tha', 'Chr']
['New', 'Mar', 'Pre', 'Mem', 'Jul', 'Lab', 'Vet', 'Tha', 'Chr']
['New', 'Mar', 'Pre', 'Mem', 'Jul', 'Lab', 'Vet', 'Tha', 'Chr']
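For what it's worth, here is a pandas-free sketch of why reloading the module resets the list while a plain re-import does not. It writes a throwaway module (toy_calendar and Cal are hypothetical stand-ins, not pandas names) and compares the three cases. It also hints at one plausible reason an earlier reload attempt might appear to do nothing: a name bound earlier with `from module import Cal` keeps pointing at the old class object.

```python
import importlib
import pathlib
import sys
import tempfile

# Hypothetical stand-in for a library module that defines a class-level list.
SOURCE = "class Cal:\n    rules = [0, 1, 2]\n"

with tempfile.TemporaryDirectory() as tmp:
    pathlib.Path(tmp, "toy_calendar.py").write_text(SOURCE)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        import toy_calendar
        from toy_calendar import Cal  # binds the class object itself

        Cal.rules.pop()  # mutates the shared class-level list

        import toy_calendar  # no-op: the module is served from sys.modules
        after_reimport = list(toy_calendar.Cal.rules)

        # reload re-executes the module body, creating a new class and list
        toy_calendar = importlib.reload(toy_calendar)
        after_reload = list(toy_calendar.Cal.rules)

        # the name imported before the reload still refers to the old class
        stale = list(Cal.rules)
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("toy_calendar", None)

print(after_reimport)  # [0, 1]
print(after_reload)    # [0, 1, 2]
print(stale)           # [0, 1]
```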
But if possible, just pass cal into the function, so you only have to generate and modify cal once:
# define the function to accept `cal` to avoid repeated importing/reloading
def f(cal):
    print([rule.name[:3] for rule in cal.rules])
# generate `cal` and pop once
from pandas.tseries.holiday import USFederalHolidayCalendar
cal = USFederalHolidayCalendar()
cal.rules.pop(6) # as HenryEcker noted, do not assign back to `cal`
for _ in range(10):
    f(cal)
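Note that popping from cal.rules still changes the class-level list for everything else in the process that uses USFederalHolidayCalendar. If that matters, a possible variant is to hand the instance its own filtered list through the rules argument of the constructor. This is a sketch: the helper name make_calendar_without is made up, and it assumes the constructor stores a passed-in rules list on the instance.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar


def make_calendar_without(name_fragment):
    """Return a calendar whose rules omit holidays matching name_fragment."""
    kept = [
        rule
        for rule in USFederalHolidayCalendar.rules
        if name_fragment not in rule.name
    ]
    # Passing `rules=` sets the list on this instance only; the class-level
    # list on USFederalHolidayCalendar is left as it was.
    return USFederalHolidayCalendar(rules=kept)


def f(cal):
    # Columbus Day 2021 fell on Monday, October 11
    holidays = cal.holidays(start="2021-01-01", end="2021-12-31")
    return pd.Timestamp("2021-10-11") in holidays


cal = make_calendar_without("Columbus")
results = [f(cal) for _ in range(10)]
print(results)
```

The function can then be called in a loop as before, and a plain USFederalHolidayCalendar() created elsewhere still includes Columbus Day.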