I'm struggling with the behavior of pandas .isin when checking whether local dates are local holidays.
I have a DataFrame X with UTC timestamps, which I convert to local dates, keeping only one row per date in x_daily:
import pandas as pd
import holidays
X = pd.DataFrame({'timestampUtc': pd.date_range("2000-12-25", "2001-01-06", freq="1440min", tz="utc")})
X['local_date'] = X['timestampUtc'].dt.tz_convert(tz='Europe/Berlin').dt.date
x_daily = X[['local_date']].drop_duplicates()
Now it gets weird: when I try to find the local holidays with .isin, it doesn't find any. When I check each element of local_date with the in operator, all holidays are found correctly. Calling .isin again after that also finds the correct holidays.
de_holidays = holidays.country_holidays(country='DE', state='BW')
# 1st try: no holidays found with isin
x_daily['local_date'].isin(de_holidays)
# correct holidays found with list comprehension and 'in'
[x_daily['local_date'].iloc[i] in de_holidays for i in range(x_daily.shape[0])]
# 2nd try: correct holidays found with isin
x_daily['local_date'].isin(de_holidays)
What's a reliable and efficient way to add a boolean column that identifies my local holidays?
Here is the whole code again in one block:
import pandas as pd
import holidays
X = pd.DataFrame({'timestampUtc': pd.date_range("2000-12-25", "2001-01-06", freq="1440min", tz="utc")})
X['local_date'] = X['timestampUtc'].dt.tz_convert(tz='Europe/Berlin').dt.date
x_daily = X[['local_date']].drop_duplicates()
de_holidays = holidays.country_holidays(country='DE', state='BW')
# 1st try: no holidays found with isin
x_daily['local_date'].isin(de_holidays)
# correct holidays found with list comprehension and 'in'
[x_daily['local_date'].iloc[i] in de_holidays for i in range(x_daily.shape[0])]
# 2nd try: correct holidays found with isin
x_daily['local_date'].isin(de_holidays)
CodePudding user response:
The documentation of the holidays module says:
To maximize speed, the list of holidays is built as needed on the fly, one calendar year at a time. When you instantiate the object, it is empty, but the moment a key is accessed it will build that entire year’s list of holidays. To prepopulate holidays, instantiate the class with the years argument:
us_holidays = holidays.US(years=2020)
I.e., the object only starts to populate once you access a key. The implementation of isin first converts its argument to a list, which in your case results in an empty list, because no key has been accessed at that point.
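To make the mechanism concrete, here is a toy sketch that uses only the standard library. LazyHolidays is a hypothetical stand-in written for this illustration, not the real holidays class; it only mimics the "populate a year on first lookup" behavior quoted above.

```python
from datetime import date


class LazyHolidays(dict):
    """Hypothetical stand-in for a lazily populated holiday calendar."""

    def __contains__(self, key):
        # A membership test fills in the whole year of the requested date.
        self._populate(key.year)
        return super().__contains__(key)

    def _populate(self, year):
        # Only two made-up entries per year, enough to show the effect.
        self.setdefault(date(year, 1, 1), "New Year's Day")
        self.setdefault(date(year, 12, 25), "Christmas Day")


calendar = LazyHolidays()
dates = [date(2000, 12, 25), date(2000, 12, 27), date(2001, 1, 1)]

# Converting to a list first (roughly what isin does) sees an empty calendar.
before = [d in list(calendar) for d in dates]

# Direct lookups trigger population of the years 2000 and 2001.
lookup = [d in calendar for d in dates]

# The same list conversion now contains the populated keys.
after = [d in list(calendar) for d in dates]

print(before, lookup, after)
```

This reproduces the pattern from the question: the first list-based check finds nothing, the element-wise check finds the holidays, and the second list-based check succeeds because the lookups populated the calendar as a side effect.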
You could change your code to
de_holidays = holidays.country_holidays(country='DE', state='BW', years=[2000, 2001])
and it should work as expected.