Generate a DataFrame of multiple years' holidays with workalendar

Time:05-17

import pandas as pd
from workalendar.core import Calendar
from workalendar.registry import registry

CalendarClass = registry.get('US')
calendar = CalendarClass()
calendar.holidays(2019)

#> [(datetime.date(2019, 1, 1), 'New year') ...]

As the output above shows, holidays() returns a list of two-element tuples: a date and a name. How do I convert this into a DataFrame with two columns, one holding the dates and the other the names as strings? Among other things I've tried, I can't seem to get pd.DataFrame() to work on the list.

CodePudding user response:

You can simply use the DataFrame constructor:

df = pd.DataFrame(calendar.holidays(2019), columns=['date', 'name'])

output:

         date                                 name
0  2019-01-01                             New year
1  2019-01-21  Birthday of Martin Luther King, Jr.
2  2019-02-18                Washington's Birthday
3  2019-05-27                         Memorial Day
4  2019-07-04                     Independence Day
5  2019-09-02                            Labor Day
6  2019-10-14                         Columbus Day
7  2019-11-11                         Veterans Day
8  2019-11-28                     Thanksgiving Day
9  2019-12-25                        Christmas Day

NB. The date column holds plain datetime.date objects. If you want the pandas datetime type, convert it with df['date'] = pd.to_datetime(df['date'])
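A minimal sketch of that conversion and of what it enables. The three-row list below is an assumed stand-in for calendar.holidays(2019), so the snippet runs without workalendar installed; the weekday column is only an illustration of the .dt accessor.

```python
import datetime

import pandas as pd

# Assumed stand-in for calendar.holidays(2019): a short list of (date, name) tuples
holidays_2019 = [
    (datetime.date(2019, 1, 1), "New year"),
    (datetime.date(2019, 7, 4), "Independence Day"),
    (datetime.date(2019, 12, 25), "Christmas Day"),
]

df = pd.DataFrame(holidays_2019, columns=["date", "name"])

# Convert the datetime.date objects to the pandas datetime type
df["date"] = pd.to_datetime(df["date"])

# The .dt accessor is available once the column has a datetime dtype
df["weekday"] = df["date"].dt.day_name()
print(df)
```

Once converted, the column also supports date-based filtering and sorting like any other pandas datetime column.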

To combine several years:

option 1: flat index

lst = [calendar.holidays(year) for year in [2019, 2020]]
df = pd.concat([pd.DataFrame(rows, columns=['date', 'name']) for rows in lst],
               ignore_index=True)

          date                                 name
0   2019-01-01                             New year
1   2019-01-21  Birthday of Martin Luther King, Jr.
2   2019-02-18                Washington's Birthday
...
19  2020-11-26                     Thanksgiving Day
20  2020-12-25                        Christmas Day
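As a possible variant of the flat-index approach, the per-year lists can be flattened first and passed to the DataFrame constructor once, which avoids building one DataFrame per year. This is only a sketch: sample_holidays is a hypothetical stand-in for calendar.holidays, abbreviated to two rows per year so the snippet runs on its own.

```python
import datetime
from itertools import chain

import pandas as pd


def sample_holidays(year):
    """Hypothetical stand-in for calendar.holidays(year), abbreviated to two rows."""
    return [
        (datetime.date(year, 1, 1), "New year"),
        (datetime.date(year, 12, 25), "Christmas Day"),
    ]


years = [2019, 2020]

# Flatten the per-year lists into one list of (date, name) tuples
rows = list(chain.from_iterable(sample_holidays(year) for year in years))

# A single constructor call yields a continuous 0..n-1 index
df = pd.DataFrame(rows, columns=["date", "name"])
print(df)
```

With the real calendar object, sample_holidays(year) would be replaced by calendar.holidays(year).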

option 2: MultiIndex

df = pd.concat({year: pd.DataFrame(calendar.holidays(year),
                                   columns=['date', 'name'])
                for year in [2019, 2020]})

               date                                 name
2019 0   2019-01-01                             New year
     1   2019-01-21  Birthday of Martin Luther King, Jr.
     2   2019-02-18                Washington's Birthday
     3   2019-05-27                         Memorial Day
     4   2019-07-04                     Independence Day
     5   2019-09-02                            Labor Day
     6   2019-10-14                         Columbus Day
     7   2019-11-11                         Veterans Day
     8   2019-11-28                     Thanksgiving Day
     9   2019-12-25                        Christmas Day
2020 0   2020-01-01                             New year
     1   2020-01-20  Birthday of Martin Luther King, Jr.
     2   2020-02-17                Washington's Birthday
     3   2020-05-25                         Memorial Day
     4   2020-07-03          Independence Day (Observed)
     5   2020-07-04                     Independence Day
     6   2020-09-07                            Labor Day
     7   2020-10-12                         Columbus Day
     8   2020-11-11                         Veterans Day
     9   2020-11-26                     Thanksgiving Day
     10  2020-12-25                        Christmas Day
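The MultiIndex version makes it easy to pull out a single year, and the year level can be turned into an ordinary column if needed. A small sketch, again using a hypothetical sample_holidays function in place of calendar.holidays; the level names "year" and "row" are assumptions chosen for illustration.

```python
import datetime

import pandas as pd


def sample_holidays(year):
    """Hypothetical stand-in for calendar.holidays(year), abbreviated to two rows."""
    return [
        (datetime.date(year, 1, 1), "New year"),
        (datetime.date(year, 12, 25), "Christmas Day"),
    ]


# Dict keys become the outer level of the resulting MultiIndex
df = pd.concat({year: pd.DataFrame(sample_holidays(year),
                                   columns=["date", "name"])
                for year in [2019, 2020]})

# Select all rows for one year via the outer index level
only_2020 = df.loc[2020]

# Name the index levels, then move the year level into a regular column
flat = df.rename_axis(index=["year", "row"]).reset_index(level="year")
print(only_2020)
print(flat)
```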

CodePudding user response:

You can pass the list straight to the DataFrame constructor and choose your own column names:

df = pd.DataFrame(calendar.holidays(2019), columns=['Date', 'Event'])