import pandas as pd
from workalendar.core import Calendar
from workalendar.registry import registry
CalendarClass = registry.get('US')
calendar = CalendarClass()
calendar.holidays(2019)
#> [(datetime.date(2019, 1, 1), 'New year') ...]
As the output above shows, this returns a list of two-element tuples. How do I convert it into a DataFrame with two columns, one holding the dates and the other holding the holiday names as strings? Among other things, I tried calling pd.DataFrame() on the list, but I can't get it to work.
CodePudding user response:
You can simply use the DataFrame constructor:
df = pd.DataFrame(calendar.holidays(2019), columns=['date', 'name'])
output:
   date        name
0  2019-01-01  New year
1  2019-01-21  Birthday of Martin Luther King, Jr.
2  2019-02-18  Washington's Birthday
3  2019-05-27  Memorial Day
4  2019-07-04  Independence Day
5  2019-09-02  Labor Day
6  2019-10-14  Columbus Day
7  2019-11-11  Veterans Day
8  2019-11-28  Thanksgiving Day
9  2019-12-25  Christmas Day
NB: the date column holds plain datetime.date objects. If you want the pandas datetime type, you can convert the dates with df['date'] = pd.to_datetime(df['date'])
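To illustrate what the conversion changes, here is a minimal sketch. It assumes a small hand-written list (sample_holidays, a hypothetical stand-in for calendar.holidays(2019)) so that it runs without workalendar installed; the weekday column is only there to show that the .dt accessor works after converting.

```python
import datetime

import pandas as pd

# Hypothetical stand-in for calendar.holidays(2019): a list of (date, name) tuples.
sample_holidays = [
    (datetime.date(2019, 1, 1), "New year"),
    (datetime.date(2019, 7, 4), "Independence Day"),
    (datetime.date(2019, 12, 25), "Christmas Day"),
]

df = pd.DataFrame(sample_holidays, columns=["date", "name"])
print(df["date"].dtype)  # object: the cells are plain datetime.date instances

# Convert the column to a pandas datetime64 dtype.
df["date"] = pd.to_datetime(df["date"])
print(df["date"].dtype)

# The .dt accessor works on the column once it has a datetime64 dtype.
df["weekday"] = df["date"].dt.day_name()
print(df)
```

Once the column is a pandas datetime, you can also filter by date ranges or resample, which is awkward with plain date objects.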
To combine several years:
option 1: flat index
lst = [calendar.holidays(year) for year in [2019, 2020]]
df = pd.concat([pd.DataFrame(holidays, columns=['date', 'name']) for holidays in lst],
               ignore_index=True)
output:
    date        name
0   2019-01-01  New year
1   2019-01-21  Birthday of Martin Luther King, Jr.
2   2019-02-18  Washington's Birthday
...
19  2020-11-26  Thanksgiving Day
20  2020-12-25  Christmas Day
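A variant on the same idea, not part of the answer above: since every year yields a list of tuples, you could flatten the lists first and call the constructor once. The sketch below assumes a hypothetical fake_holidays(year) helper standing in for calendar.holidays(year), so it runs without workalendar.

```python
import datetime
from itertools import chain

import pandas as pd


def fake_holidays(year):
    """Hypothetical stand-in for calendar.holidays(year)."""
    return [
        (datetime.date(year, 1, 1), "New year"),
        (datetime.date(year, 12, 25), "Christmas Day"),
    ]


years = [2019, 2020]

# Flatten the per-year lists into one sequence of (date, name) tuples,
# then build a single DataFrame; the index runs 0..n-1 automatically.
rows = list(chain.from_iterable(fake_holidays(year) for year in years))
flat_df = pd.DataFrame(rows, columns=["date", "name"])
print(flat_df)
```

This avoids creating one intermediate DataFrame per year, though for a handful of years the difference is negligible.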
option 2: MultiIndex
df = pd.concat({year: pd.DataFrame(calendar.holidays(year),
                                   columns=['date', 'name'])
                for year in [2019, 2020]})
output:
         date        name
2019 0   2019-01-01  New year
     1   2019-01-21  Birthday of Martin Luther King, Jr.
     2   2019-02-18  Washington's Birthday
     3   2019-05-27  Memorial Day
     4   2019-07-04  Independence Day
     5   2019-09-02  Labor Day
     6   2019-10-14  Columbus Day
     7   2019-11-11  Veterans Day
     8   2019-11-28  Thanksgiving Day
     9   2019-12-25  Christmas Day
2020 0   2020-01-01  New year
     1   2020-01-20  Birthday of Martin Luther King, Jr.
     2   2020-02-17  Washington's Birthday
     3   2020-05-25  Memorial Day
     4   2020-07-03  Independence Day (Observed)
     5   2020-07-04  Independence Day
     6   2020-09-07  Labor Day
     7   2020-10-12  Columbus Day
     8   2020-11-11  Veterans Day
     9   2020-11-26  Thanksgiving Day
     10  2020-12-25  Christmas Day
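If you go with the MultiIndex, here is a rough sketch of how you might use it afterwards: selecting one year through the outer level, or turning that level into a regular column. The fake_holidays(year) helper and the level names "year" and "row" are assumptions made for illustration; with workalendar installed you would call calendar.holidays(year) instead.

```python
import datetime

import pandas as pd


def fake_holidays(year):
    """Hypothetical stand-in for calendar.holidays(year)."""
    return [
        (datetime.date(year, 1, 1), "New year"),
        (datetime.date(year, 12, 25), "Christmas Day"),
    ]


# Dict keys become the outer index level; names= labels both levels.
multi = pd.concat(
    {year: pd.DataFrame(fake_holidays(year), columns=["date", "name"])
     for year in [2019, 2020]},
    names=["year", "row"],
)

# Select all rows for one year via the outer level.
only_2020 = multi.loc[2020]
print(only_2020)

# Move the outer level into an ordinary column and renumber the rows.
with_year = multi.reset_index(level="year").reset_index(drop=True)
print(with_year)
```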
CodePudding user response:
You can build the DataFrame directly from the list and name the columns however you like:
df = pd.DataFrame(calendar.holidays(2019), columns=['Date', 'Event'])