I have a link to a website with a table like the following: https://www.timeanddate.com/holidays/kenya/2022
How can I:
- read this table as a DataFrame in a Jupyter Notebook in Python?
- convert the "Date" column to a format like "01.01.2022" instead of the "1 sty" shown on the website?
- create a "Day" column holding values like sobota, niedziela and so on, which currently sit in an unnamed column between "Date" and "Name"?
As a result, I need something like this:
Date | Day | Name | Type |
---|---|---|---|
01.01.2022 | sobota | New Year's Day | Public holiday |
20.03.2022 | niedziela | March Equinox | Season |
... | ... | ... | ... |
How can I do that in Python with pandas?
CodePudding user response:
You can do this with the BeautifulSoup library. If you right-click the page in Google Chrome and choose Inspect, you can see the structure of the page. It is well structured, so it is easy to extract the data between the HTML tags. If you want the data for every year, just loop over the URLs:
https://www.timeanddate.com/holidays/kenya/2022 https://www.timeanddate.com/holidays/kenya/2021 ...
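The loop over years could look roughly like the sketch below. Note that it swaps in `pandas.read_html` for BeautifulSoup to keep things short. The names `load_years` and `read_tables` and the extra `Year` column are made up for illustration, and the sketch assumes the holiday table is the first table on each page.

```python
import pandas as pd

BASE_URL = "https://www.timeanddate.com/holidays/kenya/"


def load_years(years, read_tables=pd.read_html):
    """Fetch one table per year and stack them into a single DataFrame.

    `read_tables` is a hypothetical hook (defaults to pandas.read_html) so
    the loop can be tried offline with a stand-in reader.
    """
    frames = []
    for year in years:
        url = f"{BASE_URL}{year}"
        # Assumption: the holiday table is the first table on the page.
        table = read_tables(url)[0]
        # Remember which year each row came from.
        frames.append(table.assign(Year=year))
    return pd.concat(frames, ignore_index=True)
```

Calling `load_years([2021, 2022])` would hit the two URLs above. Passing your own `read_tables` function lets you check the loop without touching the network.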
CodePudding user response:
To read the table on the website as a DataFrame in Jupyter Notebook, you can directly use the pandas library. You can try something similar to this:
from datetime import datetime as dt
import pandas as pd
# Year
year = "2022"
# Read the table on the website into a DataFrame
df = pd.read_html("https://www.timeanddate.com/holidays/kenya/" + year)[0]
# Drop NaN
df = df.dropna()
# Convert the "Date" column to the desired date format
df["Date"] = df["Date"].apply(lambda date: date + " " + year)
df["Date"] = [dt.strptime(df["Date"].iloc[i][0], "%b %d %Y") for i in range(0, len(df))]
# Display the DataFrame
df
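For the "01.01.2022" format and the "Day" column, something like the following might work. It is only a sketch: it assumes the table has already been reduced to plain columns with "Date" holding English strings such as "Jan 1", and it uses a hand-written weekday mapping (`POLISH_DAYS`, a name made up here) instead of relying on a Polish locale being installed. The sample rows are taken from the question, not scraped.

```python
import pandas as pd

# Hypothetical lookup: pandas weekday number (Monday=0) -> Polish day name.
POLISH_DAYS = {
    0: "poniedziałek", 1: "wtorek", 2: "środa", 3: "czwartek",
    4: "piątek", 5: "sobota", 6: "niedziela",
}


def format_holidays(df, year):
    """Return a copy with "Date" as dd.mm.yyyy and a new "Day" column."""
    out = df.copy()
    # Assumption: "Date" looks like "Jan 1", so append the year and parse.
    parsed = pd.to_datetime(out["Date"] + " " + str(year), format="%b %d %Y")
    out["Date"] = parsed.dt.strftime("%d.%m.%Y")
    # Put "Day" right after "Date", as in the desired output.
    out.insert(1, "Day", parsed.dt.dayofweek.map(POLISH_DAYS))
    return out


# Sample rows from the question, standing in for the scraped table.
sample = pd.DataFrame({
    "Date": ["Jan 1", "Mar 20"],
    "Name": ["New Year's Day", "March Equinox"],
    "Type": ["Public holiday", "Season"],
})
result = format_holidays(sample, 2022)
print(result)
```

If the scraped table comes back with different column labels (for example multi-level headers), rename or flatten the columns first so that "Date", "Name" and "Type" exist before calling `format_holidays`.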