The table spans pages 0 through 27. I have successfully scraped page 0's table into a pandas df:
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://home.treasury.gov/resource-center/data-chart-center/interest-rates/TextView?type=daily_treasury_yield_curve&field_tdr_date_value=all&page=0'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'lxml')

# getting the table
table = soup.find('table', {'class': 'views-table views-view-table cols-20'})

headers = []
for th in table.find_all('th'):
    headers.append(th.text.strip())

df = pd.DataFrame(columns=headers)
for row in table.find_all('tr')[1:]:
    row_data = [td.text.strip() for td in row.find_all('td')]
    df.loc[len(df)] = row_data
Now I need to do the same for all the pages and store the results in a single df.
CodePudding user response:
You can use pandas.read_html to parse the tables into DataFrames and then concat them:
import pandas as pd

url = "https://home.treasury.gov/resource-center/data-chart-center/interest-rates/TextView?type=daily_treasury_yield_curve&field_tdr_date_value=all&page={}"

all_df = []
for page in range(0, 10):  # <-- increase to 28 to fetch all pages (0-27)
    print("Getting page", page)
    all_df.append(pd.read_html(url.format(page))[0])

final_df = pd.concat(all_df).reset_index(drop=True)
print(final_df.tail(10).to_markdown(index=False))
| Date | 20 YR | 30 YR | Extrapolation Factor | 8 WEEKS BANK DISCOUNT | COUPON EQUIVALENT | 52 WEEKS BANK DISCOUNT | COUPON EQUIVALENT.1 | 1 Mo | 2 Mo | 3 Mo | 6 Mo | 1 Yr | 2 Yr | 3 Yr | 5 Yr | 7 Yr | 10 Yr | 20 Yr | 30 Yr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12/13/2001 | nan | nan | nan | nan | nan | nan | nan | 1.69 | nan | 1.69 | 1.78 | 2.2 | 3.09 | 3.62 | 4.4 | 4.9 | 5.13 | 5.81 | 5.53 |
| 12/14/2001 | nan | nan | nan | nan | nan | nan | nan | 1.7 | nan | 1.73 | 1.81 | 2.22 | 3.2 | 3.73 | 4.52 | 5.01 | 5.24 | 5.89 | 5.59 |
| 12/17/2001 | nan | nan | nan | nan | nan | nan | nan | 1.72 | nan | 1.74 | 1.84 | 2.24 | 3.21 | 3.74 | 4.54 | 5.03 | 5.26 | 5.91 | 5.61 |
| 12/18/2001 | nan | nan | nan | nan | nan | nan | nan | 1.72 | nan | 1.71 | 1.81 | 2.24 | 3.13 | 3.66 | 4.46 | 4.93 | 5.16 | 5.81 | 5.52 |
| 12/19/2001 | nan | nan | nan | nan | nan | nan | nan | 1.69 | nan | 1.69 | 1.8 | 2.23 | 3.11 | 3.63 | 4.38 | 4.84 | 5.08 | 5.73 | 5.45 |
| 12/20/2001 | nan | nan | nan | nan | nan | nan | nan | 1.67 | nan | 1.69 | 1.79 | 2.22 | 3.15 | 3.67 | 4.42 | 4.86 | 5.08 | 5.73 | 5.43 |
| 12/21/2001 | nan | nan | nan | nan | nan | nan | nan | 1.67 | nan | 1.71 | 1.81 | 2.23 | 3.17 | 3.69 | 4.45 | 4.89 | 5.12 | 5.76 | 5.45 |
| 12/24/2001 | nan | nan | nan | nan | nan | nan | nan | 1.66 | nan | 1.72 | 1.83 | 2.24 | 3.22 | 3.74 | 4.49 | 4.95 | 5.18 | 5.81 | 5.49 |
| 12/26/2001 | nan | nan | nan | nan | nan | nan | nan | 1.77 | nan | 1.75 | 1.87 | 2.34 | 3.26 | 3.8 | 4.55 | 5 | 5.22 | 5.84 | 5.52 |
| 12/27/2001 | nan | nan | nan | nan | nan | nan | nan | 1.75 | nan | 1.74 | 1.84 | 2.27 | 3.19 | 3.71 | 4.46 | 4.9 | 5.13 | 5.78 | 5.49 |
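If you would rather not hardcode the page count, here is a minimal sketch of auto-detecting the last page. It assumes that requesting a page past the end serves a page with no table (or an empty one), which is worth verifying against the site; pd.read_html raises ValueError when it finds no tables:

import pandas as pd

url = "https://home.treasury.gov/resource-center/data-chart-center/interest-rates/TextView?type=daily_treasury_yield_curve&field_tdr_date_value=all&page={}"

all_df = []
page = 0
while True:
    try:
        # read_html raises ValueError when the page contains no <table>
        tables = pd.read_html(url.format(page))
    except ValueError:
        break
    if not tables or tables[0].empty:
        break
    all_df.append(tables[0])
    page += 1

final_df = pd.concat(all_df).reset_index(drop=True)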
CodePudding user response:
You can handle the pagination with a for loop. Note that the df must be created once, before the loop, so rows accumulate across pages instead of being thrown away each iteration, and that range(0, 28) is needed to include page 27:
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://home.treasury.gov/resource-center/data-chart-center/interest-rates/TextView?type=daily_treasury_yield_curve&field_tdr_date_value=all&page={p}'

df = None
for p in range(0, 28):  # pages run 0 through 27
    page = requests.get(url.format(p=p))
    soup = BeautifulSoup(page.text, 'lxml')

    # getting the table
    table = soup.find('table', {'class': 'views-table views-view-table cols-20'})

    # build the DataFrame from the first page's headers, then keep appending
    if df is None:
        headers = [th.text.strip() for th in table.find_all('th')]
        df = pd.DataFrame(columns=headers)

    for row in table.find_all('tr')[1:]:
        row_data = [td.text.strip() for td in row.find_all('td')]
        df.loc[len(df)] = row_data
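Whichever approach you use, a quick sanity check on the combined frame is worthwhile. The date format below is an assumption based on the output shown above (MM/DD/YYYY), and errors='coerce' turns non-numeric placeholders such as 'N/A' into NaN:

# confirm the expected number of rows came through
print(df.shape)

# parse dates and convert the yield columns to floats for analysis
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')
num_cols = df.columns.drop('Date')
df[num_cols] = df[num_cols].apply(pd.to_numeric, errors='coerce')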