I have a probably simple question about bs4 (BeautifulSoup) that I can't figure out.
For reference, I am self-taught and am troubleshooting my way through learning Python.
Part of a bigger project I'm working on requires me to scrape a website to get the most up-to-date rate for the 1-month T-bill. I have 99% of it working, but I'm stuck on one aspect.
The data only updates Monday through Friday. Running this code at, say, 8 am before the site has been updated for the day, or on a weekend, raises an error. When I use a date that has already been published, I get exactly the data I need.
So I have set the variables d1, d2 and d3 to today, yesterday, and two days ago. I want soup.find to search for today's date, and if there is no match, search for yesterday's, and then for the date two days ago.
In my code, if I use text=d3, for example, I get a value returned.
Here's what I have right now; I would really appreciate some help!
from bs4 import BeautifulSoup
import requests
from datetime import date
import datetime
today = date.today()
d1 = today.strftime("%B %d, %Y")
ndays1 = datetime.timedelta(days = 1)
d2 = (today-ndays1).strftime("%B %d, %Y")
ndays2 = datetime.timedelta(days = 2)
d3 = (today-ndays2).strftime("%B %d, %Y")
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
'Chrome/71.0.3578.98 Safari/537.36',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language': 'en-US,en;q=0.5',
'DNT': '1', # Do Not Track Request Header
'Connection': 'close'
}
url_rfr = "https://ycharts.com/indicators/1_month_treasury_rate"
response = requests.get(url_rfr, headers=headers, timeout=5).text
soup = BeautifulSoup(response, 'html.parser')
div = soup.find("td", text=d1 or d2 or d3).find_next_sibling("td").text.strip()
r = (float(div[:-1]))
print(r)
CodePudding user response:
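Python evaluates d1 or d2 or d3 before find(...) ever sees it, and because d1 is a non-empty string the expression always evaluates to d1, so only today's date is searched. Rather than matching on a date, I changed the text in find(...) to "Last Value", a label that does not depend on the date, and also added a latest_period scrape for completeness: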
import datetime
from datetime import date
import requests
from bs4 import BeautifulSoup
today = date.today()
d1 = today.strftime("%B %d, %Y")
ndays1 = datetime.timedelta(days=1)
d2 = (today - ndays1).strftime("%B %d, %Y")
ndays2 = datetime.timedelta(days=2)
d3 = (today - ndays2).strftime("%B %d, %Y")
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
'Chrome/71.0.3578.98 Safari/537.36',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language': 'en-US,en;q=0.5',
'DNT': '1', # Do Not Track Request Header
'Connection': 'close'
}
url_rfr = "https://ycharts.com/indicators/1_month_treasury_rate"
response = requests.get(url_rfr, headers=headers, timeout=5).text
soup = BeautifulSoup(response, 'html.parser')
latest_period = soup.find("td", text="Latest Period").find_next_sibling("td").text.strip()
value = soup.find("td", text="Last Value").find_next_sibling("td").text.strip()
val = float(value[:-1])  # drop the trailing unit character before converting
print(latest_period, val) # e.g. Feb 11 2022 0.03
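If you still want the date fallback from the question, one way is to try each candidate date in turn and keep the first one that matches. Below is a minimal sketch of that idea. The helpers candidate_dates and first_found are names I made up for illustration, and the fake_rows dictionary is invented data standing in for the scraped table so the snippet runs without a network request.

```python
from datetime import date, timedelta


def candidate_dates(today, days_back=3, fmt="%B %d, %Y"):
    """Return today and the preceding days as formatted strings, newest first."""
    return [(today - timedelta(days=n)).strftime(fmt) for n in range(days_back)]


def first_found(candidates, finder):
    """Return (candidate, result) for the first candidate whose lookup is not None."""
    for candidate in candidates:
        result = finder(candidate)
        if result is not None:
            return candidate, result
    return None, None


# `or` between non-empty strings simply returns the first one, so
# text=d1 or d2 or d3 only ever searches for d1.
d1, d2, d3 = candidate_dates(date(2022, 2, 13))  # a Sunday
assert (d1 or d2 or d3) == d1

# Stand-in for the page: only the Friday row exists (made-up data).
fake_rows = {"February 11, 2022": "0.03%"}

matched_date, raw = first_found([d1, d2, d3], fake_rows.get)
rate = float(raw[:-1]) if raw is not None else None
print(matched_date, rate)
```

With the real page, the finder would be something like lambda d: soup.find("td", string=d), followed by .find_next_sibling("td") on the result (string= is the newer name for the text= argument in bs4). Note that a three-day window is not enough on a Monday morning before the update, since the last published row is then Friday, three days back, so days_back probably needs to be 4 or more, and larger still around holidays.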