I am learning web scraping using Beautiful Soup and Python. I thought a good first step would be to simply extract the temperature from a website. I have followed a tutorial which taught me to access text from a job posting website. However, when I try to extract the temperature from The Weather Network website, I cannot get the temperature value to print to the console.
Here is my code:
import requests
from bs4 import BeautifulSoup
url = 'https://www.theweathernetwork.com/us/weather/new-york/new-york'
data = requests.get(url).text
soup = BeautifulSoup(data, 'lxml')
temp = soup.find('span', class_ = "temp").text
print(temp)
When I add ".text" to the temp variable, the code completes but the printed output is blank.
When I remove ".text" from the temp variable, I get the right spot in the HTML but without the number, like this: <span class="temp"></span>
Any help is much appreciated! Thanks!
CodePudding user response:
Most likely the value in the span element is filled in by JavaScript after the page loads, for example through an asynchronous call to a third-party API. requests only fetches the initial HTML and never runs that script, so the value is not in the text that BeautifulSoup parses.
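One way to confirm this from Python is to check whether the element is missing from the raw response or present but empty. The snippet below is only a sketch: classify_element and the sample raw_html string are made up for illustration and stand in for the text returned by requests.get(url).text.

```python
from bs4 import BeautifulSoup


def classify_element(html, tag='span', css_class='temp'):
    """Report whether the target element is 'missing', 'empty' or 'populated'.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, 'html.parser')
    element = soup.find(tag, class_=css_class)
    if element is None:
        return 'missing'
    if not element.get_text(strip=True):
        return 'empty'
    return 'populated'


# Made-up sample that mimics what the question describes: the span is
# present in the static HTML, but a script is expected to fill it in later.
raw_html = '<div class="obs"><span class="temp"></span></div>'
print(classify_element(raw_html))  # empty
```

A result of 'empty' on the real response would point to the value being added by a script after the page loads, rather than to a wrong selector.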
CodePudding user response:
I tried and got the same results as you.
You are not getting anything because there is nothing to get.
If you run:
with open('output.html', 'w', encoding='utf-8') as f:
    f.write(data)
Then open the file (output.html) and view the page source. You will see that the <span class="temp"> element is actually empty.
This means that those values are only populated at runtime, i.e. most likely by JavaScript.
To get around this, you can use a library that renders JavaScript, such as Selenium or requests_html. I am not at my desktop at the moment, so I can't run it and give you the exact code, but this is the most common way to go about it.
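For reference, a rough sketch of the Selenium route might look like the following. It has not been run against the site and makes several assumptions: Selenium 4 with a local Chrome install, the rendered page still using a span with class "temp", and that span eventually receiving non-empty text. The helper names fetch_rendered_html and extract_temp are made up for this example.

```python
from bs4 import BeautifulSoup

URL = 'https://www.theweathernetwork.com/us/weather/new-york/new-york'


def extract_temp(html):
    """Return the text of the first <span class="temp">, or None if it is absent or empty."""
    soup = BeautifulSoup(html, 'html.parser')
    span = soup.find('span', class_='temp')
    if span is None:
        return None
    return span.get_text(strip=True) or None


def fetch_rendered_html(url, timeout=15):
    """Load the page in headless Chrome and return the HTML after scripts have run."""
    # Imported here so extract_temp can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument('--headless=new')  # assumes a recent Chrome version
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Assumption: the span is filled in by JavaScript, so wait until it
        # has non-empty text (raises TimeoutException if that never happens).
        WebDriverWait(driver, timeout).until(
            lambda d: d.find_element(By.CSS_SELECTOR, 'span.temp').text.strip() != ''
        )
        return driver.page_source
    finally:
        driver.quit()
```

Calling extract_temp(fetch_rendered_html(URL)) would then return the temperature text if those assumptions hold. The parsing step is kept separate from the browser step so the same BeautifulSoup code from the question can be reused on the rendered HTML.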