I have this code using the Python Requests library:
import requests

test_URL = "https://www.gasbuddy.com/station/194205"

def get_data(link):
    hdr = {'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Mobile Safari/537.36'}
    req = requests.get(link, headers=hdr)
    # .content is an attribute holding the raw bytes, not a method, so it is not called
    content = req.content
    print(content)

get_data(test_URL)
On the website https://www.gasbuddy.com/station/194205 there is a section labelled "Regular" which shows the regular price for gas. I want to grab that value, but I have never done this before, so I am not sure how to go about it. Would I perhaps enter a keyword query within the GET request? Any pointers or help would be appreciated.
CodePudding user response:
The website has a few mechanisms in place to prevent web scraping (or to make it harder).
You can use bs4 (Beautiful Soup) to analyse the response you get with requests. Install it with pip install beautifulsoup4 (https://pypi.org/project/beautifulsoup4/).
import requests
from bs4 import BeautifulSoup

url = "https://www.gasbuddy.com/station/194205"
hdr = {
    'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Mobile Safari/537.36'
}
resp = requests.get(url, headers=hdr)
After getting the response, you can use soup.select like this to extract the prices for regular and premium:
soup = BeautifulSoup(resp.text, "html.parser")
# [class*="..."] matches every span whose class attribute contains this substring
regular, premium = (item.text for item in soup.select('span[class*="FuelTypePriceDisplay-module__price___"]'))
At the time of writing, you get:
('152.9¢', '162.9¢')
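Two things to watch with the snippet above: unpacking into regular, premium raises a ValueError if the selector matches anything other than exactly two elements, and the values are strings with a trailing ¢. Here is a minimal sketch of one way to handle both, using only the standard library. The helper names to_cents and label_prices are hypothetical, and the label order (regular first, then premium) is an assumption taken from the output shown above.

```python
import re

# Assumed label order, taken from the output above (regular first, then premium).
FUEL_LABELS = ("regular", "premium")


def to_cents(text):
    """Return the numeric part of a price string such as '152.9¢', or None."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None


def label_prices(texts, labels=FUEL_LABELS):
    """Pair each label with a parsed price; missing entries become None."""
    values = [to_cents(t) for t in texts]
    # Pad the list so a short result does not raise, unlike tuple unpacking.
    values += [None] * (len(labels) - len(values))
    # zip stops at the shorter input, so extra matches are ignored.
    return dict(zip(labels, values))


prices = label_prices(["152.9¢", "162.9¢"])
print(prices)  # {'regular': 152.9, 'premium': 162.9}
```

With the soup object from the answer, you could call it as label_prices([item.text for item in soup.select(...)]) using the same selector. A text with no digits in it comes back as None rather than raising.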