Can you bypass a "Terms and Conditions" prompt with BeautifulSoup?

Time:10-19

I want to scrape a city GIS website for the names of projects active in the town of Brighton: https://brighton.maps.arcgis.com/apps/webappviewer/index.html?id=2e3dacc6615e4cf59b6db043cc3f12cc

However, I can't get past the initial Terms & Agreements checkbox. I'm still new to web scraping, so I'm not sure where to begin beyond the usual imports and request:

import requests
from bs4 import BeautifulSoup

URL = "https://brighton.maps.arcgis.com/apps/webappviewer/index.html?id=2e3dacc6615e4cf59b6db043cc3f12cc"
content = requests.get(URL)
soup = BeautifulSoup(content.text, "lxml")

I tried to follow this question: "How to bypass Terms and Conditions agreement with Beautiful Soup". However, that is a totally different scenario. I feel confident I'll be able to figure out the scraping portion; it's just the "Terms and Agreements" prompt I can't get past. Please help, I'm desperate!

CodePudding user response:

There is no need to bypass the checkbox, since what you are after is the data behind the page, not the prompt itself.

Right-click the page, select Inspect, and open the Network tab. There you can see every request your browser sends to load the page, and there are quite a lot of them. If you are using requests, you have to mimic this behavior yourself. It seems the data you are looking for is actually loaded from a different URL:

r = requests.get("https://brighton.maps.arcgis.com/sharing/rest/content/items/2e3dacc6615e4cf59b6db043cc3f12cc/data?f=json").json()

This way you get a dictionary that you can work with. What information exactly are you interested in?
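Since the layout of that dictionary is not shown here, a quick way to get oriented is to list its nested keys before deciding what to extract. The following is only a minimal sketch: `key_paths` is a hypothetical helper name, and the `sample` dictionary is made up for illustration and is not the actual response from the Brighton URL.

```python
def key_paths(obj, prefix="", max_depth=3):
    """List dotted key paths of a nested JSON-like object, up to max_depth dict levels."""
    paths = []
    if max_depth == 0:
        return paths
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            paths.append(path)
            # Descend one dict level for each nested value.
            paths.extend(key_paths(value, path, max_depth - 1))
    elif isinstance(obj, list) and obj:
        # Inspect only the first element; list items usually share a shape.
        paths.extend(key_paths(obj[0], f"{prefix}[0]", max_depth))
    return paths


# Made-up stand-in for the real response (assumption, for illustration only).
sample = {"map": {"itemId": "abc", "layers": [{"title": "x"}]}, "title": "t"}

for path in key_paths(sample):
    print(path)
```

With the real data, you would call `key_paths(r)` on the dictionary returned by the request above and then index into whichever branch holds the information you need.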

An alternative to requests is the selenium package, which controls a real browser and lets you click on elements such as the checkbox.
