I am trying to web scrape some data from the website - https://boardgamegeek.com/browse/boardgame/page/1
After I have obtained a name of the games and their score, I would also like to open each of these pages and find out how many players are needed for each game. But, when I go into each of the games the URL has a unique number. For example: When I click on the first game- Gloomhaven it opens the page - https://boardgamegeek.com/boardgame/**174430**/gloomhaven (The unique number is marked in bold).
random_no = r.randint(1000,300000)
url2 = "https://boardgamegeek.com/boardgame/" str(random_no) "/" name[0]
page2 = requests.get(url2)
if page2.status_code==200:
print("this is it!")
break
So I generated a random number and plugged it into the URL and read the response. However, even the wrong number gives a correct response but does not open the correct page.
What is this unique number ? How can I get information about this? Or can I use an alternative to get the information I need?
Thanks in advance.
CodePudding user response:
Try this
import requests
import bs4
s = bs4.BeautifulSoup(requests.get(
url = 'https://boardgamegeek.com/browse/boardgame/page/1',
).content, 'html.parser').find('table', {'id': 'collectionitems'})
urls = ['https://boardgamegeek.com' x['href'] for x in s.find_all('a', {'class':'primary'})]
print(urls)