Web scraping using Beautiful Soup in Python

Time:12-02

I am trying to scrape some data from the website https://boardgamegeek.com/browse/boardgame/page/1

After obtaining the names of the games and their scores, I would also like to open each game's page and find out how many players it needs. However, each game's URL contains a unique number. For example, clicking the first game, Gloomhaven, opens https://boardgamegeek.com/boardgame/**174430**/gloomhaven (the unique number is in bold).
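The number is BGG's numeric ID for the game, and the path has the shape `/boardgame/<id>/<slug>`. A minimal sketch for pulling that ID out of such a URL with only the standard library; the function name `extract_game_id` is hypothetical, and the regex assumes exactly this path shape:

```python
import re
from typing import Optional
from urllib.parse import urlparse


def extract_game_id(url: str) -> Optional[int]:
    """Return the numeric ID from a URL shaped like /boardgame/<id>/<slug>, else None."""
    path = urlparse(url).path  # e.g. "/boardgame/174430/gloomhaven"
    match = re.match(r"/boardgame/(\d+)(?:/|$)", path)  # anchored at the start of the path
    return int(match.group(1)) if match else None


print(extract_game_id("https://boardgamegeek.com/boardgame/174430/gloomhaven"))  # 174430
```

URLs that do not follow the assumed shape (such as the browse page itself) simply return `None`.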

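    import random as r
    import requests

    name = ["gloomhaven"]  # game names scraped earlier (illustrative)
    while True:  # guess IDs until a request returns 200
        random_no = r.randint(1000, 300000)
        url2 = "https://boardgamegeek.com/boardgame/" + str(random_no) + "/" + name[0]
        page2 = requests.get(url2)
        if page2.status_code == 200:
            print("this is it!")
            break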

So I generate a random number, plug it into the URL, and check the response. However, even a wrong number returns a successful response (status 200); it just doesn't open the page for the game I want.

What is this unique number, and how can I find it for each game? Or is there an alternative way to get the information I need?

Thanks in advance.

Answer:

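Try this. The browse page already links to every game's page, so you can collect those links (IDs included) from the rankings table instead of guessing the number: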

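    import requests
    import bs4

    # Download the first browse page and locate the rankings table
    page = requests.get('https://boardgamegeek.com/browse/boardgame/page/1')
    s = bs4.BeautifulSoup(page.content, 'html.parser').find('table', {'id': 'collectionitems'})

    # Each game name is an <a class="primary"> whose href already holds the ID,
    # e.g. /boardgame/174430/gloomhaven, so just prepend the domain
    urls = ['https://boardgamegeek.com' + x['href'] for x in s.find_all('a', {'class': 'primary'})]

    print(urls)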
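As for the player count: rather than opening every game page, one alternative is BGG's XML API2 `thing` endpoint, which, as far as I know, reports `minplayers` and `maxplayers` for each ID. The sketch below assumes that endpoint URL and response shape; BGG has changed access rules for this API over time (registration, rate limits), so check its current terms first. The function names are hypothetical, and it uses the standard library's `urllib` instead of `requests` only to stay dependency-free:

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: BGG XML API2 "thing"; access rules may apply.
API_URL = "https://boardgamegeek.com/xmlapi2/thing"


def parse_player_counts(xml_data) -> Dict[int, Tuple[int, int]]:
    """Map each <item id="..."> to its (minplayers, maxplayers) values."""
    root = ET.fromstring(xml_data)  # accepts str or bytes
    counts: Dict[int, Tuple[int, int]] = {}
    for item in root.findall("item"):
        min_p = item.find("minplayers")
        max_p = item.find("maxplayers")
        if min_p is not None and max_p is not None:
            counts[int(item.get("id"))] = (int(min_p.get("value")), int(max_p.get("value")))
    return counts


def fetch_player_counts(ids: Iterable[int]) -> Dict[int, Tuple[int, int]]:
    """Ask the API for several IDs at once (comma-separated) and parse the reply."""
    query = urlencode({"id": ",".join(str(i) for i in ids)})
    with urlopen(f"{API_URL}?{query}", timeout=30) as resp:
        return parse_player_counts(resp.read())


# Offline check against a hand-written, illustrative response fragment:
sample = '<items><item type="boardgame" id="174430"><minplayers value="1"/><maxplayers value="4"/></item></items>'
print(parse_player_counts(sample))  # {174430: (1, 4)}
```

With the `urls` list from the answer code, `ids = [int(u.split('/')[4]) for u in urls]` yields the IDs to pass to `fetch_player_counts`. Sending them in small batches and pausing between requests is kinder to the server.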