I am trying to make an app that takes values from a website. For instance, from this page [https://steamcommunity.com/id/pintipanda/games/?tab=all] I want to get the id of every div that has the class "gameListRow".
But when I try:
from bs4 import BeautifulSoup
import requests
html_text = requests.get('https://steamcommunity.com/id/pintipanda/games/?tab=all').text
soup = BeautifulSoup(html_text, 'lxml')
div = soup.find_all('div', {'class': 'gameListRow'})
print(div)
It prints an empty list. How can I select all the divs with the class gameListRow?
CodePudding user response:
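The data you see is stored inside a <script> element on the page as a JavaScript variable, and the rows are built from it in the browser, so BeautifulSoup doesn't find any gameListRow divs in the HTML that requests downloads. To parse it, you can use this example: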
import re
import json
import requests
url = "https://steamcommunity.com/id/pintipanda/games/?tab=all"
data = requests.get(url).text
data = re.search(r"var rgGames = (.*]);", data).group(1)
data = json.loads(data)
# uncomment to print all data:
# print(json.dumps(data, indent=4))
for d in data:
    print("{:<10} {}".format(d["appid"], d["name"]))
Prints:
730        Counter-Strike: Global Offensive
578080     PUBG: BATTLEGROUNDS
261550     Mount & Blade II: Bannerlord
570        Dota 2
305620     The Long Dark
550        Left 4 Dead 2
413150     Stardew Valley
...
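If the profile is private or the page layout changes, re.search in the example above returns None and .group(1) raises an AttributeError. Below is a minimal sketch of the same idea with that case guarded. The helper name extract_games and the SAMPLE_HTML string are made up for illustration (the real page is much larger), and the non-greedy pattern is a small variation on the regex above, not something the page requires.

```python
import json
import re

# Hypothetical, heavily shortened stand-in for the downloaded page source.
SAMPLE_HTML = """
<html><body>
<script>
var rgGames = [{"appid": 730, "name": "Counter-Strike: Global Offensive"}, {"appid": 570, "name": "Dota 2"}];
var rgChangingGames = [];
</script>
</body></html>
"""


def extract_games(html):
    """Return the list assigned to `var rgGames`, or [] if it is not found."""
    # Capture the JSON array between "var rgGames = " and the first "];".
    match = re.search(r"var rgGames = (\[.*?\]);", html)
    if match is None:
        # No game data in the page (e.g. private profile or changed layout).
        return []
    return json.loads(match.group(1))


for game in extract_games(SAMPLE_HTML):
    print("{:<10} {}".format(game["appid"], game["name"]))
```

With the real page, you would pass requests.get(url).text to extract_games instead of SAMPLE_HTML.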