Classic case of "the code used to work, I changed nothing, and now it doesn't work anymore" here. I'm trying to extract a list of unique appid values from this page, which I've saved locally as roguelike.html.
The code I have looks like this. It worked a couple of months ago when I last ran it, but now the end result is a list containing a single element, None. Any ideas as to what's going wrong?
from bs4 import BeautifulSoup

with open("roguelike.html", "rb") as text_file:
    steamdb_text = text_file.read()
soup = BeautifulSoup(steamdb_text, "html.parser")

apps = []
for app in soup.find_all('tr'):
    # .get() returns None for any <tr> that has no data-appid attribute
    apps.append(app.get('data-appid'))
appset = list(set(apps))
Is there a simpler way to get the unique appids from the page source? The individual elements I'm trying to iterate over look like this:
<tr data-appid="98821" data-cache="1533726913">
I want all the unique data-appid values. I'm scratching my head trying to figure out whether the page formatting changed (it doesn't seem like it) or whether some version upgrade in Spyder, Python, or BeautifulSoup broke something that used to work.
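For the "simpler way" part, here is a minimal sketch. It assumes the rows you want are exactly the ones that carry a data-appid attribute. The CSS attribute selector tr[data-appid] then skips header rows and any other <tr> without the attribute, so None never ends up in the result; if the saved file contains no such rows at all, you get an empty list instead of [None], which makes the problem obvious. The function name unique_appids is only an illustrative name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup


def unique_appids(html):
    """Return the unique data-appid values (as strings, sorted) from <tr> rows."""
    soup = BeautifulSoup(html, "html.parser")
    # "tr[data-appid]" matches only <tr> elements that have the attribute,
    # so rows without it are skipped instead of contributing None.
    return sorted({tr["data-appid"] for tr in soup.select("tr[data-appid]")})


# Usage with the locally saved page (BeautifulSoup accepts the raw bytes):
# with open("roguelike.html", "rb") as f:
#     print(unique_appids(f.read()))
```

The values are sorted as strings; wrap them in int() if you need numeric order.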
Any ideas?
Answer:
I tried this code and it worked fine for me. Make sure the HTML file you saved is actually the page you expect. Perhaps you hit a CAPTCHA when saving the page, so the file contains that check instead of the table.
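One way to act on that suggestion is to inspect the saved file before parsing the table. This sketch assumes that a wrong or blocked page shows up as zero <tr data-appid> rows, and that the page <title> is a quick hint about which page was actually saved; describe_saved_page is a hypothetical helper name.

```python
from bs4 import BeautifulSoup


def describe_saved_page(html):
    """Return (title text or None, number of <tr data-appid> rows) for an HTML page."""
    soup = BeautifulSoup(html, "html.parser")
    # soup.title is the first <title> tag, or None if the page has none
    title = soup.title.get_text(strip=True) if soup.title else None
    row_count = len(soup.select("tr[data-appid]"))
    return title, row_count


# Hypothetical usage against the saved file:
# with open("roguelike.html", "rb") as f:
#     title, rows = describe_saved_page(f.read())
# if rows == 0:
#     print(f"No data-appid rows found; the saved page's title is {title!r}")
```

If the row count is 0 and the title is not the listing you expected, re-save the page from the browser after any check has cleared.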