I want to use BeautifulSoup4 to extract the values of the "id" attribute from a list of table rows like this one:
<tr id="8LVPCRJGR" role="row">
For this example tag, I want the "8LVPCRJGR" part.
I tried this block of code (yes, I did import the bs4 and requests modules):
url = "https://brawlify.com/stats/club/V8GVVR0R"
result = requests.get(url).text
doc = BeautifulSoup(result, "html.parser")
tag = doc.find_all('tr')
attribute = tag['id']
print(attribute)
It's supposed to print out a list with all the values in it, but nothing prints. The console is blank.
What am I doing wrong here?
CodePudding user response:
There are a few issues. First, tag is a list of elements, specifically all the <tr> elements, so it cannot be indexed with 'id' directly. Second, not all the <tr> tags have an 'id' attribute.
So you need to put in some logic for that:
import requests
from bs4 import BeautifulSoup
url = "https://brawlify.com/stats/club/V8GVVR0R"
result = requests.get(url).text
doc = BeautifulSoup(result, "html.parser")
tag = doc.find_all('tr')
attribute = [x['id'] for x in tag if 'id' in x.attrs]
print(attribute)
Output:
['8LVPCRJGR', '29G9VJJC', '2YP08GUG8', 'UY8PVUPL', 'VV2RRRGG', '20RQQ08U9', 'VJ00J8Y8', '200PG2VLP', '28QV0RJVV', 'YRLPJ80J', 'PRLV99U89', '9QJLQGGU', '88UYYG0U', '9PG8RUVJ', 'YP9UQ8CQ', '9J8LRGQU2', '2LPGYQVV9', '8C8CJ0UJU', 'GUGJLLRG', '9Q0VCV2J', '2RVYVL8YL', 'JP0VGC2P', '280GY2R2C', '2PRLQPJJY', '8CGJGPYJ9', '89RYCVQJ0', '80GVU28CC', 'UV0CPU2Q', '9RGG9J08J', 'Y2PQ8090R']
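Both issues can be reproduced offline. The following is a minimal sketch that parses a small hand-written snippet instead of the live page; SAMPLE_HTML and the extract_row_ids helper are made up for illustration and are not part of the original code.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one header row without an id, two data rows with one.
SAMPLE_HTML = """
<table>
  <tr role="row"><th>Name</th></tr>
  <tr id="8LVPCRJGR" role="row"><td>first</td></tr>
  <tr id="29G9VJJC" role="row"><td>second</td></tr>
</table>
"""


def extract_row_ids(html):
    """Return the id of every <tr> that has one (illustrative helper)."""
    doc = BeautifulSoup(html, "html.parser")
    rows = doc.find_all("tr")  # a list-like ResultSet, not a single Tag
    # Tag.get() returns None when the attribute is missing, so rows
    # without an id are skipped instead of raising KeyError.
    return [row.get("id") for row in rows if row.get("id") is not None]


print(extract_row_ids(SAMPLE_HTML))  # ['8LVPCRJGR', '29G9VJJC']
```

The header row is dropped because it has no id, which is the same filtering the list comprehension above performs with 'id' in x.attrs.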
CodePudding user response:
Select more specifically: pick all <tr> elements that have an id, then iterate over the result set to get each id:
[x['id'] for x in soup.select('tr[id]')]
Example
import requests
from bs4 import BeautifulSoup
r = requests.get('https://brawlify.com/stats/club/V8GVVR0R')
soup = BeautifulSoup(r.text, "html.parser")
attribute = [x['id'] for x in soup.select('tr[id]')]
print(attribute)
Output:
['8LVPCRJGR',
'29G9VJJC',
'2YP08GUG8',
'UY8PVUPL',
'VV2RRRGG',
'20RQQ08U9',
'VJ00J8Y8',
'200PG2VLP',
'28QV0RJVV',...]
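If you prefer find_all over CSS selectors, passing id=True should match the same rows, since True means "any tag that has this attribute". A small sketch on hand-written markup (SAMPLE_HTML is an assumption for illustration, not the real page):

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real table.
SAMPLE_HTML = """
<table>
  <tr role="row"><th>Name</th></tr>
  <tr id="8LVPCRJGR" role="row"><td>first</td></tr>
  <tr id="29G9VJJC" role="row"><td>second</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# CSS attribute selector: only <tr> elements that carry an id attribute.
via_select = [row["id"] for row in soup.select("tr[id]")]

# Keyword filter: id=True matches any <tr> whose id attribute is present.
via_find_all = [row["id"] for row in soup.find_all("tr", id=True)]

print(via_select == via_find_all)  # True
```

Either way, indexing with row["id"] is safe here because the rows without an id were already filtered out by the query itself.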
CodePudding user response:
find_all returns a list, so you should loop through it and extract the id attribute from each element, something like the code below.
Edit
Following @chitown88's comment, you can add an if statement to the loop to skip rows without an id.
As for @Zaid Hussain's comment: apparently you can't get the tr tags from the HTML page because the JavaScript code had not executed when requests.get(url).text was passed to BeautifulSoup. I would recommend inspecting what requests.get(url).text actually returns. If this is the case, I would recommend opening the page with Selenium (for example, through ChromeDriver) and passing the rendered HTML to BeautifulSoup, or just doing the whole job with Selenium.
tags = doc.find_all('tr')
# tag.get('id') returns None when the attribute is missing; tag['id'] would raise KeyError
attribute = [tag['id'] for tag in tags if tag.get('id')]
print(attribute)
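One rough way to inspect the returned HTML before reaching for Selenium is to count the rows that are actually present in it. The sketch below uses two hand-written strings in place of requests.get(url).text; count_rows and both sample strings are assumptions for illustration only.

```python
from bs4 import BeautifulSoup


def count_rows(html):
    """Return (total <tr> count, <tr> count with an id) for an HTML string."""
    doc = BeautifulSoup(html, "html.parser")
    rows = doc.find_all("tr")
    with_id = [row for row in rows if row.has_attr("id")]
    return len(rows), len(with_id)


# Hypothetical server-rendered page: the rows are already in the HTML.
STATIC_PAGE = '<table><tr><th>Name</th></tr><tr id="8LVPCRJGR"><td>x</td></tr></table>'

# Hypothetical JavaScript-rendered page: only an empty container is sent.
JS_PAGE = '<div id="app"></div><script>render()</script>'

print(count_rows(STATIC_PAGE))  # (1 header row + 1 data row, 1 with an id)
print(count_rows(JS_PAGE))      # no rows at all in the raw HTML
```

If the text returned by requests gives zero rows while the browser shows a full table, the rows are probably added by JavaScript, and a browser-driven tool such as Selenium would be needed to get the rendered HTML.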