How to get attribute values of HTML tags into a list?

Time:11-28

I want to extract the values of the "id" attribute from a list of table rows like this one: <tr id="8LVPCRJGR" role="row" >, via BeautifulSoup4. In the case of this example tag, I want the "8LVPCRJGR" part.

I tried this block of code (yes, I did import bs4 and requests modules):

url = "https://brawlify.com/stats/club/V8GVVR0R"
result = requests.get(url).text
doc = BeautifulSoup(result, "html.parser")

tag = doc.find_all('tr')

attribute = tag['id']

print(attribute)

It's supposed to print out a list with all the values in it, but nothing prints. The console is blank.

What am I doing wrong here?

CodePudding user response:

A few issues. First, tag is a list of elements, specifically all the <tr> elements, so it cannot be indexed with 'id' directly. Second, not all the <tr> tags have an 'id' attribute.

So you need to put in some logic for that:

import requests
from bs4 import BeautifulSoup


url = "https://brawlify.com/stats/club/V8GVVR0R"
result = requests.get(url).text
doc = BeautifulSoup(result, "html.parser")

tag = doc.find_all('tr')

attribute = [x['id'] for x in tag if 'id' in x.attrs]

Output:

print(attribute)
['8LVPCRJGR', '29G9VJJC', '2YP08GUG8', 'UY8PVUPL', 'VV2RRRGG', '20RQQ08U9', 'VJ00J8Y8', '200PG2VLP', '28QV0RJVV', 'YRLPJ80J', 'PRLV99U89', '9QJLQGGU', '88UYYG0U', '9PG8RUVJ', 'YP9UQ8CQ', '9J8LRGQU2', '2LPGYQVV9', '8C8CJ0UJU', 'GUGJLLRG', '9Q0VCV2J', '2RVYVL8YL', 'JP0VGC2P', '280GY2R2C', '2PRLQPJJY', '8CGJGPYJ9', '89RYCVQJ0', '80GVU28CC', 'UV0CPU2Q', '9RGG9J08J', 'Y2PQ8090R']
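To see both issues without requesting the site, here is a minimal sketch on a small inline table. The HTML string is made up for illustration (only the two ids are taken from the output above), and the names rows, ids and indexing_failed are mine, not from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the fetched page: one header row without an id
# and two data rows with ids.
html = """
<table>
  <tr><th>Name</th></tr>
  <tr id="8LVPCRJGR" role="row"><td>a</td></tr>
  <tr id="29G9VJJC" role="row"><td>b</td></tr>
</table>
"""

doc = BeautifulSoup(html, "html.parser")
rows = doc.find_all("tr")  # a ResultSet, which behaves like a list

# Indexing the whole result set with a string fails, as it would for any list.
try:
    rows["id"]
    indexing_failed = False
except TypeError:
    indexing_failed = True

# Index each element instead, skipping rows that have no id attribute.
ids = [row["id"] for row in rows if "id" in row.attrs]

print(indexing_failed)
print(ids)
```

The header row is skipped by the `'id' in row.attrs` check; without it, `row['id']` would raise a KeyError on that row.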

CodePudding user response:

Select more specifically: pick only the <tr> elements that have an id attribute, then iterate over the result set to get each id:

[x['id'] for x in soup.select('tr[id]')]

Example

import requests
from bs4 import BeautifulSoup

r = requests.get('https://brawlify.com/stats/club/V8GVVR0R')
soup = BeautifulSoup(r.text, "html.parser")

attribute = [x['id'] for x in soup.select('tr[id]')]
print(attribute)

Output:

['8LVPCRJGR',
 '29G9VJJC',
 '2YP08GUG8',
 'UY8PVUPL',
 'VV2RRRGG',
 '20RQQ08U9',
 'VJ00J8Y8',
 '200PG2VLP',
 '28QV0RJVV',...]
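For comparison, a minimal sketch on a made-up inline table (the HTML below is an assumption, not the real page). It puts the CSS selector next to find_all('tr', id=True), which should match the same rows, since passing True as an attribute value matches tags that have the attribute at all.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the first row has no id and should be ignored.
html = """
<table>
  <tr><th>Name</th></tr>
  <tr id="8LVPCRJGR" role="row"><td>a</td></tr>
  <tr id="29G9VJJC" role="row"><td>b</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# CSS attribute selector: only <tr> elements that carry an id attribute.
via_select = [row["id"] for row in soup.select("tr[id]")]

# Same filter expressed through find_all: id=True means "has an id".
via_find_all = [row["id"] for row in soup.find_all("tr", id=True)]

print(via_select)
print(via_find_all)
```

Either form avoids a separate if check in the list comprehension, because rows without an id never make it into the result set.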

CodePudding user response:

find_all will return a list.

You should loop through this list and extract the id attribute from each element, something like the code below.

Edit

Following @chitown88's comment, you can add an if condition to the loop. As for @Zaid Hussain's comment: apparently you can't get the tr tags from the HTML page because the JavaScript code has not executed when requests.get(url).text is passed to BeautifulSoup. I would recommend inspecting what requests.get(url).text returns. If that is the case, I would recommend opening the page with Selenium (through ChromeDriver, for example) and passing the HTML to BeautifulSoup, or just doing the whole job with Selenium.

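tags = doc.find_all('tr')
# tag.get('id') returns None for rows without an id, so those rows are skipped
# instead of raising a KeyError
attribute = [tag['id'] for tag in tags if tag.get('id')]
print(attribute)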
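One way to inspect the response, as a rough sketch: parse the text and see whether any rows with an id come back. The helper row_ids and the two sample strings are hypothetical. In practice you would pass requests.get(url).text instead; an empty result would suggest the rows are built by JavaScript after the page loads.

```python
from bs4 import BeautifulSoup


def row_ids(html):
    """Return the id of every <tr> in html that has a non-empty id."""
    soup = BeautifulSoup(html, "html.parser")
    return [tr["id"] for tr in soup.find_all("tr") if tr.get("id")]


# Hypothetical response where the rows are already in the HTML.
static_page = '<table><tr id="8LVPCRJGR" role="row"><td>x</td></tr></table>'

# Hypothetical response where the rows would only appear after a script runs.
js_page = '<div id="app"></div><script>/* rows are built at runtime */</script>'

for name, html in [("static", static_page), ("js", js_page)]:
    ids = row_ids(html)
    if ids:
        print(name, "rows found:", ids)
    else:
        print(name, "no rows in the raw HTML, a browser may be needed")
```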