How to identify all tags used in website using Selenium

Time:06-19

The BeautifulSoup equivalent of what I am trying to accomplish is:

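from bs4 import BeautifulSoup as soup  # assumed alias behind the soup(...) call

page_soup = soup(page_html, 'html.parser')  # page_html holds the page's HTML source
tags = {tag.name for tag in page_soup.find_all()}  # set of unique tag names
tags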

How do I do this using Selenium? I'm just trying to print the unique tags used by a website without having to read through the entire HTML source, so I can begin analysing it and scrape specific parts of the website. I don't care what the content of the tags is at this point; I just want to know which tags are used.

Here is an answer I've stumbled upon, but I'm not sure whether there is a more elegant way of doing it:

import pandas as pd  # needed for pd.Series(...).unique() below
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By

website = 'https://www.afr.com'

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get(website)

el = driver.find_elements(by=By.CSS_SELECTOR, value='*')

tag_list = []

for e in el:
    tag_list.append(e.tag_name)

tag_list = pd.Series(tag_list).unique()

for t in tag_list:
    print(t)

CodePudding user response:

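BeautifulSoup is a better fit for this specific scenario: it parses the HTML once, whereas Selenium makes a separate WebDriver call for every element's tag_name.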
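If the page has to be rendered by a browser first, one option is to hand the rendered HTML (for example `driver.page_source`) to a parser instead of querying each element. The sketch below is only an illustration: it uses the standard library's `html.parser` as a stand-in for BeautifulSoup so it runs without extra installs, and `TagCollector`, `unique_tags`, and `sample_html` are hypothetical names, not part of either library.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records the name of every opening tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names already lowercased
        self.tags.add(tag)


def unique_tags(html):
    """Return the set of tag names that appear in an HTML string."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


# Hypothetical sample; with Selenium this could be driver.page_source
sample_html = "<html><body><div><p>Hi</p><P>again</P><br/></div></body></html>"
print(sorted(unique_tags(sample_html)))
```

With a live session, `unique_tags(driver.page_source)` would be the corresponding call, assuming `driver` has already loaded the page.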

But if you still want to use Selenium, you can try:

# find_elements_by_tag_name was removed in Selenium 4.3; use find_elements with By
elems = driver.find_elements(By.TAG_NAME, '*')

tags = []
for x in elems:
    tags.append(x.tag_name)

Which is equivalent to:

elems = driver.find_elements(By.TAG_NAME, '*')

tags = [x.tag_name for x in elems]

Finally, to keep only the unique values, you can pass the list to the built-in set() type:

set(tags)
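The steps above can be wrapped in a small helper. This is a minimal sketch, not an official Selenium API: `unique_tag_names` is a hypothetical name, and the `SimpleNamespace` objects are stand-ins for the elements Selenium returns, so the example runs without a browser.

```python
from types import SimpleNamespace


def unique_tag_names(elements):
    """Return the sorted unique tag names of objects exposing .tag_name."""
    return sorted({e.tag_name for e in elements})


# Stand-ins for Selenium elements (hypothetical data for illustration)
fake_elements = [
    SimpleNamespace(tag_name=name)
    for name in ["html", "body", "div", "div", "a", "div"]
]

for name in unique_tag_names(fake_elements):
    print(name)
```

With a real driver, the same function could be called as `unique_tag_names(driver.find_elements(By.CSS_SELECTOR, '*'))`. Sorting the set gives stable output, which a bare `set` does not guarantee.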