I am trying to extract dynamic content from a nested span structure. The text I want is "dynamic content2", which is the content of the second span element (class="second span"); its value is updated regularly.
<html>
<div class="outer div">
<span class="first span">
<span>random content</span>
</span>
<span class="second span">
<span>dynamic content2</span>
</span>
</div>
</html>
I am new to web scraping, and currently this is what I have:
import os, sys
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
opts = Options()
opts.add_argument("--headless")
chrome_driver = os.getcwd() + "\\chromedriver.exe"
# Instantiate a webdriver
driver = webdriver.Chrome(options=opts, executable_path=chrome_driver)
driver.get("some url")
soup_file=driver.page_source
soup = BeautifulSoup(soup_file, "html.parser")
# works fine
print(soup.title.get_text())
print("Testing getting dynamic element")
spanId = 'second span'
mySpan = soup.find("span", class_ = spanId )
print(mySpan.get_text())
driver.quit()
But nothing is returned. Any help is appreciated.
CodePudding user response:
The code below selects the inner span with a CSS selector and prints dynamic content2.
Code:
from bs4 import BeautifulSoup

tag = """
<html>
<div class="outer div">
<span class="first span">
<span>random content</span>
</span>
<span class="second span">
<span>dynamic content2</span>
</span>
</div>
</html>
"""
soup = BeautifulSoup(tag, 'html.parser')
# Full path from the outer div:
# span = soup.select_one('div.outer.div > span.second.span > span').text
# or, shorter:
span = soup.select_one('span.second.span > span').text
print(span)
Output:
dynamic content2
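For comparison, here is a minimal sketch of two ways to reach the same text, assuming the page source really contains the markup from the question. Note that class="second span" is two separate classes (second and span), so the CSS selector chains them with dots, while find() can match on just one of them. The variable names (html, by_select, by_find, inner_text, outer_text) are only illustrative.

```python
from bs4 import BeautifulSoup

# Static copy of the markup from the question (assumed to match the real page).
html = """
<div class="outer div">
<span class="first span">
<span>random content</span>
</span>
<span class="second span">
<span>dynamic content2</span>
</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# CSS selector: a span carrying both classes, then its direct child span.
by_select = soup.select_one("span.second.span > span")

# find() with a single class name matches the first span that has that class.
by_find = soup.find("span", class_="second")

# strip=True drops the newlines around the nested span's text.
inner_text = by_select.get_text(strip=True)
outer_text = by_find.get_text(strip=True)

print(inner_text)
print(outer_text)
```

Both lines should print the same text here. If either lookup returns None on the live page, the element was probably not in driver.page_source yet when it was read, which would be worth checking before changing the selector.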