I want to scrape the text of the first span in each of several divs that share a class name, using Python and BeautifulSoup (BS4). Here is the HTML:
<div >
<h3>Network Overview</h3>
<div >
<div ><span>32</span><span>Games</span></div>
<div ><span>83,681,202,831.85</span><span>Award</span></div>
<div ><span>18</span><span>Top players</span></div>
</div>
</div>
The HTML above is from a website; I copied only the portion I want to scrape. For example, my scraped data should look like:
32
83,681,202,831.85
18
I am new to web scraping in Python. I tried the code below, but it failed:
import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(requests.get(url).content, 'html.parser')
div = soup.find_all("div", class_="networkstat")
value1 = div[0].find("span").get_text().strip()
value2 = div[1].find("span").get_text().strip()
value3 = div[2].find("span").get_text().strip()
print(value1, value2, value3)
Any help is appreciated.
CodePudding user response:
You could use CSS selectors to select the first span child of each div:
for e in soup.select('.networkstat span:first-child'):
    print(e.get_text())
Example
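from bs4 import BeautifulSoup

# class="networkstat" on the inner divs is assumed from the question's code;
# the class attributes are missing from the HTML as posted.
html = '''
<div>
<h3>Network Overview</h3>
<div>
<div class="networkstat"><span>32</span><span>Games</span></div>
<div class="networkstat"><span>83,681,202,831.85</span><span>Award</span></div>
<div class="networkstat"><span>18</span><span>Top players</span></div>
</div>
</div>
'''
# Name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(html, 'html.parser')
for e in soup.select('.networkstat span:first-child'):
    print(e.get_text())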
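If you would rather keep the find_all approach from the question, here is a minimal sketch of the same idea. It assumes, as the question's code implies, that each inner div carries class="networkstat"; the sample HTML stands in for the real page, and the helper name first_span_texts is hypothetical.

```python
from bs4 import BeautifulSoup

# Sample markup; class="networkstat" on the inner divs is an assumption
# based on the class name used in the question's code.
html = '''
<div>
<h3>Network Overview</h3>
<div>
<div class="networkstat"><span>32</span><span>Games</span></div>
<div class="networkstat"><span>83,681,202,831.85</span><span>Award</span></div>
<div class="networkstat"><span>18</span><span>Top players</span></div>
</div>
</div>
'''


def first_span_texts(markup, class_name="networkstat"):
    """Return the stripped text of the first <span> in each matching div."""
    soup = BeautifulSoup(markup, "html.parser")
    values = []
    for div in soup.find_all("div", class_=class_name):
        span = div.find("span")  # first span inside the div, or None
        if span is not None:
            values.append(span.get_text(strip=True))
    return values


values = first_span_texts(html)
print(*values, sep="\n")
```

Looping over the result of find_all avoids hard-coding div[0], div[1], div[2], so the code does not raise an IndexError when the page has fewer matching divs than expected.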
CodePudding user response:
Your code should work. The only problem I see is requests.get(url).content: you pass .content (the raw bytes), but you should use .text (the decoded string).
Try this:
soup = BeautifulSoup(requests.get(url).text, 'html.parser')
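To see whether the input type matters for a given page, you can parse the same markup both ways and compare. This is only a rough sketch on a made-up snippet, not the real page: the bytes version mimics what .content returns and the str version mimics .text. The class name and the variable names are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Made-up snippet standing in for the real response body
markup = '<div class="networkstat"><span>32</span><span>Games</span></div>'

# str input, comparable to requests' response.text
from_str = BeautifulSoup(markup, "html.parser")
# bytes input, comparable to requests' response.content
from_bytes = BeautifulSoup(markup.encode("utf-8"), "html.parser")

text_from_str = from_str.find("span").get_text()
text_from_bytes = from_bytes.find("span").get_text()
print(text_from_str, text_from_bytes)
```

If both versions give the same result on the real page as well, the cause of the failure is likely somewhere other than the choice between .content and .text.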