I'm trying to scrape an element from a site by its id, but find() keeps returning None and I can't figure out how to fix it:
from bs4 import BeautifulSoup
import requests

url = "Website"
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
lists = soup.find_all('div', class_="position-relative")
for list in lists:
    Value = list.find('h5', id_="player_value")
    print(Value)
Now with that it will just print:
None
Here is what the website inspect mode looks like:
CodePudding user response:
You can also pass the class in an attribute dict, which is equivalent to class_=; try this:
lists = soup.find_all ('div', {'class': 'position-relative'})
CodePudding user response:
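Remove the _ from the id keyword argument. With id_, Beautiful Soup looks for an attribute literally named id_, which no tag has, so find() returns None:
.find('h5', id="player_value")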
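To make the difference concrete, here is a minimal sketch run against an inline HTML snippet. The snippet is made up for illustration and only assumes the structure described in the question (an h5 with id "player_value" inside a div with class "position-relative").

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page
html = '<div class="position-relative"><h5 id="player_value">1</h5></div>'
soup = BeautifulSoup(html, "html.parser")

# id_ is not special: it filters on an attribute literally named "id_"
wrong = soup.find("h5", id_="player_value")
# id filters on the real id attribute
right = soup.find("h5", id="player_value")

print(wrong)       # None
print(right.text)  # 1
```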
Why the _ is needed for class, from the docs:
“class”, is a reserved word in Python. Using class as a keyword argument will give you a syntax error. As of Beautiful Soup 4.1.2, you can search by CSS class using the keyword argument class_
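As a quick check of the two spellings, the sketch below compares class_ with an attribute dict on made-up HTML (the markup and the extra class names are assumptions for illustration). Both forms should select the same tags, including a tag that carries additional classes.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only "position-relative" comes from the question
html = '''
<div class="position-relative">a</div>
<div class="other">b</div>
<div class="position-relative extra">c</div>
'''
soup = BeautifulSoup(html, "html.parser")

# Keyword form, with the trailing underscore because class is reserved
by_keyword = soup.find_all("div", class_="position-relative")
# Dict form, where "class" is just a string key
by_dict = soup.find_all("div", {"class": "position-relative"})

print([d.text for d in by_keyword])  # ['a', 'c']
print(by_keyword == by_dict)         # True
```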
Example
Assuming the id is unique, you can get the value directly:
from bs4 import BeautifulSoup

html = '''
<h5 id="player_value">1</h5>
'''
# Name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(html, 'html.parser')
player_value = soup.find('h5', id="player_value").text
print(player_value)
If the id of your <h5> is not unique and you want all matches, use find_all. Also avoid shadowing built-in names such as list (it is a built-in, not a reserved word, so Python will not stop you):
from bs4 import BeautifulSoup

html = '''
<h5 id="player_value">1</h5>
<h5 id="player_value">2</h5>
<h5 id="player_value">3</h5>
<h5 id="player_value">4</h5>
'''
soup = BeautifulSoup(html, 'html.parser')
# find_all returns every matching tag, not just the first one
for tag in soup.find_all('h5', id="player_value"):
    print(tag.text)
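Putting the pieces together, the loop from the question could look roughly like the sketch below. It runs against inline HTML rather than the real site, so the markup, the variable names block and values, and the middle div without an h5 are all assumptions for illustration. The None check covers containers that have no matching h5.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page
html = '''
<div class="position-relative"><h5 id="player_value">10</h5></div>
<div class="position-relative"><p>no value here</p></div>
<div class="position-relative"><h5 id="player_value">20</h5></div>
'''
soup = BeautifulSoup(html, "html.parser")

values = []
for block in soup.find_all("div", class_="position-relative"):
    value = block.find("h5", id="player_value")
    # find() returns None when the container has no matching tag
    if value is not None:
        values.append(value.get_text(strip=True))

print(values)  # ['10', '20']
```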