I'm trying out web scraping with bs4 and I don't understand what .contents, the number in brackets, and .string mean in this line. Could someone please explain it to me? Thanks.
name = div.contents[0].string div.contents[1]
Answer:
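The .contents attribute of a tag is a list of that tag's direct children: both child tags and the text strings between them, including whitespace such as '\n'. The number in brackets is just a list index, so div.contents[0] is the first child and div.contents[1] is the second.
The .string attribute gives the text of an element, but only when that text is unambiguous: if the tag has exactly one child and it is a string, you get that string; if its only child is another tag, .string is taken from that tag instead; if the tag has more than one child, .string is None.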
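To see both attributes without a network request, here is a minimal sketch on a small, made-up HTML snippet (the id "box" and all of the tag contents are hypothetical). It assumes bs4 is installed and uses the same 'html.parser' backend as the code in this answer.

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration; the id "box" is hypothetical.
html = """<div id="box">
<h1><a href="/q">Title text</a></h1>
<p>Hello <b>world</b></p>
</div>"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", id="box")

# .contents is a plain list of the div's direct children:
# '\n', <h1>, '\n', <p>, '\n' (the newlines are text nodes too).
children = div.contents
print(len(children))           # 5
print(children[0] == "\n")     # True
print(children[1].name)        # h1

# <h1> has one child (<a>), which has one child (a string),
# so .string follows that chain down to the text.
print(children[1].string)      # Title text

# <p> has two children ("Hello " and <b>), so .string is None.
print(children[3].string)      # None

# The div itself has five children, so its .string is None as well.
print(div.string)              # None
```

Because the whitespace between tags shows up in .contents, which index holds the element you want depends on how the HTML is formatted.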
Using this page as an example:
import requests
from bs4 import BeautifulSoup
from pprint import pprint

resp = requests.get("https://stackoverflow.com/questions/73842279/what-is-contents-in-beautifulsoup4-and-the-number-string")
soup = BeautifulSoup(resp.text, 'html.parser')

# Find the <div id="question-header"> and print its children
for elem in soup.find_all('div'):
    if elem.has_attr('id') and elem['id'].strip() == "question-header":
        pprint(elem.contents)            # all direct children, including '\n' strings
        pprint(elem.contents[1].string)  # text of the second child (the <h1>)
Output of pprint(elem.contents):
['\n',
<h1 itemprop="name"><a href="/questions/73842279/what-is-contents-in-beautifulsoup4-and-the-number-string">What is contents in beautifulsoup4 and the number string?</a></h1>,
'\n',
<div >
<a href="/questions/ask">
Ask Question
</a>
</div>,
'\n']
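Output of pprint(elem.contents[1].string):
'What is contents in beautifulsoup4 and the number string?'

Index 0 is the newline string '\n', so the title is at index 1. That child is the <h1>, whose only child is the <a>, whose only child is the title text, so .string follows that chain down and returns the text.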
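If the live page changes or the request is blocked, the same structure can be reproduced offline. The sketch below uses a hand-written approximation of the header markup from the output above; the inner <div>'s real attributes do not appear there, so it is left bare, and the question URL is shortened to a hypothetical "/questions/1". It swaps the manual id check for soup.find(..., id=...), uses find_all(recursive=False) as one way to get only the child tags without the '\n' strings, and uses get_text(strip=True) for an element whose .string is None.

```python
from bs4 import BeautifulSoup

# Hand-written approximation of the question header (not fetched live).
html = """<div id="question-header">
<h1 itemprop="name"><a href="/questions/1">What is contents in beautifulsoup4 and the number string?</a></h1>
<div>
<a href="/questions/ask">
Ask Question
</a>
</div>
</div>"""

soup = BeautifulSoup(html, "html.parser")
header = soup.find("div", id="question-header")

# .contents still includes the '\n' strings between tags.
print(header.contents[0] == "\n")   # True

# find_all(recursive=False) returns only the direct child tags.
child_tags = header.find_all(recursive=False)
print([t.name for t in child_tags])  # ['h1', 'div']

# <h1> -> single <a> -> single string, so .string works.
print(child_tags[0].string)

# The inner <div> has three children ('\n', <a>, '\n'), so .string is None;
# get_text(strip=True) gathers all the text and trims the whitespace.
ask_div = child_tags[1]
print(ask_div.string)                # None
print(ask_div.get_text(strip=True))  # Ask Question
```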