I am trying to extract the Year Built value from a span tag using BeautifulSoup and the code below, but I'm not getting the actual year. Please help. Thanks :)
results = []
for url in All_product[:2]:
    link = url
    html = getAndParseURL(url)
    YearBuilt = html.findAll("span", {"class": "header font-color-gray-light inline-block"})[4]
    results.append([YearBuilt])
The output shows:
[[<span >Year Built</span>],
[<span >Community</span>]]
CodePudding user response:
Try using .next_sibling:
result = []
year_built = html.find_all(
    "span", {"class": "header font-color-gray-light inline-block"}
)
for elem in year_built:
    if elem.text.strip() == 'Year Built':
        # next_sibling can be a whitespace text node; find_next_sibling("span") skips those
        result.append(elem.next_sibling)
I'm not sure how the whole HTML looks, but something along these lines might help.
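Since the real page isn't shown, here is a minimal sketch of the idea on made-up markup. The SAMPLE_HTML string, the "content" class and the value_for_label helper are assumptions for illustration only. It also shows why find_next_sibling("span") may be safer than the bare next_sibling, which can land on the whitespace between two tags.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page may be structured differently.
SAMPLE_HTML = """
<div>
  <span class="header font-color-gray-light inline-block">Year Built</span>
  <span class="content">2005</span>
</div>
<div>
  <span class="header font-color-gray-light inline-block">Community</span>
  <span class="content">Lakeside</span>
</div>
"""


def value_for_label(soup, label):
    """Return the text of the <span> that follows the header span named `label`."""
    # class_="header" matches any span whose class list includes "header".
    for header in soup.find_all("span", class_="header"):
        if header.get_text(strip=True) == label:
            # find_next_sibling skips text nodes and returns the next <span> tag.
            value = header.find_next_sibling("span")
            return value.get_text(strip=True) if value else None
    return None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
header = soup.find("span", class_="header")

# With this markup, next_sibling is the whitespace between the two tags.
print(repr(header.next_sibling))
print(value_for_label(soup, "Year Built"))
```

If the label is not on the page, value_for_label returns None instead of raising, so the caller can decide what to store.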
CodePudding user response:
Note: There is probably a more specific way to extract all the attributes you need for your results, but for that you should improve your question and add more details.
Using CSS selectors, you can chain and combine your selection to make it stricter. In this case, select the <span> that contains your string and use the adjacent sibling combinator (+) to get the next sibling <span>.
YearBuilt = e.text if (e := html.select_one('span.header:-soup-contains("Year Built") + span')) else None
It also avoids AttributeError: 'NoneType' object has no attribute 'text': if the element is not available, select_one returns None, so you can check that it exists before accessing its text attribute.
results = []
for url in All_product[:2]:
    link = url
    html = getAndParseURL(url)  # assumed to return a BeautifulSoup object
    YearBuilt = e.text if (e := html.select_one('span.header:-soup-contains("Year Built") + span')) else None
    results.append([YearBuilt])
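To try the selector without fetching the real pages, the following sketch runs it against two tiny made-up documents, one with a Year Built label and one without. The markup and the year_built helper are assumptions for illustration. The :-soup-contains pseudo-class needs a reasonably recent soupsieve; older versions only know the deprecated :contains.

```python
from bs4 import BeautifulSoup


def year_built(soup):
    """Return the text of the span right after the "Year Built" header, or None."""
    # "+" is the adjacent sibling combinator: the next element after the match.
    match = soup.select_one('span.header:-soup-contains("Year Built") + span')
    # select_one returns None when nothing matches, so guard before reading text.
    return match.get_text(strip=True) if match else None


# Hypothetical documents; the real pages may be structured differently.
with_year = BeautifulSoup(
    '<div><span class="header">Year Built</span> <span>2005</span></div>',
    "html.parser",
)
without_year = BeautifulSoup(
    '<div><span class="header">Community</span> <span>Lakeside</span></div>',
    "html.parser",
)

print(year_built(with_year))
print(year_built(without_year))
```

The second call yields None rather than raising AttributeError, which is the behaviour the one-liner above relies on.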