How do you get the text from a span tag using BeautifulSoup when there's no clear identification?

I am trying to extract the value from this span tag for Year Built using BeautifulSoup and the code below, but I'm getting the label instead of the actual year. Please help. Thanks :)

results = []
for url in All_product[:2]:
   link = url
   html = getAndParseURL(url)
   YearBuilt = html.findAll("span", {"class":"header font-color-gray-light inline-block"})[4]
   results.append([YearBuilt])

The output shows

[[<span >Year Built</span>],
[<span >Community</span>]]

CodePudding user response：

Try using the .next_sibling:

result = []
year_built = html.find_all(
   "span", {"class":"header font-color-gray-light inline-block"}
)
for elem in year_built:
    if elem.text.strip() == 'Year Built':
        result.append(elem.next_sibling)

I'm not sure how the whole HTML looks, but something along these lines might help.
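Since the real page isn't shown, here is a minimal sketch of the sibling approach on made-up markup. The sample HTML, the `content text-only` class, and the helper name `value_after_label` are assumptions for illustration only. It uses `find_next_sibling("span")` rather than plain `.next_sibling`, because `.next_sibling` can return the whitespace text node that sits between two tags.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page structure is not shown in the question.
SAMPLE_HTML = """
<div>
  <span class="header font-color-gray-light inline-block">Year Built</span>
  <span class="content text-only">1998</span>
</div>
<div>
  <span class="header font-color-gray-light inline-block">Community</span>
  <span class="content text-only">Oak Ridge</span>
</div>
"""


def value_after_label(soup, label):
    """Return the text of the <span> that follows the span labelled `label`."""
    for elem in soup.find_all("span", class_="header"):
        if elem.get_text(strip=True) == label:
            # find_next_sibling skips the whitespace text node between the
            # two tags, which plain .next_sibling would return first.
            value = elem.find_next_sibling("span")
            return value.get_text(strip=True) if value else None
    return None  # no span with that label on the page


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(value_after_label(soup, "Year Built"))
```

Matching on `class_="header"` alone also sidesteps having to reproduce the full class string exactly.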

CodePudding user response：

Note: There is probably a more specific solution for extracting all the attributes you need for your results, but for that you should improve your question and add more details.

Using CSS selectors you can chain / combine your selection to make it stricter. In this case you select the <span> that contains your string and use the adjacent sibling combinator (+) to get the next sibling <span>.

YearBuilt = e.text if (e := html.select_one('span.header:-soup-contains("Year Built") + span')) else None

This also avoids AttributeError: 'NoneType' object has no attribute 'text'. If the element is not found, select_one returns None, and the conditional expression checks for that before accessing .text.

soup = BeautifulSoup(html_doc, "html.parser")

results = []
for url in All_product[:2]:
    link = url
    html = getAndParseURL(url)
    YearBuilt = e.text if (e := html.select_one('span.header:-soup-contains("Year Built") + span')) else None
    results.append([YearBuilt])
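To try the selector without the real site, here is a small self-contained sketch. The two HTML snippets in `PAGES` and the helper name `extract_year_built` are made up for illustration and stand in for whatever getAndParseURL returns; the second snippet deliberately has no "Year Built" label, to show the None fallback.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragments; the real listings are not shown in the question.
PAGES = [
    '<div><span class="header">Year Built</span> <span>2004</span></div>',
    '<div><span class="header">Community</span> <span>Lakeside</span></div>',
]


def extract_year_built(soup):
    """Return the text of the span right after the 'Year Built' label, or None."""
    # "+" is the adjacent sibling combinator: it matches the <span> element
    # that immediately follows the labelled one (text nodes are ignored).
    e = soup.select_one('span.header:-soup-contains("Year Built") + span')
    return e.get_text(strip=True) if e else None


results = [
    [extract_year_built(BeautifulSoup(page, "html.parser"))] for page in PAGES
]
print(results)
```

With these two fragments the list comes out as [['2004'], [None]]: the first page has the label and a value, the second has no "Year Built" span, so it falls back to None instead of raising.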