In the Zillow listing below, the number of beds (bds), number of baths (ba), and square footage (sqft) all use the same <li> tag.
How can I get the values of these three elements? My code below is clearly not working.
The result should be: 3, 2, 1813
Can you please help? Thanks, Hong
<div class="list-card-info"><a class="list-card-link list-card-link-top-margin" href="https://www.zillow.com/homedetails/12021-Tralee-Rd-UNIT-102-Lutherville-MD-21093/60873148_zpid/" tabindex="0">
# <address >12021 Tralee Rd UNIT 102, Lutherville, MD 21093</address></a>
# <div ><p >LONG & FOSTER REAL ESTATE, INC.</p></div><div >
# <div >$411,000</div><ul >
# <li >3<abbr > <!-- -->bds</abbr></li>
# <li >2<abbr > <!-- -->ba</abbr></li>
# <li >1,813<abbr > <!-- -->sqft</abbr>
# </li><li >- Apartment for sale</li></ul></div></div>
tag='<div ><a href="https://www.zillow.com/homedetails/12021-Tralee-Rd-UNIT-102-Lutherville-MD-21093/60873148_zpid/" tabindex="0"><address >12021 Tralee Rd UNIT 102, Lutherville, MD 21093</address></a><div ><p >LONG & FOSTER REAL ESTATE, INC.</p></div><div ><div >$411,000</div><ul ><li >3<abbr > <!-- -->bds</abbr></li><li >2<abbr > <!-- -->ba</abbr></li><li >1,813<abbr > <!-- -->sqft</abbr></li><li >- Apartment for sale</li></ul></div></div>'
tag = BeautifulSoup(tag, 'html.parser')
address = tag.findAll('address', {'class': 'list-card-addr'})
price = tag.findAll('div', {'class': 'list-card-price'})
beds = tag.findAll('li', {'class': ""})
# keep text only, remove tag
address=address[0].text;
price=price[0].text ;
beds=beds[0].text; print(beds)
print(address, '---',price, '---',beds)
CodePudding user response:
tag.findAll returns a ResultSet containing every matching <li> element in document order: beds, baths, square footage, and then the listing type. You can access each one by its index, as shown below.
from bs4 import BeautifulSoup
tag= '<div ><a href="https://www.zillow.com/homedetails/12021-Tralee-Rd-UNIT-102-Lutherville-MD-21093/60873148_zpid/" tabindex="0"><address >12021 Tralee Rd UNIT 102, Lutherville, MD 21093</address></a><div ><p >LONG & FOSTER REAL ESTATE, INC.</p></div><div ><div >$411,000</div><ul ><li >3<abbr > <!-- -->bds</abbr></li><li >2<abbr > <!-- -->ba</abbr></li><li >1,813<abbr > <!-- -->sqft</abbr></li><li >- Apartment for sale</li></ul></div></div>'
tag = BeautifulSoup(tag, 'html.parser')
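items = tag.findAll('li', {'class': ""})
# keep text only, remove tag
beds = items[0].text   # first <li>: number of beds plus its 'bds' label
baths = items[1].text  # second <li>: number of baths plus its 'ba' label
sqft = items[2].text   # third <li>: square footage plus its 'sqft' label
print(beds, '---', baths, '---', sqft)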
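The values above still carry their unit labels. If you want plain numbers like the 3, 2, 1813 asked for in the question, here is a minimal sketch. It assumes the first three <li> elements are always beds, baths, and square footage in that order. The helper name to_int and the shortened html string are made up for illustration, and find_all is the current spelling of findAll.

```python
from bs4 import BeautifulSoup

# Shortened copy of the listing markup from the question (illustrative only).
html = (
    '<ul>'
    '<li>3<abbr> <!-- -->bds</abbr></li>'
    '<li>2<abbr> <!-- -->ba</abbr></li>'
    '<li>1,813<abbr> <!-- -->sqft</abbr></li>'
    '<li>- Apartment for sale</li>'
    '</ul>'
)
soup = BeautifulSoup(html, 'html.parser')


def to_int(text):
    """Turn text such as '1,813 sqft' into the integer 1813."""
    number = text.split()[0]             # keep the part before the unit label
    return int(number.replace(',', ''))  # drop the thousands separator


# Assumption: the first three <li> items are beds, baths, sqft in that order.
beds, baths, sqft = (to_int(li.text) for li in soup.find_all('li')[:3])
print(beds, baths, sqft)
```

Slicing with [:3] skips the fourth item ("- Apartment for sale"), which has no number to convert.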
CodePudding user response:
That should do it:
#<div ><a href="https://www.zillow.com/homedetails/12021-Tralee-Rd-UNIT-102-Lutherville-MD-21093/60873148_zpid/" tabindex="0">
# <address >12021 Tralee Rd UNIT 102, Lutherville, MD 21093</address></a>
# <div ><p >LONG & FOSTER REAL ESTATE, INC.</p></div><div >
# <div >$411,000</div><ul >
# <li >3<abbr > <!-- -->bds</abbr></li>
# <li >2<abbr > <!-- -->ba</abbr></li>
# <li >1,813<abbr > <!-- -->sqft</abbr>
# </li><li >- Apartment for sale</li></ul></div></div>
import re
from bs4 import BeautifulSoup

tag='<div ><a href="https://www.zillow.com/homedetails/12021-Tralee-Rd-UNIT-102-Lutherville-MD-21093/60873148_zpid/" tabindex="0"><address >12021 Tralee Rd UNIT 102, Lutherville, MD 21093</address></a><div ><p >LONG & FOSTER REAL ESTATE, INC.</p></div><div ><div >$411,000</div><ul ><li >3<abbr > <!-- -->bds</abbr></li><li >2<abbr > <!-- -->ba</abbr></li><li >1,813<abbr > <!-- -->sqft</abbr></li><li >- Apartment for sale</li></ul></div></div>'
tag = BeautifulSoup(tag, 'html.parser')
list_items = tag.findAll('li', {'class': ""})
# keep only the leading number of each item, dropping the unit label
regex = re.compile(r'[\d,]+')
beds = regex.findall(list_items[0].text)[0]
baths = regex.findall(list_items[1].text)[0]
sqft = regex.findall(list_items[2].text)[0].replace(',', '')  # remove the thousands separator
print(beds, '---', baths, '---', sqft)
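Indexing by position breaks if a listing omits one of the items (for example, a lot with no bath count). A possible alternative, sketched below, is to key each value by the label inside its <abbr> tag. This assumes the number always sits directly before the <abbr>, as in the markup above. The name facts and the shortened html string are made up for illustration.

```python
from bs4 import BeautifulSoup

# Shortened copy of the listing markup from the question (illustrative only).
html = (
    '<ul>'
    '<li>3<abbr> <!-- -->bds</abbr></li>'
    '<li>2<abbr> <!-- -->ba</abbr></li>'
    '<li>1,813<abbr> <!-- -->sqft</abbr></li>'
    '<li>- Apartment for sale</li>'
    '</ul>'
)
soup = BeautifulSoup(html, 'html.parser')

facts = {}
for li in soup.find_all('li'):
    if li.abbr is None:
        continue  # items without a unit label, such as the listing type
    label = li.abbr.get_text(strip=True)  # 'bds', 'ba' or 'sqft'
    number = li.abbr.previous_sibling     # the text node just before <abbr>
    facts[label] = int(number.strip().replace(',', ''))

print(facts['bds'], facts['ba'], facts['sqft'])
```

A missing item then shows up as a missing key (use facts.get('ba') to get None) instead of shifting the other values into the wrong variables.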