I am still learning Python. I am confused because I pulled each of these tags for just the first result and everything worked beautifully, but when I put the same code into a loop it throws an error.
For the sake of my learning, correct me if I'm wrong: I think this error is telling me that 'result' is a NoneType object and that's why I can't call a method on it. But I thought that 'for result in results:' is all I need to do to define 'result' as the iteration variable.
import requests
from bs4 import BeautifulSoup

URL = 'https://www.zillow.com/eugene-or/rentals/'
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.81 Safari/537.36 Edg/104.0.1293.47", "Accept-Encoding":"gzip, deflate", "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "DNT":"1","Connection":"close", "Upgrade-Insecure-Requests":"1"}
page = requests.get(URL, headers=headers)
soup1 = BeautifulSoup(page.content, "html.parser")
soup2 = BeautifulSoup(soup1.prettify(), "html.parser")
results = soup2.find_all('li', attrs={'class':'ListItem-c11n-8-69-2__sc-10e22w8-0 srp__hpnp3q-0 enEXBq with_constellation'})
records = []
for result in results:
    estate = result.find('address').text[16:-15].split('|')
    details = result.find('span').text[16:-15].split(' ')
    link = 'https://www.zillow.com' + result.find('a')['href']
    records.append((estate, details, link))
Here is the error I am getting on the for loop.
AttributeError                            Traceback (most recent call last)
Input In [80], in <cell line: 4>()
      3 records = []
      4 for result in results:
----> 5     estate = result.find('address').text[16:-15].split('|')
      6     details = result.find('span').text[16:-15].split(' ')
      7     link = 'https://www.zillow.com' + result.find('a')['href']

AttributeError: 'NoneType' object has no attribute 'text'
Thank you in advance for any input.
Answer:
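The loop variable is fine: 'result' is a Tag on every iteration. The None comes from result.find('address'), which returns None for any list item that contains no address tag. If 'result' itself were None, the error would complain about 'find', not 'text'. There are different approaches to fix that. One is to select your elements more specifically: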
soup2.select('article:has(address)')
or simply check if address is available:
estate = result.find('address').text[16:-15].split('|') if result.find('address') else None
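To see why the check matters, here is a minimal sketch on a made-up HTML snippet. The markup and the 'card' class are assumptions for illustration, not Zillow's real page; the second list item stands in for an entry with no address tag, which is the kind of item that makes find() return None.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second <li> has no <address> inside it.
html = """
<ul>
  <li class="card"><address>123 Main St | Eugene, OR</address></li>
  <li class="card"><div>Sponsored</div></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
results = soup.find_all("li", attrs={"class": "card"})

estates = []
for result in results:
    address = result.find("address")  # a Tag, or None when nothing matches
    if address is None:
        continue  # skip list items that carry no address
    estates.append(address.get_text(strip=True))

print(estates)
```

Storing the result of find() in a variable also avoids searching the same item twice, once for the check and once for the value.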
Also, instead of slicing, try to strip and split with an exact pattern:
estate = result.find('address').get_text(strip=True).split(' | ')
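Putting the selector and the strip-and-split together, the loop could look roughly like the sketch below. It runs on a made-up snippet rather than the live page, so the markup, the '/homedetails/1' path, and the variable name 'card' are assumptions; only the 'article:has(address)' selector and the get_text(strip=True).split(' | ') call come from the suggestions above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with whitespace padding similar to prettified output.
html = """
<ul>
  <li><article><address>
      123 Main St | Eugene, OR
  </address><a href="/homedetails/1">view</a></article></li>
  <li><article><div>Sponsored</div></article></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

records = []
# :has(address) keeps only the articles that contain an <address> tag.
for card in soup.select("article:has(address)"):
    # strip=True removes the surrounding whitespace, so no slicing is needed.
    estate = card.find("address").get_text(strip=True).split(" | ")
    anchor = card.find("a")
    link = ("https://www.zillow.com" + anchor["href"]) if anchor else None
    records.append((estate, link))

print(records)
```

Because the selector already filters out items without an address, the address lookup inside the loop cannot return None here; the link is still guarded separately in case an item has no anchor.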