I'm trying to scrape a sitemap from a site using BeautifulSoup, but I've hit a problem. My code fails with this error:
"TypeError: 'NoneType' object is not subscriptable"
Here is my code:
import requests
from bs4 import BeautifulSoup as bs
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36'}
url = "https://www.celebheights.com/"
res = requests.get(url, headers=headers)
html = bs(res.text, 'html.parser')
lilink = html.find_all('li')
for li in lilink:
    alink = li.find('a')['href']
    print(alink)
How can I solve this problem?
Answer:
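You could use print() to see what the variables contain on the line that causes the problem.
This page has some <li> elements without an <a>, and those cause the error: for them li.find('a') returns None, and None['href'] raises "TypeError: 'NoneType' object is not subscriptable".
You have to check what li.find('a') returns before reading ['href'], because sometimes it is None.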
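A quick way to see this without fetching the page is to parse a small hard-coded snippet. The HTML below is made up for illustration (the empty <li id="ilsook"> mirrors the one reported in the output further down); it shows li.find('a') returning None and the same TypeError as in the question:

```python
from bs4 import BeautifulSoup

# Made-up snippet: one <li> with a link and one empty <li>
html = '<ul><li><a href="/comments.html">Comments</a></li><li id="ilsook"></li></ul>'
soup = BeautifulSoup(html, "html.parser")

hrefs = []   # href of each <li>, or None when it has no <a>
errors = []  # messages of the TypeErrors caught below
for li in soup.find_all("li"):
    a = li.find("a")  # None when this <li> contains no <a>
    print("li.find('a') ->", a)
    try:
        hrefs.append(a["href"])  # subscripting None raises TypeError
    except TypeError as exc:
        errors.append(str(exc))
        hrefs.append(None)
print(errors)
```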
for li in lilink:
    alink = li.find('a')  # None when this <li> has no <a>
    if alink:
        url = alink['href']
        print(url)
    else:
        print('<li> without <a>:', li)
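An alternative, not part of the original answer, is to let a CSS selector do the filtering: select('li a[href]') only matches <a> tags that sit inside an <li> and actually have an href, so there is never a None to subscript (and no KeyError for an <a> without href). Note one difference: it collects every matching link inside each <li>, not just the first one as li.find('a') does. The sketch below also passes each href through urllib.parse.urljoin, assuming some links might be relative; the helper name li_links and the sample HTML are hypothetical.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def li_links(html, base_url):
    """Hypothetical helper: absolute hrefs of every <a href> nested in an <li>."""
    soup = BeautifulSoup(html, "html.parser")
    # 'li a[href]' skips <li> without <a> and <a> without an href attribute
    return [urljoin(base_url, a["href"]) for a in soup.select("li a[href]")]


# Made-up sample markup for illustration
sample = """
<ul>
  <li><a href="/comments.html">Comments</a></li>
  <li id="ilsook"></li>
  <li><a name="no-href">x</a></li>
  <li><a href="https://www.youtube.com/user/robpaul">YouTube</a></li>
</ul>
"""
links = li_links(sample, "https://www.celebheights.com/")
for link in links:
    print(link)
```

With the question's code, this would be called as li_links(res.text, url).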
Result:
https://www.celebheights.com/
https://www.celebheights.com/comments.html
https://www.celebheights.com/s/latest_1.html
https://www.celebheights.com/s/compare.php
https://www.celebheights.com/s/top50.html
https://www.youtube.com/user/robpaul
<li> without <a>: <li id="ilsook"></li>
https://www.celebheights.com/s/latest_1.html
https://www.celebheights.com/s/Sean-Kanan-52921.html
https://www.celebheights.com/s/Michael-Parks-52920.html
https://www.celebheights.com/s/Harlan-Drum-52919.html
https://www.celebheights.com/s/Patricia-Medina-52918.html
https://www.celebheights.com/s/Nan-Leslie-52917.html
https://www.celebheights.com/s/Don-Cornelius-52916.html
https://www.celebheights.com/s/Maria-Sten-52915.html
https://www.celebheights.com/s/Bruce-McGill-52914.html
https://www.celebheights.com/comments.html
https://www.celebheights.com/s/compare.php
https://www.celebheights.com/s/top50.html
https://www.celebheights.com/s/Justin-Bieber-47348.html
https://www.celebheights.com/s/Tom-Cruise-3.html
https://www.celebheights.com/s/Brad-Pitt-371.html
https://www.celebheights.com/s/Arnold-Schwarzenegger-177.html
https://www.celebheights.com/s/Sylvester-Stallone-347.html
https://www.celebheights.com/sneakers/
https://www.celebheights.com/a/23.html
https://www.celebheights.com/a/
https://www.celebheights.com/s/tagsA.html
Answer:
Instead of res.text, try passing res.content (the raw bytes) to BeautifulSoup and let it detect the page encoding itself.
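The difference is that res.text is a string requests has already decoded (based on the HTTP headers, or a guess), while res.content is raw bytes that BeautifulSoup decodes on its own, taking a <meta charset> declaration into account. On its own this changes how the page is decoded, not which <li> elements contain an <a>. A minimal sketch, with a hard-coded byte string standing in for res.content:

```python
from bs4 import BeautifulSoup

# Stand-in for res.content: raw UTF-8 bytes with a <meta charset> declaration
raw = '<html><head><meta charset="utf-8"></head><body><ul><li>Café</li></ul></body></html>'.encode("utf-8")

# Passing bytes lets BeautifulSoup choose the encoding itself
soup = BeautifulSoup(raw, "html.parser")
print(soup.original_encoding)  # the encoding BeautifulSoup picked
print(soup.li.get_text())

# What a wrong decoding looks like: the same bytes decoded as ISO-8859-1
wrong = BeautifulSoup(raw.decode("iso-8859-1"), "html.parser")
print(wrong.li.get_text())
```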