I am trying to scrape data (the thumbnail image link) from this site. The problem is that only every third <li> contains the thumbnail link:
0 <li> #contains the thumbnail link
1 <li> #should skip
2 <li> #should skip
3 <li> #contains the thumbnail link
4 <li> #should skip
5 <li> #should skip
6 <li> #contains the thumbnail link
and so on.
Here is my code:
from bs4 import BeautifulSoup
import requests
import openpyxl
try:
    response = requests.get("https://robloxden.com/item-codes")
    soup = BeautifulSoup(response.text, 'html.parser')
    items = soup.find('ul', class_="masonry masonry--5 item-codes__container").find_all("li")
    for item in items:
        # product's icon link
        item_link = item.find('div', class_="image-card__graphic image-card__graphic--border-bottom").img
        item_link = item_link['data-src']
        print(item_link)
except Exception as e:
    print(e)
(You can copy and run the script above to reproduce the problem.)
This works only for the first <li> tag and then terminates with the error 'NoneType' object has no attribute 'img', which means the second <li> tag does not contain the <img> tag. So I need to skip the second and third <li> tags; the fourth <li> tag contains the <img> data again.
As a beginner learning scraping, I would appreciate guidance on how to solve this problem.
CodePudding user response:
Use enumerate to skip two of every three items and only proceed (not to be confused with the continue statement) with the rest of the code for every third item:
for index, item in enumerate(items):
    if index % 3 != 0:
        continue
    ...
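To see the index check in isolation, here is a minimal sketch in which a plain list stands in for the parsed <li> tags (the strings and the names picked_by_index and picked_by_slice are made up for illustration). Since find_all returns a list-like result, slicing with a step of 3 should select the same items without an explicit index check.

```python
# Stand-in for the parsed <li> tags: only every third entry has a thumbnail.
# These strings are illustrative placeholders, not real page data.
items = ["thumb-0", "skip", "skip", "thumb-3", "skip", "skip", "thumb-6"]

# Variant 1: enumerate and skip every index that is not a multiple of 3.
picked_by_index = []
for index, item in enumerate(items):
    if index % 3 != 0:
        continue
    picked_by_index.append(item)

# Variant 2: slice with a step of 3 to keep indices 0, 3, 6, ...
picked_by_slice = items[::3]

print(picked_by_index)
print(picked_by_slice)
```

Both variants rely on the page keeping its "one thumbnail, then two other items" pattern; if the layout changes, the fixed step will pick the wrong tags.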
If the issue is that .find returns None at some point, you can use a try/except:
for item in items:
    try:
        ...
    except AttributeError:
        pass
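An alternative to catching the exception is to test the result of .find for None before using it, which does not depend on the position of the item. The sketch below runs against a small made-up HTML snippet (the markup, the shortened class names, and the example.com URLs are assumptions, not the real page) so that it works offline.

```python
from bs4 import BeautifulSoup

# Made-up HTML that mimics the structure described in the question:
# some <li> tags wrap an <img> in a graphic <div>, others do not.
html = """
<ul class="masonry">
  <li><div class="image-card__graphic"><img data-src="https://example.com/a.png"></div></li>
  <li><p>no image here</p></li>
  <li><p>no image here</p></li>
  <li><div class="image-card__graphic"><img data-src="https://example.com/b.png"></div></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
links = []
for item in soup.find("ul", class_="masonry").find_all("li"):
    graphic = item.find("div", class_="image-card__graphic")
    # .find returns None when nothing matches, so skip such items explicitly.
    if graphic is None or graphic.img is None:
        continue
    links.append(graphic.img["data-src"])

print(links)
```

Compared with a bare try/except, the explicit check only skips items that lack the expected tags, so unrelated attribute errors elsewhere in the loop body are not silently swallowed.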