How to skip some iterations inside bs4 tags?

Time:11-07

I am trying to scrape data (the thumbnail image links) from this site. The problem is that the data comes in groups of three <li> tags, but only the first tag in each group has the thumbnail link; the other two are not important. So I need to skip those two tags and move on to the fourth <li> tag. For example:

    0 <li> #contains the thumbnail link
    1 <li> #should skip
    2 <li> #should skip
    3 <li> #contains the thumbnail link
    4 <li> #should skip
    5 <li> #should skip
    6 <li> #contains the thumbnail link
    

    and so on.

    Here is my code:

    from bs4 import BeautifulSoup
    import requests
    import openpyxl
    
    try:
        
        response = requests.get("https://robloxden.com/item-codes")
        soup = BeautifulSoup(response.text, 'html.parser')
        items = soup.find('ul', class_="masonry masonry--5 item-codes__container").find_all("li")
     
        for item in items:
            # product's icon link
            item_link = item.find('div', class_="image-card__graphic image-card__graphic--border-bottom").img
            item_link = item_link['data-src']
    
            print(item_link)
    
    except Exception as e:
        print(e)
    

    (You can copy and run the script above to reproduce the problem.) It works only for the first <li> tag and then terminates with the error 'NoneType' object has no attribute 'img', which means the second <li> tag does not contain the <div> that holds the <img> tag. So I need to skip the second and third <li> tags; the fourth <li> tag contains the <img> data. As a beginner learning scraping, I would appreciate guidance on how to solve this problem.

  • CodePudding user response:

    Use enumerate, skip two out of every three items with the continue statement, and run the rest of the loop body only for every third item (index 0, 3, 6, and so on).

    for index, item in enumerate(items):
        if index % 3 != 0:
            continue
        
        ...
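
The index check can be tried without any network access. Below is a minimal sketch in which a plain list of made-up strings stands in for the result of find_all("li"); the helper name every_third is hypothetical and only for illustration. It also shows that slicing with a step of 3 picks the same elements, which should work on the find_all result too, since that behaves like a list.

```python
# Stand-in for the list returned by find_all("li"); the strings are made up.
items = ["thumb-0", "skip-1", "skip-2", "thumb-3", "skip-4", "skip-5", "thumb-6"]


def every_third(seq):
    """Return the elements at indices 0, 3, 6, ... using enumerate."""
    kept = []
    for index, item in enumerate(seq):
        if index % 3 != 0:
            continue  # skip the two items that follow each kept one
        kept.append(item)
    return kept


kept = every_third(items)
print(kept)

# Slicing with a step of 3 selects the same elements.
assert kept == items[::3]
```

Note that this only works as long as the page keeps the strict one-in-three pattern; if the layout changes, the index check will silently pick the wrong tags.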
    

    If the issue is that .find returns None for some items, you can wrap the lookup in a try/except:

    for item in items:
        try:
            ...
        except AttributeError:
            pass
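
To make the try/except idea concrete, here is a small sketch that runs on an inline HTML string instead of the live page. The markup, the shortened class names, and the example.com URLs are assumptions made up for the demo, and the function names are hypothetical. It puts the try/except version next to an explicit None check, which does not depend on the position of the tags at all.

```python
from bs4 import BeautifulSoup

# Made-up markup imitating the pattern in the question: only some <li> tags
# hold the graphic <div> with an <img>. The URLs are placeholders.
HTML = """
<ul class="masonry">
  <li><div class="image-card__graphic"><img data-src="https://example.com/a.png"></div></li>
  <li><p>no image</p></li>
  <li><p>no image</p></li>
  <li><div class="image-card__graphic"><img data-src="https://example.com/b.png"></div></li>
</ul>
"""


def links_with_try(html):
    """Collect data-src values, ignoring items where .find returns None."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for item in soup.find("ul", class_="masonry").find_all("li"):
        try:
            links.append(item.find("div", class_="image-card__graphic").img["data-src"])
        except AttributeError:
            pass  # .find returned None, so this <li> has no thumbnail
    return links


def links_with_check(html):
    """Same result, but testing for None explicitly instead of catching."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for item in soup.find("ul", class_="masonry").find_all("li"):
        graphic = item.find("div", class_="image-card__graphic")
        if graphic is None or graphic.img is None:
            continue  # no graphic <div> or no <img> inside it
        links.append(graphic.img["data-src"])
    return links


print(links_with_try(HTML))
print(links_with_check(HTML))
```

The explicit check is a little longer, but it only skips the case you expect, whereas a broad except can hide unrelated mistakes inside the loop body.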
    