Is there a continue function in beautiful soup?


import requests
from bs4 import BeautifulSoup
from csv import writer
import openpyxl

wb = openpyxl.load_workbook('Book3.xlsx')
ws = wb.active

with open('mtbs.csv', 'w', encoding='utf8', newline='') as f_output:
    csv_output = writer(f_output)
    header = ['Code','Product Description']
    csv_output.writerow(header)
    
    for row in ws.iter_rows(min_row=1, min_col=1, max_col=1, values_only=True):
        url = f"https://www.radwell.com/en-US/Buy/MITSUBISHI/MITSUBISHI/{row[0]}"
        print(url)
        req_page = requests.get(url)
        soup = BeautifulSoup(req_page.content, 'html.parser')
        div_techspec = soup.find('div', class_="minitabSection")

        if 'minitabSection' not in url:
            continue  # does not work

        code = div_techspec.find_all('li')

        info = [code[0].text, code[1].text, code[2].text, code[3].text]
        csv_output.writerow(info)

I am trying to collect data from a certain website. I have an Excel sheet containing hundreds of product codes. However, some of the products do not exist on the website I am scraping, and the loop stops running.

I am currently having issues with this part: if 'minitabSection' not in url: continue

URLs that do not exist should be skipped so that the loop keeps running for the rest of the codes. How do I achieve this?

CodePudding user response:

The string minitabSection never appears in your url (it is a CSS class on the page), so that condition is always true and the continue fires on every iteration. Check the result of find() instead, after you look up div_techspec:

...
div_techspec = soup.find('div', class_="minitabSection")

# find() returns None when no matching div exists on the page
if div_techspec is None:
    continue
...
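
You can also skip missing products before parsing at all. This is an assumption, not something verified against Radwell: if a nonexistent product returns a non-200 status (e.g. 404), requests exposes that as status_code:

...
req_page = requests.get(url)
if req_page.status_code != 200:  # assumption: missing products return a non-200 status
    continue

soup = BeautifulSoup(req_page.content, 'html.parser')
...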

or the other way around:

...
div_techspec = soup.find('div', class_="minitabSection")

if div_techspec:
    # the spec section exists, so it is safe to read its <li> entries
    code = div_techspec.find_all('li')

    info = [code[0].text, code[1].text, code[2].text, code[3].text]
    csv_output.writerow(info)

else:
    continue
...
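
Put together, a minimal sketch of the full loop with both checks (same file names and column layout as in the question; the status-code skip is the assumption noted above, and the length guard is an extra precaution against pages with fewer than four spec entries):

import requests
from bs4 import BeautifulSoup
from csv import writer
import openpyxl

wb = openpyxl.load_workbook('Book3.xlsx')
ws = wb.active

with open('mtbs.csv', 'w', encoding='utf8', newline='') as f_output:
    csv_output = writer(f_output)
    csv_output.writerow(['Code', 'Product Description'])

    for row in ws.iter_rows(min_row=1, min_col=1, max_col=1, values_only=True):
        url = f"https://www.radwell.com/en-US/Buy/MITSUBISHI/MITSUBISHI/{row[0]}"
        req_page = requests.get(url)
        if req_page.status_code != 200:  # assumption: unknown products give a non-200 status
            continue

        soup = BeautifulSoup(req_page.content, 'html.parser')
        div_techspec = soup.find('div', class_="minitabSection")
        if div_techspec is None:  # page exists but has no spec section
            continue

        code = div_techspec.find_all('li')
        if len(code) < 4:  # guard against pages with fewer spec entries
            continue

        info = [code[0].text, code[1].text, code[2].text, code[3].text]
        csv_output.writerow(info)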