from unittest import skip
import requests
from bs4 import BeautifulSoup
from csv import writer
import openpyxl

wb = openpyxl.load_workbook('Book3.xlsx')
ws = wb.active

with open('mtbs.csv', 'w', encoding='utf8', newline='') as f_output:
    csv_output = writer(f_output)
    header = ['Code', 'Product Description']
    csv_output.writerow(header)

    for row in ws.iter_rows(min_row=1, min_col=1, max_col=1, values_only=True):
        url = f"https://www.radwell.com/en-US/Buy/MITSUBISHI/MITSUBISHI/{row[0]}"
        print(url)
        req_page = requests.get(url)
        soup = BeautifulSoup(req_page.content, 'html.parser')
        div_techspec = soup.find('div', class_="minitabSection")
        if 'minitabSection' not in url:
            continue  # does not work
        code = div_techspec.find_all('li')
        description1 = div_techspec.find_all('li')
        description2 = div_techspec.find_all('li')
        description3 = div_techspec.find_all('li')
        info = [code[0].text, description1[1].text, description2[2].text, description3[3].text]
        csv_output.writerow(info)
I am currently trying to collect data from a certain website. I have an Excel sheet containing hundreds of product codes. However, some products do not exist on the website I am scraping, and the loop stops running when it hits one of them.
I am currently having issues with this part: if 'minitabSection' not in url: continue
URLs that do not exist should be skipped so the loop keeps running for the rest of the codes. How do I achieve this?
CodePudding user response:
There is no string minitabSection in your url, so that condition is True on every iteration - since you try to find() the div_techspec in the parsed HTML, you should check its result instead:
...
div_techspec = soup.find('div', class_="minitabSection")
if div_techspec is None:
    continue
...
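A product code that does not exist may also come back as an HTTP error rather than a normal page. If so, checking the response status before parsing is another early guard - a minimal sketch, assuming the site answers unknown codes with a non-200 status:

...
req_page = requests.get(url)
if not req_page.ok:  # ok is False for 4xx/5xx responses, so missing pages are skipped
    continue
soup = BeautifulSoup(req_page.content, 'html.parser')
...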
Or, going back to the find() result, you can test it the other way around:
...
div_techspec = soup.find('div', class_="minitabSection")
if div_techspec:
    code = div_techspec.find_all('li')
    info = [code[0].text, code[1].text, code[2].text, code[3].text]
    csv_output.writerow(info)
else:
    continue
...
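Putting both checks together, here is a minimal sketch of the corrected loop. It assumes the same Book3.xlsx layout and Radwell URL pattern as the question, and additionally guards against pages whose tech-spec section has fewer than four li entries:

import requests
from bs4 import BeautifulSoup
from csv import writer
import openpyxl

wb = openpyxl.load_workbook('Book3.xlsx')
ws = wb.active

with open('mtbs.csv', 'w', encoding='utf8', newline='') as f_output:
    csv_output = writer(f_output)
    csv_output.writerow(['Code', 'Product Description'])

    for row in ws.iter_rows(min_row=1, min_col=1, max_col=1, values_only=True):
        url = f"https://www.radwell.com/en-US/Buy/MITSUBISHI/MITSUBISHI/{row[0]}"
        req_page = requests.get(url)
        if not req_page.ok:           # page missing entirely (e.g. 404)
            continue
        soup = BeautifulSoup(req_page.content, 'html.parser')
        div_techspec = soup.find('div', class_="minitabSection")
        if div_techspec is None:      # page exists but has no tech-spec section
            continue
        items = div_techspec.find_all('li')
        if len(items) < 4:            # not enough spec lines to build a row
            continue
        csv_output.writerow([items[0].text, items[1].text, items[2].text, items[3].text])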