I am trying to scrape data from a website, but I'm having trouble with two things: handling an "index out of range" error (IndexError), and results that end up split across two separate lines in the .csv file. The error happens because some records on this site have empty values, and I don't know how to put the correct condition in the loop. I followed some guides, but they didn't get me anywhere.
import ssl
from urllib.request import urlopen as uReq

import certifi
from bs4 import BeautifulSoup as soup

my_url = uReq('website', context=ssl.create_default_context(cafile=certifi.where()))
uClient = my_url
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
containers = page_soup.select('div.header__title, div.info__cta')
container = containers[0]
filename = "products.csv"
f = open(filename, "w")
headers = "Product_Name, PriceWithVAT, PriceWithoutVAT, Stock\n"
f.write(headers)
for container in containers:
    productName = container.findAll("span", {"class": "sku"})
    name = productName[0].text if container.findAll("span", {"class": "sku"}) else "lack name"
    priceWithVAT = container.findAll("span", {"class": "price-intax"})
    price = priceWithVAT[0].text if container.findAll("span", {"class": "price-intax"}) else "lack price"
    priceWithoutVAT = container.findAll("span", {"class": "price-extax"})
    priceNot = priceWithoutVAT[0].text if container.findAll("span", {"class": "price-extax"}) else "lack price2"
    stock = container.findAll("p", {"class": "stock in-stock"})
    stock = stock[0].text if container.findAll("p", {"class": "stock in-stock"}) else "lack on stock"
    f.write(name + "," + price + "," + priceNot + "," + stock + "\n" + "\n")
f.close()
In the .csv file I get the results for the whole page, but every product is split across two lines, like this:
CORRECT,lack price,lack price2,lack on stock
lack name,CORRECT,CORRECT,CORRECT
My expected output:
CORRECT, CORRECT, CORRECT, CORRECT
(CORRECT means that the correct data was scraped from the website.)
When I remove
if container.findAll("span", {"class":"sku"}) else "lack name"
and the similar conditions from the loop, I get the "index out of range" error, as expected, because some values are empty.
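For context, a minimal sketch of where the error comes from: findAll() (also spelled find_all()) returns an empty list when nothing matches, so indexing it with [0] raises IndexError. Calling it once and checking the list avoids both the error and the duplicate search. The HTML string and the first_text helper are hypothetical, made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first block has a name, the second does not
html = """
<div class="item"><span class="sku">ZZ 90*105*4 VAY</span></div>
<div class="item"></div>
"""
page = BeautifulSoup(html, "html.parser")


def first_text(tag, name, css_class, default):
    """Hypothetical helper: text of the first matching child, or default."""
    matches = tag.find_all(name, {"class": css_class})  # [] when nothing matches
    return matches[0].text if matches else default


names = [first_text(div, "span", "sku", "lack name") for div in page.find_all("div")]
print(names)  # ['ZZ 90*105*4 VAY', 'lack name']

# Indexing the empty result directly reproduces the error
try:
    page.find_all("div")[1].find_all("span", {"class": "sku"})[0]
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range
```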
Could you help me change the code?
Answer:
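You need to alter your logic slightly. Your selector 'div.header__title, div.info__cta' returns the title block and the info block of each product as two separate containers, which is why every product ends up on two lines. Instead of getting the product name and the product info as separate containers, grab the one container that holds all of the info. You'll notice that each product is in an <li> tag under a <ul> tag.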
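To see why the original selector splits each product in two: a comma-separated CSS selector returns every element matching either part as its own result, in document order, so the title <div> and the info <div> of one product come back as two containers. The markup below is a made-up, heavily simplified stand-in for the shop's HTML (an assumption; only the class names come from the question).

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified listing with a single product
html = """
<ul class="products">
  <li>
    <div class="header__title"><span class="sku">ZZ 90*105*4 VAY</span></div>
    <div class="info__cta"><span class="price-intax">14.86zł/szt.</span></div>
  </li>
</ul>
"""
page = BeautifulSoup(html, "html.parser")

# The comma selector yields one container per matching <div>: two for one product
containers = page.select("div.header__title, div.info__cta")
print(len(containers))  # 2

# Each <li> holds both <div>s, so name and price are found in the same container
items = page.select("ul.products > li")
print(len(items))  # 1
print(items[0].find("span", {"class": "sku"}).text)          # ZZ 90*105*4 VAY
print(items[0].find("span", {"class": "price-intax"}).text)  # 14.86zł/szt.
```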
So let's first grab the <ul> tag whose class starts with 'products', then get all the <li> tags inside it. Then we'll iterate through each of those and pull out the data needed.
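A short sketch of the find() call used in the code below: when a regular expression is given for class, BeautifulSoup checks it against the tag's class values, so re.compile('^products') also matches a <ul> whose class attribute is, say, "products columns-4". The markup, including the extra class name and the menu list, is an assumption for illustration.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: a menu list plus a products list with two classes
html = """
<ul class="menu"><li>Home</li></ul>
<ul class="products columns-4">
  <li><span class="sku">ZZ 90*105*4 VAY</span></li>
  <li><span class="sku">ZZ 85*100*5 VAY</span></li>
</ul>
"""
page = BeautifulSoup(html, "html.parser")

# Only the list whose class starts with "products" matches,
# so the menu <ul> and its <li> are skipped
product_list = page.find("ul", {"class": re.compile("^products")})
products = product_list.find_all("li")
print(len(products))  # 2
print([p.find("span", {"class": "sku"}).text for p in products])
# ['ZZ 90*105*4 VAY', 'ZZ 85*100*5 VAY']
```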
As you stated, some of the tags aren't present, so we'll use try/except. When a tag is missing, find() returns None and calling .text on it raises an AttributeError; the except block catches that and sets a default value instead.
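The try/except pattern can also be wrapped in a small helper so each field takes one line. This is a sketch, not part of the original answer: text_or_default is a hypothetical name, and it catches only AttributeError (what .text on a missing, None tag raises), so unrelated bugs are not silently hidden the way a bare except would hide them.

```python
from bs4 import BeautifulSoup


def text_or_default(tag, name, css_class, default):
    """Hypothetical helper: text of the first matching child, or default."""
    try:
        return tag.find(name, {"class": css_class}).text
    except AttributeError:  # find() returned None: the tag is missing
        return default


# Hypothetical product with a name but no prices
product = BeautifulSoup(
    '<li><span class="sku">HTF O 45-7 A G5 N C3</span></li>', "html.parser"
).li

print(text_or_default(product, "span", "sku", "lack name"))           # HTF O 45-7 A G5 N C3
print(text_or_default(product, "span", "price-intax", "lack price"))  # lack price
```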
Also, pandas is a really good and useful library to learn, so I used it to build the table and write the CSV file instead of writing the file by hand as you had it.
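If you would rather not add pandas, the standard-library csv module is a safer replacement for joining strings with ",": it quotes any field that contains a comma. That matters here, because at least one scraped product name, U 16*24*5,5 NI300, contains a comma and would otherwise spill into an extra column. A minimal sketch using two rows from the scraped output, written to an in-memory buffer instead of products.csv for illustration:

```python
import csv
import io

# Two rows taken from the scraped output; the second name contains a comma
rows = [
    {"productName": "ZZ 90*105*4 VAY", "priceWithVAT": "14.86zł/szt.",
     "priceWithoutVAT": "12.08 zł bez VAT", "stock": 10},
    {"productName": "U 16*24*5,5 NI300", "priceWithVAT": "8.56zł/szt.",
     "priceWithoutVAT": "6.96 zł bez VAT", "stock": 13},
]

# Stands in for open("products.csv", "w", newline="", encoding="utf-8")
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["productName", "priceWithVAT", "priceWithoutVAT", "stock"])
writer.writeheader()
writer.writerows(rows)  # one line per product; fields containing commas get quoted
print(buffer.getvalue())
```

With a real file, open it with newline="" as the comment shows, so the csv module controls the line endings.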
Code:
import re

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://specjal.com/sklep/'
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
# Every product is an <li> inside the <ul> whose class starts with "products"
products = soup.find('ul', {'class': re.compile('^products')}).find_all('li')
rows = []
for product in products:
    # find() returns None for a missing tag, so .text raises AttributeError
    try:
        productName = product.find('span', {'class': 'sku'}).text
    except AttributeError:
        productName = 'lack name'
    try:
        priceWithVAT = product.find('span', {'class': 'price-intax'}).text
    except AttributeError:
        priceWithVAT = 'lack price'
    try:
        priceWithoutVAT = product.find('span', {'class': 'price-extax'}).text
    except AttributeError:
        priceWithoutVAT = 'lack price2'
    try:
        # keep only the leading number of the stock text
        stock = int(product.find('p', {'class': 'stock in-stock'}).text.split()[0])
    except (AttributeError, IndexError, ValueError):
        stock = 'lack on stock'
        # consider changing the above line to stock = 0
    row = {
        'productName': productName,
        'priceWithVAT': priceWithVAT,
        'priceWithoutVAT': priceWithoutVAT,
        'stock': stock}
    rows.append(row)
df = pd.DataFrame(rows)
df.to_csv('products.csv', index=False)
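The stock line does three things at once: it finds the <p>, takes the first whitespace-separated word of its text, and converts it to int. Each step fails differently (AttributeError for a missing tag, IndexError for empty text, ValueError for a non-numeric word). The sketch below pulls that into a hypothetical parse_stock function that works on the text alone; the sample labels are made up, since the shop's exact stock wording isn't shown here, and default=0 follows the suggestion in the code comment.

```python
def parse_stock(text, default=0):
    """Hypothetical helper: leading integer of a stock label, else default."""
    if text is None:  # the stock tag was missing
        return default
    words = text.split()
    if not words:  # empty label
        return default
    try:
        return int(words[0])
    except ValueError:  # first word is not a number
        return default


# Made-up labels; the real site's wording may differ
print(parse_stock("10 in stock"))           # 10
print(parse_stock(""))                      # 0
print(parse_stock("Brak"))                  # 0
print(parse_stock(None, "lack on stock"))   # lack on stock
```

In the loop you could call it as parse_stock(tag.text if tag else None), where tag = product.find('p', {'class': 'stock in-stock'}).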
Output of print(df):
    productName           priceWithVAT   priceWithoutVAT    stock
 0  ZZ 90*105*4 VAY       14.86zł/szt.   12.08 zł bez VAT   10
 1  ZZ 85*100*5 VAY       13.76zł/szt.   11.19 zł bez VAT   10
 2  ZZ 80*95*4 VAY        12.66zł/szt.   10.29 zł bez VAT   20
 3  ZZ 75*90*4 VAY        11.01zł/szt.   8.95 zł bez VAT    20
 4  ZZ 70*85*4 VAY        9.91zł/szt.    8.06 zł bez VAT    20
 5  ZZ 65*80*5 VAY        9.36zł/szt.    7.61 zł bez VAT    20
 6  ZZ 65*80*4 VAY        9.36zł/szt.    7.61 zł bez VAT    20
 7  ZZ 60*75*5 VAY        8.25zł/szt.    6.71 zł bez VAT    14
 8  ZZ 55*65*4 VAY        7.71zł/szt.    6.27 zł bez VAT    10
 9  ZZ 50*60*4 VAY        6.61zł/szt.    5.37 zł bez VAT    20
10  ZZ 45*55*4 VAY        6.05zł/szt.    4.92 zł bez VAT    20
11  ZZ 40*50*4 VAY        5.39zł/szt.    4.38 zł bez VAT    17
12  ZZ 35*45*4 VAY        4.8zł/szt.     3.9 zł bez VAT     30
13  ZZ 30*40*4 VAY        4.26zł/szt.    3.46 zł bez VAT    20
14  XPA 710 CT            39.61zł/szt.   32.2 zł bez VAT    lack on stock
15  UCP 202 KBF           19.7zł/szt.    16.02 zł bez VAT   lack on stock
16  U298/U291 SET9        188.04zł/szt.  152.88 zł bez VAT  lack on stock
17  U 64*80*8             11.8zł/szt.    9.59 zł bez VAT    2
18  U 6*10*3              2.51zł/szt.    2.04 zł bez VAT    4
19  U 45*53*10 RSB        7.55zł/szt.    6.14 zł bez VAT    lack on stock
20  U 30*40*7 K21 NBR     8zł/szt.       6.5 zł bez VAT     5
21  U 180*200*14 K50      37.74zł/szt.   30.68 zł bez VAT   lack on stock
22  U 16*24*5,5 NI300     8.56zł/szt.    6.96 zł bez VAT    13
23  U 140*160*14 K50      21.92zł/szt.   17.82 zł bez VAT   lack on stock
24  U 140*160*14 K23      23.71zł/szt.   19.28 zł bez VAT   3
25  TR16*4*540MM          38.27zł/szt.   31.11 zł bez VAT   lack on stock
26  TP 600 8M/20          156.7zł/szt.   127.4 zł bez VAT   lack on stock
27  TP 15*1,5             27.56zł/szt.   22.41 zł bez VAT   lack on stock
28  ST 3568 LFT           94.34zł/szt.   76.7 zł bez VAT    lack on stock
29  SC07A87CS32           47.32zł/szt.   38.47 zł bez VAT   lack on stock
30  SC04B19CS31PX2        46.3zł/szt.    37.64 zł bez VAT   3
31  R28-9                 96.05zł/szt.   78.09 zł bez VAT   2
32  R 2-6 ZZ SS           13.47zł/szt.   10.95 zł bez VAT   lack on stock
33  QJ 213 MPA C3         412.06zł/szt.  335.01 zł bez VAT  lack on stock
34  PJ 1219               5.97zł/szt.    4.85 zł bez VAT    lack on stock
35  OW1 115*94*8,1        15.72zł/szt.   12.78 zł bez VAT   2
36  OGNIWO 08B-3 CL       7.23zł/szt.    5.88 zł bez VAT    7
37  NU 2311 ETVP2 C3      408.34zł/szt.  331.98 zł bez VAT  lack on stock
38  NJ 2210 ET C4         195.19zł/szt.  158.69 zł bez VAT  4
39  NJ 209 ETVP           101.89zł/szt.  82.84 zł bez VAT   2
40  NA 4901 CZH           11.64zł/szt.   9.46 zł bez VAT    lack on stock
41  MR 16277 2RS          32zł/szt.      26.02 zł bez VAT   4
42  ŁAŃCUCH 08 B-3        76.38zł/szt.   62.1 zł bez VAT    20
43  KP 16 L100            33.86zł/szt.   27.53 zł bez VAT   lack on stock
44  K 81130 SRBF          132.45zł/szt.  107.68 zł bez VAT  2
45  JL 68145/111 NAF      17.59zł/szt.   14.3 zł bez VAT    lack on stock
46  HTF O 45-7 A G5 N C3  lack price     lack price2        lack on stock
47  HRC 35*45*45          37.08zł/szt.   30.15 zł bez VAT   6
48  HK 3520 B             22.39zł/szt.   18.2 zł bez VAT    lack on stock
49  HGY 15*21*1           0.74zł/szt.    0.6 zł bez VAT     8