How to merge two lines of results in .csv file in Python (BeautifulSoup)

Time:03-11

I am trying to scrape data from a website, but I either hit an "Index is out of range" error or end up with each result split across two separate lines in the .csv file. The error occurs because some records on this site have empty values, and I don't know how to write the correct condition in the loop. I followed some guides, but they got me nowhere.

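import ssl
import certifi
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = uReq('website', context=ssl.create_default_context(cafile=certifi.where()))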

uClient = my_url
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")

containers = page_soup.select('div.header__title, div.info__cta')
container = containers[0]
filename = "products.csv"
f = open(filename,"w")

headers="Product_Name, PriceWithVAT, PriceWithoutVAT, Stock\n"
f.write(headers)

for container in containers:
    
    productName = container.findAll("span", {"class":"sku"})
    name = productName[0].text if container.findAll("span", {"class":"sku"}) else "lack name"
    
    priceWithVAT = container.findAll("span", {"class":"price-intax"})
    price = priceWithVAT[0].text if container.findAll("span", {"class":"price-intax"}) else "lack price"
    
    priceWithoutVAT = container.findAll("span", {"class":"price-extax"})
    priceNot = priceWithoutVAT[0].text if container.findAll("span", {"class":"price-extax"}) else "lack price2"
    
    stock = container.findAll("p", {"class":"stock in-stock"})
    stock = stock[0].text if container.findAll("p", {"class":"stock in-stock"}) else "lack on stock"
    
    f.write(name + "," + price + "," + priceNot + "," + stock + "\n" + "\n")
    
f.close()

The resulting .csv file contains the results from the entire page, but every product is split across two lines, like this:

CORRECT,lack price,lack price2,lack on stock

lack name,CORRECT,CORRECT,CORRECT

My expected output:

CORRECT, CORRECT, CORRECT, CORRECT

(CORRECT means that the correct data is scraped from the website)

When I delete if container.findAll("span", {"class":"sku"}) else "lack name" and the similar conditions from the loop, I get the Index out of range error, as expected, because some values are empty.

Could you help me change the code?

CodePudding user response:

You need to alter your logic slightly. Your selector returns the product name and the product info as separate containers, so each product ends up split across two rows. Instead, grab the whole container that holds all of a product's info. You'll notice that each product is in a <li> tag, under a <ul> tag.

So let's first grab the <ul> tag that has a class that starts with 'products', then get all of its <li> tags. Then we'll iterate through each of those and pull out the data needed.

As you stated, some of the tags aren't present, so we'll wrap each lookup in a try/except. It tries to get the data and, if that fails, falls back to the default value in the except block.

Also, pandas is a really good and useful library to use and learn, so I went with that instead of writing to the .csv file by hand as you had it.
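To illustrate why the choice of container matters, here is a minimal sketch on a made-up HTML snippet. The markup below is an assumption, loosely modelled on the class names from the question, and is not copied from the real site. It compares the number of containers returned by the question's selector with the number of <li> elements.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: two products, each with a title div and an info div.
SAMPLE_HTML = """
<ul class="products columns-4">
  <li>
    <div class="header__title"><span class="sku">AAA 1</span></div>
    <div class="info__cta"><span class="price-intax">10.00</span></div>
  </li>
  <li>
    <div class="header__title"><span class="sku">BBB 2</span></div>
    <div class="info__cta"></div>
  </li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The question's selector matches the title div and the info div separately,
# so every product produces two containers (and two CSV rows).
split_containers = soup.select("div.header__title, div.info__cta")

# Selecting the <li> elements gives exactly one container per product.
product_list = soup.find("ul", {"class": re.compile("^products")})
products = product_list.find_all("li")

print(len(split_containers), "containers vs", len(products), "products")
```

With one container per product, every lookup for name, price and stock runs against the same element, so the values land in the same row.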
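As an alternative to repeating try/except for every field, the missing-tag case can be handled with an explicit None check, since find() returns None when nothing matches. The helper name text_or_default below is hypothetical, and the one-line HTML is made up for illustration.

```python
from bs4 import BeautifulSoup


def text_or_default(parent, tag_name, css_class, default):
    """Return the text of the first matching tag, or `default` if it is absent."""
    found = parent.find(tag_name, {"class": css_class})
    return found.get_text(strip=True) if found is not None else default


# Hypothetical product element that has a name but no price.
item = BeautifulSoup('<li><span class="sku">AAA 1</span></li>', "html.parser")

name = text_or_default(item, "span", "sku", "lack name")
price = text_or_default(item, "span", "price-intax", "lack price")
print(name, "|", price)
```

Either style works; the helper just keeps the loop body shorter and avoids catching unrelated errors by accident.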
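If you would rather not add pandas as a dependency, a rough equivalent is possible with the standard csv module. This is only a sketch with made-up rows (the prices are invented), written to an in-memory buffer instead of a file. One practical benefit over joining strings with "," by hand is that fields containing commas, such as a product name like U 16*24*5,5 NI300, get quoted automatically.

```python
import csv
import io

# Made-up rows in the same shape as the dictionaries built in the loop.
rows = [
    {"productName": "U 16*24*5,5 NI300", "priceWithVAT": "8.56",
     "priceWithoutVAT": "6.96", "stock": 13},
    {"productName": "BBB 2", "priceWithVAT": "lack price",
     "priceWithoutVAT": "lack price2", "stock": "lack on stock"},
]

buffer = io.StringIO()  # swap for open("products.csv", "w", newline="") to write a file
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```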

Code:

import requests
from bs4 import BeautifulSoup
import pandas as pd
import re

url = 'https://specjal.com/sklep/'
response = requests.get(url)

soup = BeautifulSoup(response.text, "html.parser")

products = soup.find('ul', {'class':re.compile('^products')}).find_all('li')


rows = []
for product in products:
    # find() returns None when a tag is missing, so .text raises AttributeError
    try:
        productName = product.find('span',{'class':'sku'}).text
    except AttributeError:
        productName = 'lack name'
    
    try:
        priceWithVAT = product.find('span',{'class':'price-intax'}).text
    except AttributeError:
        priceWithVAT = 'lack price'
    
    try:
        priceWithoutVAT = product.find('span',{'class':'price-extax'}).text
    except AttributeError:
        priceWithoutVAT = 'lack price2'
    
    # the first word of the stock text is expected to be the quantity
    try:
        stock = int(product.find('p',{'class':'stock in-stock'}).text.split()[0])
    except (AttributeError, ValueError, IndexError):
        stock = 'lack on stock'
        # consider changing the above line to stock = 0
    
    row = {
        'productName':productName, 
        'priceWithVAT':priceWithVAT, 
        'priceWithoutVAT':priceWithoutVAT, 
        'stock':stock}
    
    rows.append(row)
    
    
df = pd.DataFrame(rows)
df.to_csv('products.csv', index=False)

Output:

print(df)
             productName   priceWithVAT    priceWithoutVAT          stock
0        ZZ 90*105*4 VAY   14.86zł/szt.   12.08 zł bez VAT             10
1        ZZ 85*100*5 VAY   13.76zł/szt.   11.19 zł bez VAT             10
2         ZZ 80*95*4 VAY   12.66zł/szt.   10.29 zł bez VAT             20
3         ZZ 75*90*4 VAY   11.01zł/szt.    8.95 zł bez VAT             20
4         ZZ 70*85*4 VAY    9.91zł/szt.    8.06 zł bez VAT             20
5         ZZ 65*80*5 VAY    9.36zł/szt.    7.61 zł bez VAT             20
6         ZZ 65*80*4 VAY    9.36zł/szt.    7.61 zł bez VAT             20
7         ZZ 60*75*5 VAY    8.25zł/szt.    6.71 zł bez VAT             14
8         ZZ 55*65*4 VAY    7.71zł/szt.    6.27 zł bez VAT             10
9         ZZ 50*60*4 VAY    6.61zł/szt.    5.37 zł bez VAT             20
10        ZZ 45*55*4 VAY    6.05zł/szt.    4.92 zł bez VAT             20
11        ZZ 40*50*4 VAY    5.39zł/szt.    4.38 zł bez VAT             17
12        ZZ 35*45*4 VAY     4.8zł/szt.     3.9 zł bez VAT             30
13        ZZ 30*40*4 VAY    4.26zł/szt.    3.46 zł bez VAT             20
14            XPA 710 CT   39.61zł/szt.    32.2 zł bez VAT  lack on stock
15           UCP 202 KBF    19.7zł/szt.   16.02 zł bez VAT  lack on stock
16        U298/U291 SET9  188.04zł/szt.  152.88 zł bez VAT  lack on stock
17             U 64*80*8    11.8zł/szt.    9.59 zł bez VAT              2
18              U 6*10*3    2.51zł/szt.    2.04 zł bez VAT              4
19        U 45*53*10 RSB    7.55zł/szt.    6.14 zł bez VAT  lack on stock
20     U 30*40*7 K21 NBR       8zł/szt.     6.5 zł bez VAT              5
21      U 180*200*14 K50   37.74zł/szt.   30.68 zł bez VAT  lack on stock
22     U 16*24*5,5 NI300    8.56zł/szt.    6.96 zł bez VAT             13
23      U 140*160*14 K50   21.92zł/szt.   17.82 zł bez VAT  lack on stock
24      U 140*160*14 K23   23.71zł/szt.   19.28 zł bez VAT              3
25          TR16*4*540MM   38.27zł/szt.   31.11 zł bez VAT  lack on stock
26          TP 600 8M/20   156.7zł/szt.   127.4 zł bez VAT  lack on stock
27             TP 15*1,5   27.56zł/szt.   22.41 zł bez VAT  lack on stock
28           ST 3568 LFT   94.34zł/szt.    76.7 zł bez VAT  lack on stock
29           SC07A87CS32   47.32zł/szt.   38.47 zł bez VAT  lack on stock
30        SC04B19CS31PX2    46.3zł/szt.   37.64 zł bez VAT              3
31                 R28-9   96.05zł/szt.   78.09 zł bez VAT              2
32           R 2-6 ZZ SS   13.47zł/szt.   10.95 zł bez VAT  lack on stock
33         QJ 213 MPA C3  412.06zł/szt.  335.01 zł bez VAT  lack on stock
34               PJ 1219    5.97zł/szt.    4.85 zł bez VAT  lack on stock
35       OW1 115*94*8,1    15.72zł/szt.   12.78 zł bez VAT              2
36       OGNIWO 08B-3 CL    7.23zł/szt.    5.88 zł bez VAT              7
37      NU 2311 ETVP2 C3  408.34zł/szt.  331.98 zł bez VAT  lack on stock
38         NJ 2210 ET C4  195.19zł/szt.  158.69 zł bez VAT              4
39           NJ 209 ETVP  101.89zł/szt.   82.84 zł bez VAT              2
40           NA 4901 CZH   11.64zł/szt.    9.46 zł bez VAT  lack on stock
41          MR 16277 2RS      32zł/szt.   26.02 zł bez VAT              4
42        ŁAŃCUCH 08 B-3   76.38zł/szt.    62.1 zł bez VAT             20
43            KP 16 L100   33.86zł/szt.   27.53 zł bez VAT  lack on stock
44          K 81130 SRBF  132.45zł/szt.  107.68 zł bez VAT              2
45      JL 68145/111 NAF   17.59zł/szt.    14.3 zł bez VAT  lack on stock
46  HTF O 45-7 A G5 N C3     lack price        lack price2  lack on stock
47          HRC 35*45*45   37.08zł/szt.   30.15 zł bez VAT              6
48             HK 3520 B   22.39zł/szt.    18.2 zł bez VAT  lack on stock
49           HGY 15*21*1    0.74zł/szt.     0.6 zł bez VAT              8