How can I filter a loop and "save for later" the result?

Time:03-08

I'm trying to scrape the Amazon website for a project I'm working on.

So far I've built this flow:

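from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome(executable_path=r"C:\Users\chromedriver.exe")
driver.maximize_window()
wait = WebDriverWait(driver, 20)
driver.get('https://www.amazon.it/s?k=shoes&__mk_it_IT=ÅMÅŽÕÑ&crid=3B00FY4A5NJBZ&sprefix=shoes'
           ',aps,122&ref=nb_sb_noss')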

products = driver.find_elements(By.CSS_SELECTOR, 'div[class = "sg-col-4-of-12 s-result-item s-asin sg-col-4-of-16 sg-col '
                                              's-widget-spacing-small sg-col-4-of-20"]')

for product in products:
    name = product.find_element(By.CSS_SELECTOR, 'span[class = "a-size-base-plus a-color-base a-text-normal"]').text
    brand = product.find_element(By.CSS_SELECTOR, 'span[class = "a-size-base-plus a-color-base"]').text
    price = product.find_element(By.CSS_SELECTOR, 'span[class = "a-price-whole"]').text

After building this flow, I want to filter the results by price (for example, keep everything below 100 €) and save the output in a list, so that I can concatenate it with the results of another loop.

Thanks Car

CodePudding user response:

Appending a dict for every product to a list is one approach to holding your data for post-processing:

data = []

for product in products:

    data.append({
        'name':product.find_element(By.CSS_SELECTOR, 'span[class = "a-size-base-plus a-color-base a-text-normal"]').text,
        'brand':product.find_element(By.CSS_SELECTOR, 'span[class = "a-size-base-plus a-color-base"]').text,
        'price':product.find_element(By.CSS_SELECTOR, 'span[class = "a-price-whole"]').text
    })
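
If pandas is not needed, the same list of dicts can also be filtered in plain Python and then joined with the results of a second loop. This is only a minimal sketch, assuming the price text uses a decimal comma such as '54,99'. The sample rows, the `other_data` list, and the helper name `below` are made up for illustration and are not part of the original code:

```python
# Sample rows standing in for the scraped `data` list (values are illustrative).
data = [
    {'name': 'Court Royale 2, Scarpe Uomo', 'brand': 'Nike', 'price': '54,99'},
    {'name': 'Hypothetical Boot', 'brand': 'ExampleBrand', 'price': '129,00'},
]
# Hypothetical results of a second loop (for example, another search page).
other_data = [
    {'name': 'Smash, Scarpe da Ginnastica. Uomo', 'brand': 'PUMA', 'price': '28,43'},
]


def below(rows, limit):
    """Return the rows whose decimal-comma price is strictly under `limit`."""
    return [row for row in rows if float(row['price'].replace(',', '.')) < limit]


# Filter each loop's results, then concatenate the two lists.
combined = below(data, 100) + below(other_data, 100)
print([row['brand'] for row in combined])
```

Because both loops produce the same dict keys, the combined list can still be passed to a DataFrame later if needed.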

You can use this list of dicts to simply create a DataFrame for filtering and saving:

import pandas as pd

df = pd.DataFrame(data) #create dataframe
df['price'] = df['price'].str.replace(',','.').astype(float) #convert strings to float
df[df['price'] < 100].to_excel('test.xlsx', index=False) #filter dataframe and save to excel
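
One caveat with the string-to-float step: replacing only the comma would fail on a price that also contains a thousands separator, such as '1.299,00'. The helper below is a sketch under the assumption that '.' only ever appears as a thousands separator and ',' as the decimal mark. The name `parse_price` is hypothetical:

```python
def parse_price(text):
    """Convert an Italian-formatted price string such as '1.299,00' to a float.

    Assumption: '.' is only a thousands separator and ',' is the decimal mark.
    """
    cleaned = text.strip().replace('.', '').replace(',', '.')
    return float(cleaned)


print(parse_price('54,99'))
print(parse_price('1.299,00'))
```

It could be applied to the raw strings with df['price'].map(parse_price) in place of the str.replace step.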

Output

  | name | brand | price
0 | Graceful-Get Connected, Sneaker Donna | Skechers | 40
1 | Court Royale 2, Scarpe Uomo | Nike | 54.99
2 | Og 85 Gold'n Gurl, Scarpe da Ginnastica Donna | Skechers | 54.93
3 | Smash, Scarpe da Ginnastica. Uomo | PUMA | 28.43
4 | Wearallday, Scarpe da corsa Donna, NULL, NULL | Nike | 55.48
5 | Uomo Claudio A, Scarpe Stringate Basse Derby | Geox | 46.16
6 | Court Graffik, Scarpa da Skate Bambini e Ragazzi | DC Shoes | 19.92
7 | Hiking, Kids Shedir Mid Highing Shoes WP-Scarpe da Ginnastica Unisex-Bambini e Ragazzi | CMP | 33.68
8 | Court Graffik, Scarpe da Ginnastica Basse Uomo | DC Shoes | 50.98
...