I'm trying to scrape the Amazon website for a project I'm working on.
So far I've built this flow:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome(executable_path=r"C:\Users\chromedriver.exe")
driver.maximize_window()
wait = WebDriverWait(driver, 20)
driver.get('https://www.amazon.it/s?k=shoes&__mk_it_IT=ÅMÅŽÕÑ&crid=3B00FY4A5NJBZ&sprefix=shoes'
           ',aps,122&ref=nb_sb_noss')
# Each search result is a div carrying this exact class string
products = driver.find_elements(By.CSS_SELECTOR, 'div[class = "sg-col-4-of-12 s-result-item s-asin sg-col-4-of-16 sg-col '
                                                 's-widget-spacing-small sg-col-4-of-20"]')
for product in products:
    name = product.find_element(By.CSS_SELECTOR, 'span[class = "a-size-base-plus a-color-base a-text-normal"]').text
    brand = product.find_element(By.CSS_SELECTOR, 'span[class = "a-size-base-plus a-color-base"]').text
    price = product.find_element(By.CSS_SELECTOR, 'span[class = "a-price-whole"]').text
After generating this flow, I want to filter the results by price (for example, keep everything below 100€) and "save" the output in a list/group so I can concatenate it with the results of another loop.
Thanks Car
CodePudding user response:
Appending a dict for every product to a list would be one approach to hold your data for post-processing:
data = []
for product in products:
    data.append({
        'name': product.find_element(By.CSS_SELECTOR, 'span[class = "a-size-base-plus a-color-base a-text-normal"]').text,
        'brand': product.find_element(By.CSS_SELECTOR, 'span[class = "a-size-base-plus a-color-base"]').text,
        'price': product.find_element(By.CSS_SELECTOR, 'span[class = "a-price-whole"]').text
    })
You can use this list of dicts to create a DataFrame for filtering and saving:
import pandas as pd
df = pd.DataFrame(data) #create dataframe
df['price'] = df['price'].str.replace(',','.').astype(float) #convert strings to float
df[df['price'] < 100].to_excel('test.xlsx', index=False) #filter dataframe and save to excel
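The `str.replace(',', '.')` conversion above works for prices like `54,99`, but a value with a thousands separator such as `1.234,56` would become `1.234.56` and fail to convert. Below is a minimal sketch of a more defensive parser, assuming the scraped prices always use `.` for thousands and `,` for decimals. `parse_it_price` is a hypothetical helper name, not part of pandas or Selenium:

```python
def parse_it_price(text):
    """Convert an Italian-formatted price string such as '1.234,56' to a float.

    Assumes '.' is the thousands separator and ',' the decimal separator.
    Returns None when the text contains no digits.
    """
    # Strip the currency sign, non-breaking spaces and surrounding whitespace.
    cleaned = text.replace('€', '').replace('\xa0', '').strip()
    # Drop a dangling decimal comma, e.g. when only the whole part was scraped.
    cleaned = cleaned.rstrip(',')
    if not any(ch.isdigit() for ch in cleaned):
        return None
    # Remove thousands separators first, then turn the decimal comma into a dot.
    return float(cleaned.replace('.', '').replace(',', '.'))
```

It could be applied with `df['price'].map(parse_it_price)` in place of the `str.replace(...).astype(float)` step. Note that a string already using a decimal dot (for example `54.99`) would be misread under this assumption.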
Output
| | name | brand | price |
---|---|---|---|
0 | Graceful-Get Connected, Sneaker Donna | Skechers | 40 |
1 | Court Royale 2, Scarpe Uomo | Nike | 54.99 |
2 | Og 85 Gold'n Gurl, Scarpe da Ginnastica Donna | Skechers | 54.93 |
3 | Smash, Scarpe da Ginnastica. Uomo | PUMA | 28.43 |
4 | Wearallday, Scarpe da corsa Donna, NULL, NULL | Nike | 55.48 |
5 | Uomo Claudio A, Scarpe Stringate Basse Derby | Geox | 46.16 |
6 | Court Graffik, Scarpa da Skate Bambini e Ragazzi | DC Shoes | 19.92 |
7 | Hiking, Kids Shedir Mid Highing Shoes WP-Scarpe da Ginnastica Unisex-Bambini e Ragazzi | CMP | 33.68 |
8 | Court Graffik, Scarpe da Ginnastica Basse Uomo | DC Shoes | 50.98 |
... |
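The question also asks about concatenating the filtered output with the results of another loop. One possible way, without pandas, is to filter each list of dicts and join the lists. This is only a sketch: it assumes the prices were already converted to floats, `filter_below` is a hypothetical helper, and the two sample lists are made-up placeholders standing in for the results of two scraping loops:

```python
def filter_below(rows, max_price):
    """Keep only the rows whose 'price' is strictly below max_price."""
    return [row for row in rows if row['price'] < max_price]


# Made-up placeholder data standing in for the results of two scraping loops.
first_loop = [
    {'name': 'Example Sneaker', 'brand': 'BrandA', 'price': 54.99},
    {'name': 'Example Boot', 'brand': 'BrandB', 'price': 129.0},
]
second_loop = [
    {'name': 'Example Sandal', 'brand': 'BrandC', 'price': 28.43},
]

# Filter each result set, then concatenate the two lists into one.
combined = filter_below(first_loop, 100) + filter_below(second_loop, 100)
```

`combined` is again a list of dicts, so it can be passed to `pd.DataFrame(combined)` as before. If you already have one DataFrame per loop, `pd.concat([df1, df2], ignore_index=True)` gives the same kind of result.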