Retrieving data from web

Time:05-24

I have this code:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd
driver_BV= webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver_BV.get("https://www.qvc.com/handbags-and-luggage/handbags/clutches/_/N-1cknw/c.html?qq=mh")
product_BV=[]
price_BV=[]
elementHTML=driver_BV.find_element("class name", 'productInfoWrapper')
Final=[]
children_element=elementHTML.find_elements("class name", 'plContent')

print('''
      a. Retrieve data
      b. Create the graph
      c. Display the matrix
      d. Save to Excel file
      e. Exit
      ''')

while True:
    select_option_BV = input("Select option:")
    if select_option_BV == 'a':
        for child_element in children_element:
            title=child_element.find_element("class name", 'productDesc').get_attribute('innerText')
            product_BV.append(title)
            titlu=child_element.find_element("class name", 'priceSell')
            price=titlu.get_attribute('innerText')
            price_BV.append(price)
            print ('Products:', product_BV)
            print ('Prices:', price_BV)
            price=price.replace("$","")
            Final.append(float(price))
            Product_title_series=pd.Series(product_BV)
            Product_price_series=pd.Series(Final)
            product_rows={"Product name":Product_title_series, "Price":Product_price_series}
            Product_Matrix_Framework=pd.DataFrame(product_rows)
    elif select_option_BV == 'b':
        Product_Matrix_Framework.plot(x="Product name",y="Price")
    elif select_option_BV == 'c':
        print(Product_Matrix_Framework.sort_values("Price"))
    elif select_option_BV == 'd':
        Product_Matrix_Framework.to_excel("Products.xlsx")
    elif select_option_BV == 'e':
        print("CY@ exiting...")
        break

I don't know what mistake I made, but I can't get it to work. I need it for a university project and I'm stuck. When I type "a" in the console, nothing seems to happen, and if I type any other letter I get: "name 'Product_Matrix_Framework' is not defined". Please help! Thank you.

CodePudding user response:

I think elementHTML isn't necessary. You can get children_element directly by searching with a CSS selector.

...
driver_BV.get("https://www.qvc.com/handbags-and-luggage/handbags/clutches/_/N-1cknw/c.html?qq=mh")
product_BV=[]
price_BV=[]
Final=[]
# The find_elements_by_* helpers were removed in Selenium 4; pass the locator strategy instead.
children_element=driver_BV.find_elements("css selector", ".plContent .galleryItem")

This finds 60 products. After that, errors still occur, so I think you need to fix the code inside the for loop.
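
As a rough idea of what fixing the loop could look like, here is a minimal sketch that only handles the data side: it converts the scraped price text to numbers and collects the columns, so the table can be built once after the loop instead of on every iteration. The names parse_price and build_rows and the sample titles and prices are made up for illustration; they are not from the original code or from the site.

```python
import re

def parse_price(text):
    """Return the first number found in a price string, or None if there is none."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    # Drop thousands separators before converting, e.g. "1,200.00" -> 1200.0
    return float(match.group().replace(",", ""))

def build_rows(scraped):
    """Turn (title, price_text) pairs into column lists for a table."""
    names, prices = [], []
    for title, price_text in scraped:
        price = parse_price(price_text)
        if price is None:
            continue  # skip items whose price text contains no number
        names.append(title.strip())
        prices.append(price)
    return {"Product name": names, "Price": prices}

# Hypothetical sample data standing in for the scraped innerText values.
rows = build_rows([
    ("Clutch A", "$49.98"),
    ("Clutch B", "$1,200.00"),
    ("Clutch C", "Sold out"),
])
print(rows)
```

The resulting dictionary can be passed to pd.DataFrame(rows) after the loop has finished. Note also that in the question Product_Matrix_Framework is only assigned inside option 'a', so choosing b, c or d first raises the NameError; assigning it (for example to None) before the menu loop and checking it in the other branches would avoid that.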

Hope it could help.

CodePudding user response:

After looking at some documentation and the website itself (I'm assuming you want the elements with the class productDesc), I think I see what you want to do.

If you want to select elements by a CSS selector (for the class productDesc, the selector is .productDesc, with a leading dot), you should use:

title_elements = child_element.find_elements("css selector", ".productDesc")

This should return a list of all the child elements with the class productDesc, which you can then iterate over to get the text of each element. Something like:

titles = []
for title_element in title_elements:
    # Read the text from the loop variable, not from the whole list
    titles.append(title_element.get_attribute("innerText"))

Looking at the website, each child_element may have one or more elements with the productDesc class, so you should store them in a list in case there is more than one. Your code appears to assume that there is only one.
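
To illustrate the "store them in a list" idea without needing a browser, here is a small sketch. collect_texts and FakeElement are hypothetical names introduced only for this example; FakeElement just imitates the get_attribute method of a Selenium element so the helper can be tried out on its own.

```python
def collect_texts(elements, attribute="innerText"):
    """Return the stripped, non-empty attribute value of each element."""
    texts = []
    for element in elements:
        value = element.get_attribute(attribute)
        # get_attribute can return None or an empty string; skip those
        if value and value.strip():
            texts.append(value.strip())
    return texts

class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement, for demonstration only."""
    def __init__(self, text):
        self._text = text

    def get_attribute(self, name):
        # Returns the same stored text regardless of the attribute name
        return self._text

titles = collect_texts([FakeElement(" Leather clutch "), FakeElement(""), FakeElement(None)])
print(titles)
```

With real elements, the call would presumably look like collect_texts(child_element.find_elements("css selector", ".productDesc")), and you would then decide how to handle products that return zero or several titles.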
