Home > Back-end >  Trying to do web scraping with Python, but it doesn't work well
Trying to do web scraping with Python, but it doesn't work well

Time:08-04

I'm doing a school project and I am trying to scrape data from websites. Basically I'm following a tutorial in edureka - https://www.edureka.co/blog/web-scraping-with-python/#demo

The sample code is like this

from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd

driver = webdriver.Chrome("/usr/lib/chromium-browser/chromedriver")

products=[] #List to store name of the product
prices=[] #List to store price of the product
ratings=[] #List to store rating of the product
driver.get("""<a href="https://www.flipkart.com/laptops/">https://www.flipkart.com/laptops/</a>~buyback-guarantee-on-laptops-/pr?sid=6bo,b5g&amp;amp;amp;amp;amp;amp;amp;amp;amp;uniq""")

content = driver.page_source
soup = BeautifulSoup(content)
for a in soup.findAll('a',href=True, attrs={'class':'_31qSD5'}):
    name=a.find('div', attrs={'class':'_3wU53n'})
    price=a.find('div', attrs={'class':'_1vC4OE _2rQ-NK'})
    rating=a.find('div', attrs={'class':'hGSR34 _2beYZw'})
    products.append(name.text)
    prices.append(price.text)
    ratings.append(rating.text) 

df = pd.DataFrame({'Product Name':products,'Price':prices,'Rating':ratings}) 
df.to_csv('products.csv', index=False, encoding='utf-8')

I simplly copied and pasted the sample code to Python to see how it works, and this is what I got

PS D:\COSC2625_Team_Blue> & C:/Users/meowg/AppData/Local/Programs/Python/Python310/python.exe d:/COSC2625_Team_Blue/test.py
d:\COSC2625_Team_Blue\test.py:5: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
  driver = webdriver.Chrome("/usr/lib/chromium-browser/chromedriver")
Traceback (most recent call last):
  File "C:\Users\meowg\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\common\service.py", line 71, in start
    self.process = subprocess.Popen(cmd, env=self.env,
  File "C:\Users\meowg\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 969, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\meowg\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 1438, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "d:\COSC2625_Team_Blue\test.py", line 5, in <module>
    driver = webdriver.Chrome("/usr/lib/chromium-browser/chromedriver")
  File "C:\Users\meowg\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\chrome\webdriver.py", line 69, in __init__
    super().__init__(DesiredCapabilities.CHROME['browserName'], "goog",
  File "C:\Users\meowg\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\chromium\webdriver.py", line 89, in __init__
    self.service.start()
  File "C:\Users\meowg\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\common\service.py", line 81, in start
    raise WebDriverException(
selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://chromedriver.chromium.org/home

Does anyone know what went wrong? I have no idea what happened.

CodePudding user response:

The chances are the path of chromedriver.exe on your computer is different than that of edureka. Did you even download chromedriver.exe?

CodePudding user response:

Looks like you just didn't download the file that was included in the tutorial, by the location of /usr/lib/chromium-browser/chromedriver. We can't really help you here, you just have to download the chromedriver.

I would recommend you use python playwright instead of selenium, as it is just a more modern library, with a slightly smaller learning curve, in my opinion, but that's just a recommendation.

  • Related