I'm doing a school project and I am trying to scrape data from websites. Basically I'm following a tutorial in edureka - https://www.edureka.co/blog/web-scraping-with-python/#demo
The sample code is like this
from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
driver = webdriver.Chrome("/usr/lib/chromium-browser/chromedriver")
products=[] #List to store name of the product
prices=[] #List to store price of the product
ratings=[] #List to store rating of the product
driver.get("""<a href="https://www.flipkart.com/laptops/">https://www.flipkart.com/laptops/</a>~buyback-guarantee-on-laptops-/pr?sid=6bo,b5g&amp;amp;amp;amp;amp;amp;amp;amp;uniq""")
content = driver.page_source
soup = BeautifulSoup(content)
for a in soup.findAll('a',href=True, attrs={'class':'_31qSD5'}):
name=a.find('div', attrs={'class':'_3wU53n'})
price=a.find('div', attrs={'class':'_1vC4OE _2rQ-NK'})
rating=a.find('div', attrs={'class':'hGSR34 _2beYZw'})
products.append(name.text)
prices.append(price.text)
ratings.append(rating.text)
df = pd.DataFrame({'Product Name':products,'Price':prices,'Rating':ratings})
df.to_csv('products.csv', index=False, encoding='utf-8')
I simplly copied and pasted the sample code to Python to see how it works, and this is what I got
PS D:\COSC2625_Team_Blue> & C:/Users/meowg/AppData/Local/Programs/Python/Python310/python.exe d:/COSC2625_Team_Blue/test.py
d:\COSC2625_Team_Blue\test.py:5: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
driver = webdriver.Chrome("/usr/lib/chromium-browser/chromedriver")
Traceback (most recent call last):
File "C:\Users\meowg\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\common\service.py", line 71, in start
self.process = subprocess.Popen(cmd, env=self.env,
File "C:\Users\meowg\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 969, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Users\meowg\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 1438, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "d:\COSC2625_Team_Blue\test.py", line 5, in <module>
driver = webdriver.Chrome("/usr/lib/chromium-browser/chromedriver")
File "C:\Users\meowg\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\chrome\webdriver.py", line 69, in __init__
super().__init__(DesiredCapabilities.CHROME['browserName'], "goog",
File "C:\Users\meowg\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\chromium\webdriver.py", line 89, in __init__
self.service.start()
File "C:\Users\meowg\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\common\service.py", line 81, in start
raise WebDriverException(
selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://chromedriver.chromium.org/home
Does anyone know what went wrong? I have no idea what happened.
CodePudding user response:
The chances are the path of chromedriver.exe on your computer is different than that of edureka. Did you even download chromedriver.exe?
CodePudding user response:
Looks like you just didn't download the file that was included in the tutorial, by the location of /usr/lib/chromium-browser/chromedriver
. We can't really help you here, you just have to download the chromedriver.
I would recommend you use python playwright instead of selenium, as it is just a more modern library, with a slightly smaller learning curve, in my opinion, but that's just a recommendation.