I am trying to use this Python code to scrape the Chrome Web Store:
from lxml import html
import requests
url = 'https://chrome.google.com/webstore/detail/cookie-editor/hlkenndednhfkekhgcdicdfddnkalmdm'
values = {'username': '[email protected]',
          'password': 'mypassword'}
page = requests.get(url, data=values)
print(page)
tree = html.fromstring(page.content)
review = tree.xpath('//div[@]/text()')[0]
print(review)
However, I am getting a 400 Bad Request response. Is it even possible to scrape the Chrome Web Store?
Answer:
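The page's content is loaded by JavaScript, so the HTML returned by a plain HTTP request does not contain the data you are looking for. You have to use a browser automation tool such as Selenium to load the page and grab the right data.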
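To illustrate why a static parse comes back empty on a JavaScript-rendered page, here is a minimal sketch using only the standard library. The two HTML snippets and the `DivTextCollector` / `div_texts` names are made up for illustration; they are not the real Chrome Web Store markup, only a stand-in for "before" and "after" the scripts run.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for what a plain HTTP client receives:
# an empty container plus a script tag that would fill it in later.
STATIC_HTML = """
<html>
  <body>
    <div id="app"></div>
    <script src="/bundle.js"></script>
  </body>
</html>
"""

# Hypothetical stand-in for the same page after a browser has run the script.
RENDERED_HTML = (
    '<html><body><div id="app">'
    '<div class="review">Great extension</div>'
    "</div></body></html>"
)


class DivTextCollector(HTMLParser):
    """Collect non-empty text that appears inside any <div> element."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # number of <div> elements currently open
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth > 0 and text:
            self.texts.append(text)


def div_texts(markup):
    """Return the list of text fragments found inside <div> elements."""
    collector = DivTextCollector()
    collector.feed(markup)
    return collector.texts


static_result = div_texts(STATIC_HTML)
rendered_result = div_texts(RENDERED_HTML)
print(static_result)    # []
print(rendered_result)  # ['Great extension']
```

With an empty result like `static_result`, indexing with `[0]` raises `IndexError`, which is what a lookup against the unrendered page would run into even if the request itself succeeded.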
Example:
from selenium import webdriver
import time
from bs4 import BeautifulSoup
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
options = webdriver.ChromeOptions()
options.add_experimental_option("detach", True)
webdriver_service = Service("./chromedriver")  # Your chromedriver path
driver = webdriver.Chrome(service=webdriver_service, options=options)
driver.get('https://chrome.google.com/webstore/detail/cookie-editor/hlkenndednhfkekhgcdicdfddnkalmdm')
driver.maximize_window()
time.sleep(3)
driver.find_element(By.XPATH,'//*[@ and contains(text(),"Review")]').click()
time.sleep(1)
soup = BeautifulSoup(driver.page_source,"html.parser")
data = []
reviews = soup.select('div.ba-bc-Xb')
for review in reviews:
    name = review.select_one('span[]').get_text(strip=True)
    comment = review.select_one('div[]').get_text(strip=True)
    data.append({
        'name': name,
        'comment': comment
    })
print(data)
Output:
[{'name': 'PingPing But', 'comment': 'Love it..... so simple and easy to use !'},
 {'name': 'Zhou Jeffrey', 'comment': "doesn't work anymore"},
 {'name': 'eunice miralles', 'comment': 'same im trying to find a fix and in github they said it has a problem with permission but still not fixed'},
 {'name': 'Jade Martinito', 'comment': 'me too'},
 {'name': 'Bonafide Champ', 'comment': 'It works fine but it does this weird thing when I import cookies in incognito mode, the cookies still get imported in the main browser windows.'},
 {'name': 'Arman Nawaz World', 'comment': 'Easy to use this extension. it is very user friendly and simple interface, while other looks little complicated\nReview by ArmanxNawaz'},
 {'name': 'Bagong Pook Elementary School', 'comment': 'Easy to use! Very helpful'},
 {'name': 'Whitelisted', 'comment': 'Works great for development and resetting website cookies without digging through your settings'},
 {'name': 'Rehxn Ali', 'comment': 'Best!! Saved Alot of Money With This Extention'},
 {'name': 'biniyam demeke', 'comment': 'Oh, Very Helpful'},
 {'name': 'Pingu VFX', 'comment': 'Easy to use while scamming kids on their roblox accountes'},
 {'name': 'Abstractedjuice09 Z', 'comment': 'how?'},
 {'name': 'jd', 'comment': 'lol same'},
 {'name': 'Arnells Designs', 'comment': 'good'},
 {'name': 'David Galbraith', 'comment': 'How is this called a cookie "editor"?? Not working at all. When I open it, the extension shows cookies for the page that I\'m currently on. It should be able to show cookies from every site I\'ve visited. And if I type ANYTHING in the search, nothing comes up. Not google, not Facebook, not steam, not one site that I have visited or logged into show up in the search bar. There is something very, very wrong. yeah, I can delete ALL cookies, but CCleaner does that just fine.'},
 {'name': 'df fes', 'comment': 'Maybe you dont know how to use it?'},
 {'name': 'Galih Kamulyan', 'comment': 'LEGENDARY'},
 {'name': 'Aniket Chaudhary', 'comment': 'Liked it. But after using it for sometime, it shows an "unknown error".'},
 {'name': 'Anonymous', 'comment': "mine doesn't work for first time too , it always show unknown error"},
 {'name': 'Ehsan Abtahee', 'comment': 'did u find a fix?'},
 {'name': 'kashba', 'comment': 'if you find a fix.. do tell me'},
 {'name': 'Nischay2004 Muller', 'comment': 'The best easy cookie editor for all , strongly recommended'},
 {'name': 'ultra noob', 'comment': 'Super simple and easy to use.'},
 {'name': 'विकास कालीरामना', 'comment': 'Loved it!'},
 {'name': 'Zachary Bolt', 'comment': 'Clean, easy to use and actively updated. 5 Stars well earned.'},
 {'name': 'TALHA JUBAYER', 'comment': "Love it .it's working"},
 {'name': 'amrozain 2007', 'comment': 'good for hackers'},
 {'name': 'Kazuko Masao', 'comment': 'Very good .. Very good .. Very good.'},
 {'name': 'chase Brigette', 'comment': 'This extention seems to be the culprit that makes bing my default browser!!! The extension was good before I realized this -_-"'},
 {'name': 'Digital Audio Directions', 'comment': 'This is a joke right? Only seems to list cookies of the site you are on and all in a chopped up list format. NO search function for existing stored cookies? Search by keyword, date, etc, Does not seem available.'},
 {'name': 'Phantom V', 'comment': 'This seems outdated.'},
 {'name': 'Anonymous ZN49', 'comment': 'Easy to use this extension. it is very user friendly and simple interface, while other looks little complicated.'},
 {'name': 'YongYi Wu', 'comment': "Who don't love cookies?"},
 {'name': 'hush', 'comment': 'was working fine, now im getting an import error'}]
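A note on the fixed `time.sleep` calls in the Selenium example: they assume the page finishes rendering within a few seconds, which may not hold on a slow connection. Selenium provides `WebDriverWait` for waiting on a condition instead. The idea behind it is simple polling, sketched here in plain Python so it runs without a browser. `wait_until` and `fake_page_ready` are hypothetical names for illustration and are not part of Selenium.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without the condition being met.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for "the reviews tab has rendered": truthy from the third check on.
attempts = {"count": 0}


def fake_page_ready():
    attempts["count"] += 1
    return attempts["count"] >= 3


ready = wait_until(fake_page_ready, timeout=5.0, poll_interval=0.1)
print(ready, attempts["count"])
```

In the Selenium script itself, the same effect would come from `WebDriverWait(driver, 10).until(...)` with a condition from `selenium.webdriver.support.expected_conditions`, used in place of each `time.sleep` call. The 10-second timeout here is an arbitrary choice.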