I am seeing very strange behavior with Selenium: my Python web scraping script worked last week, but today it does not.
The URL I am scraping is the Google Maps reviews page, in Lithuanian:
driver.get('https://www.google.com/maps/place/Depo/@54.9388288,23.8977024,12z/data=!4m7!3m6!1s0x46dd9100a1414c23:0x3ab761cf0d216f62!8m2!3d54.7417742!4d25.225869!9m1!1b1')
Calling BeautifulSoup(driver.page_source, 'html.parser')
raises an error:
"InvalidArgumentException: Message: unexpected end of hex escape at line 1 column 721856"
The error is raised by "self.execute(Command.GET_PAGE_SOURCE)['value']".
Interestingly, it is raised only when I scroll down to load more reviews. It works if I do not scroll and scrape only the top few comments using:
reviews = BeautifulSoup(driver.page_source, 'html.parser').find_all('div', class_='jftiEf fontBodyMedium')
for result in reviews[:-1]:
    print(result.find('span', class_='wiI7pd').text)
I am using the latest Selenium version ('4.4.3') with the Firefox WebDriver. Does anyone have an idea what to do, or how to scrape all the reviews? Thank you.
CodePudding user response:
Your code works fine in Chrome, and you may want to check it in other browsers as well. Sometimes this kind of issue is caused by the cache or cookies, so try clearing them first. If that does not help, switch to a different browser.
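As a rough sketch of the cookie-clearing suggestion, plus one possible workaround if you want to stay on Firefox: the two helpers below are hypothetical names, not part of Selenium. The first reloads a page after deleting its cookies. The second reads the review text from live elements with find_elements, so it never sends the GET_PAGE_SOURCE command that raises the error. Whether that actually avoids the hex escape problem on this page is an assumption I have not verified, and the 'wiI7pd' class name is taken from the question and may change.

```python
# Sketch only: works with any Selenium 4 WebDriver instance (Firefox or Chrome).
# "css selector" is the string value of selenium's By.CSS_SELECTOR, used here
# so the helpers do not need to import selenium themselves.
CSS = "css selector"

# Class name copied from the question; Google may change it at any time.
REVIEW_TEXT_SELECTOR = "span.wiI7pd"


def reload_without_cookies(driver, url):
    """Open url, delete the cookies for the loaded page, then reload it."""
    driver.get(url)              # delete_all_cookies() acts on the loaded page
    driver.delete_all_cookies()
    driver.refresh()


def collect_review_texts(driver, selector=REVIEW_TEXT_SELECTOR):
    """Read review texts from live elements instead of driver.page_source."""
    elements = driver.find_elements(CSS, selector)
    return [element.text for element in elements]
```

With a real driver (for example webdriver.Chrome() or webdriver.Firefox() after "from selenium import webdriver"), you would call reload_without_cookies(driver, url), scroll the reviews panel as you do now, and then call collect_review_texts(driver) to get the list of texts.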