I found a good solution, but it only handles the number of questions and answers that Google shows by default, and I need more.
I am a novice Python developer. How do I get more questions and answers? Do I have to simulate a click first to reveal the required number and then parse?
CodePudding user response:
The following code parses the questions appearing on screen, then asks whether you want to parse more. If you enter y, it clicks on the last question's button so that more questions are loaded into the page. The questions are stored in the list questions and the answers in the list answers.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
your_path = '...'
driver = webdriver.Chrome(service=Service(your_path))
driver.get('https://www.google.com/search?q=How to make bakery?&source=hp&ei=j0aZYYjRAvja2roPrcWcyAU&iflsig=ALs-wAMAAAAAYZlUn4NMUPjfIpQmrXSmjIDnaWjJXWIJ&ved=0ahUKEwjI1JDn0Kf0AhV4rVYBHa0iB1kQ4dUDCAc&uact=5&oq=How to make bakery?&gs_lcp=Cgdnd3Mtd2l6EAMyBAgAEBMyBAgAEBMyBAgAEBMyBAgAEBMyBAgAEBMyBAgAEBMyBAgAEBMyBAgAEBMyBAgAEBMyBAgAEBNQAFgAYJMDaABwAHgAgAF-iAF-kgEDMC4xmAEAoAECoAEB&sclient=gws-wiz')
questions, answers = [], []
while True:
    # every "People also ask" entry currently present in the page
    for idx, question in enumerate(driver.find_elements(By.CSS_SELECTOR, "div[id*='RELATED_QUESTION']")):
        if idx >= len(questions):  # skip already parsed questions
            questions.append(question.text)
            txt = ''
            # concatenate all answer fragments belonging to this question
            for answer in question.find_elements(By.CSS_SELECTOR, "div[id*='WEB_ANSWERS_RESULT']"):
                txt += answer.get_attribute('innerText')
            answers.append(txt)
    inp = input(f'{idx + 1} questions parsed, continue? (y/n)')
    if inp == 'y':
        # clicking the last question makes Google load more questions
        question.click()
        time.sleep(2)
    else:
        break
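If you would rather ask for a fixed number of questions than answer the prompt each time, the same click-then-parse loop can be wrapped in a function. The following is only a sketch under a few assumptions: the name collect_questions and the parameters target, pause and max_rounds are hypothetical, the CSS selectors are the ones used above (Google may change them at any time), and the locator is passed as the plain string "css selector", which is the value of By.CSS_SELECTOR.

```python
import time

CSS = "css selector"  # string value of selenium's By.CSS_SELECTOR


def collect_questions(driver, target, pause=2.0, max_rounds=20):
    """Collect up to `target` question/answer pairs, clicking to load more.

    Hypothetical helper: `driver` is any object with a Selenium-style
    find_elements(by, selector) method.
    """
    questions, answers = [], []
    for _ in range(max_rounds):  # hard cap so the loop cannot run forever
        blocks = driver.find_elements(CSS, "div[id*='RELATED_QUESTION']")
        new_blocks = blocks[len(questions):]  # skip already parsed questions
        if not new_blocks:
            break  # nothing on the page, or the last click loaded nothing new
        for block in new_blocks:
            questions.append(block.text)
            parts = block.find_elements(CSS, "div[id*='WEB_ANSWERS_RESULT']")
            answers.append(''.join(p.get_attribute('innerText') or '' for p in parts))
        if len(questions) >= target:
            break
        blocks[-1].click()  # expanding the last question loads more of them
        time.sleep(pause)   # crude wait for the new questions to appear
    return questions[:target], answers[:target]
```

With the driver created above, a call such as collect_questions(driver, target=20) should return two lists of at most 20 items each. It stops early if a click does not bring in any new questions.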