I've been trying to extract data from Google searches, but I can't get past the "Before you continue to Google Search" consent form.
I looked for a workaround and saw that others have suggested passing the cookie CONSENT=PENDING+999, or something along the lines of CONSENT=YES+HU.hu+V10+B+256, in the GET request. Unfortunately, I couldn't make the former work, and for the latter I'm not sure what the last three elements should be replaced with.
Sample code is below.
import requests
import bs4

headers = {
    # Adjacent string literals are concatenated into a single value
    'User-Agent': 'Chrome 83 (Toshiba; Intel(R) Core(TM) i3-2367M CPU @ 1.40 GHz)'
                  'Windows 7 Home Premium',
    'Accept': 'text/html,application/xhtml+xml,application/xml;'
              'q=0.9,image/webp,*/*;q=0.8',
    # 'cookie': 'CONSENT=YES+HU.hu+V10+B+256',  # what are the last three elements?
    'cookie': 'CONSENT=PENDING+999',
}
text = "geeksforgeeks"
url = 'https://google.com/search?q=' + text
request_result = requests.get(url, headers=headers)  # here's where the trouble happens
soup = bs4.BeautifulSoup(request_result.text, "html.parser")
print(soup)  # not what one would expect
heading_object = soup.find_all('h3')
for info in heading_object:
    print(info.getText())
    print("------")
Any help would be much appreciated.
Answer:
Yes, Google does use the CONSENT cookie to decide whether the consent popup is shown. I experimented with the cookie's value and, as of writing, setting the CONSENT cookie to YES+ is enough to stop the consent window from appearing.
In your code, you passed the cookie via the headers parameter. I recommend using the cookies parameter instead.
Change the request to this (and remove the cookie entry from headers):
request_result = requests.get(url, headers=headers, cookies={'CONSENT': 'YES+'})
My output after running your code with my solution:
GeeksforGeeks
------
GeeksforGeeks - YouTube
------
GeeksforGeeks | LinkedIn
------
GeeksforGeeks (@geeks_for_geeks) • Instagram photos and videos
------
GeeksforGeeks - Twitter
------
GeeksforGeeks - Home | Facebook
------
Geeks for Geeks - Crunchbase Company Profile & Funding
------
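If you want to check what the cookies parameter actually sends before hitting Google, the request can be built and inspected offline. This is only a minimal sketch: build_search_request is a hypothetical helper name, the shortened User-Agent is a placeholder, and the YES+ value is simply what worked at the time of writing, so it may stop working if Google changes its consent handling.

```python
import requests


def build_search_request(query, consent_value="YES+"):
    """Prepare (but do not send) a Google search request carrying the CONSENT cookie.

    Hypothetical helper for illustration; the cookie value is an assumption
    based on what worked at the time of writing.
    """
    headers = {
        "User-Agent": "Mozilla/5.0",  # placeholder; use your own User-Agent string
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    }
    request = requests.Request(
        "GET",
        "https://google.com/search",
        params={"q": query},                  # requests URL-encodes the query for us
        headers=headers,
        cookies={"CONSENT": consent_value},   # sent as a Cookie header
    )
    return request.prepare()


prepared = build_search_request("geeksforgeeks")
print(prepared.url)                # the final URL with the encoded query string
print(prepared.headers["Cookie"])  # the Cookie header built from the cookies dict

# To actually send it, reuse the prepared request with a session:
# with requests.Session() as session:
#     response = session.send(prepared)
```

Passing the query through params instead of concatenating it onto the URL also keeps search terms containing spaces or special characters correctly encoded.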