Bypassing EU consent request

Time:01-03

I've been trying to extract data from Google searches, but I'm not able to bypass the "Before you continue to Google Search" consent form.

I tried to find a workaround and saw that others have suggested sending the cookie CONSENT=PENDING+999, or something along the lines of CONSENT=YES+HU.hu+V10+B+256, with the GET request. Unfortunately, I couldn't make the former work, and for the latter I'm not sure what the last three elements should be replaced with.

I attach the sample code below.

import requests
import bs4

headers = {'User-Agent': 'Chrome 83 (Toshiba; Intel(R) Core(TM) i3-2367M CPU @ 1.40 GHz)'
                         'Windows 7 Home Premium',
           'Accept': 'text/html,application/xhtml+xml,application/xml;'
                     'q=0.9,image/webp,*/*;q=0.8',
           # 'cookie': 'CONSENT=YES+HU.hu+V10+B+256'  # what are the last three elements?
           'cookie': 'CONSENT=PENDING+999'
           }

text = "geeksforgeeks"
url = 'https://google.com/search?q=' + text  # append the search term to the query string

request_result = requests.get(url, headers=headers)  # here's where the trouble happens

soup = bs4.BeautifulSoup(request_result.text, "html.parser")

print(soup) # not what one would expect

heading_object=soup.find_all( 'h3' ) 
  
for info in heading_object:
    print(info.getText())
    print("------")

Any help would be much appreciated.

Answer:

Yes, Google does use the CONSENT cookie to decide whether to show the consent popup. I experimented with the cookie's value and, as of writing, setting it to YES+ is enough to stop the consent window from showing.

In your code, you passed the cookie through the headers parameter. I recommend using the cookies parameter of requests.get instead.

Change the request in your code to this (and remove the 'cookie' entry from headers):

request_result = requests.get(url, headers=headers, cookies={'CONSENT': 'YES+'})

My output after running your code with my solution:

GeeksforGeeks
------
GeeksforGeeks - YouTube
------
GeeksforGeeks | LinkedIn
------
GeeksforGeeks (@geeks_for_geeks) • Instagram photos and videos
------
GeeksforGeeks - Twitter
------
GeeksforGeeks - Home | Facebook
------
Geeks for Geeks - Crunchbase Company Profile & Funding
------
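
If you want to check what the cookies parameter actually puts on the wire before hitting Google, a prepared request can be inspected offline. The following is only a minimal sketch: the helper name build_search_request, the SEARCH_URL constant and the placeholder User-Agent are my own illustrative choices, and it passes the query through params (so requests does the URL encoding) rather than concatenating strings as in the question.

```python
import requests

# Assumption: the plain search endpoint, as used in the question.
SEARCH_URL = "https://www.google.com/search"


def build_search_request(query, consent="YES+"):
    """Prepare (but do not send) a search request that carries the CONSENT cookie.

    Hypothetical helper for illustration; it is not part of requests.
    """
    request = requests.Request(
        "GET",
        SEARCH_URL,
        params={"q": query},                    # requests URL-encodes the query
        headers={"User-Agent": "Mozilla/5.0"},  # placeholder User-Agent
        cookies={"CONSENT": consent},           # cookie passed separately from headers
    )
    return request.prepare()


prepared = build_search_request("geeksforgeeks")
print(prepared.url)                # the final URL including the query string
print(prepared.headers["Cookie"])  # the Cookie header requests would send
```

To actually send it, pass the prepared request to requests.Session().send(prepared) and parse response.text with BeautifulSoup as in the question. Whether YES+ keeps working depends on Google, so treat the cookie value as something that may need adjusting.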