Get page content with "login with google" authentication using python

Time:12-25

I am wondering whether this is possible in general. I have to get the content of a page at a specific URL using Python 3.x. When I open the URL in the browser, I have only one option: "Continue with Google". I enter my Google account credentials and then the desired page appears. I need its content. Is it possible to do this with Python?

CodePudding user response:

Your Python app needs to request authentication and authorization from Google's API through the OpenID Connect (OIDC) protocol, which is "a simple identity layer on top of the OAuth 2.0 protocol" that uses bearer tokens. Google is the identity provider in this case.

Once the user has signed in to Google through your app, the third-party app will trust the bearer tokens from Google and will automatically consider the user signed in.

The process is quite elaborate (it includes security measures, etc.), but it is basically the same for every OIDC-based system, i.e. eventually you'll get:

  • an ID token, which proves your authentication
  • an access token, which authorizes your app to access resources on your behalf
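To make the ID token less abstract: it is a JWT, i.e. three base64url segments separated by dots, and the middle one holds the claims about the signed-in user. Here is a minimal sketch that only reads those claims. The helper names and the sample token are made up for illustration, and the sketch deliberately does not verify the signature, which real code must do before trusting anything in the token.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Return the claims of a JWT WITHOUT verifying its signature."""
    # A JWT has the shape header.payload.signature
    payload_segment = token.split(".")[1]
    # base64url omits the trailing padding, so restore it before decoding
    padded = payload_segment + "=" * (-len(payload_segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def _b64url(data: dict) -> str:
    """Encode a dict as an unpadded base64url JSON segment (demo helper)."""
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Made-up token for illustration only; the signature part is a dummy string.
sample_claims = {
    "iss": "https://accounts.google.com",
    "sub": "1234567890",
    "email": "user@example.com",
}
sample_token = ".".join(
    [_b64url({"alg": "none"}), _b64url(sample_claims), "signature"]
)

claims = decode_jwt_payload(sample_token)
print(claims["iss"], claims["email"])
```

The access token, by contrast, should be treated as an opaque string that you pass along when calling the resource.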

You can find ready-to-use Python snippets in Google's reference documentation.

I would avoid other automated approaches in this case.
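As a rough illustration of the first step of the flow described above, this sketch builds the URL your app would send the user to in order to sign in with Google. It assumes you have registered your own OAuth client: the client ID and redirect URI below are placeholders, and `build_auth_url` is a hypothetical helper, not part of any Google library. Exchanging the returned code for tokens is not shown.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Google's OAuth 2.0 / OpenID Connect authorization endpoint
GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"


def build_auth_url(client_id: str, redirect_uri: str, state: str, nonce: str) -> str:
    """Build the URL that starts an authorization-code sign-in."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",   # ask for an authorization code
        "scope": "openid email",   # "openid" is what makes this an OIDC request
        "state": state,            # echoed back to you; protects against CSRF
        "nonce": nonce,            # echoed inside the ID token; protects against replay
    }
    return f"{GOOGLE_AUTH_ENDPOINT}?{urlencode(params)}"


# Placeholder values: replace them with the ones from your own registered client.
state = secrets.token_urlsafe(16)
nonce = secrets.token_urlsafe(16)
auth_url = build_auth_url(
    "YOUR_CLIENT_ID.apps.googleusercontent.com",
    "http://localhost:8080/callback",
    state,
    nonce,
)
print(auth_url)
```

After the user signs in, Google redirects the browser to the redirect URI with a `code` and the same `state`, and your app checks that the `state` matches before going any further.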

CodePudding user response:

You are looking for Selenium or another WebDriver-based technology. It allows you to spawn and programmatically control a real browser. That has lots of advantages, but you will likely have to analyze messy HTML in order to extract the content you need.
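To give an idea of what "analyzing messy HTML" can look like, here is a small sketch that pulls the visible text out of a page using only the standard library. The `TextExtractor` class and the sample HTML are made up for illustration; for real pages a dedicated parsing library may be more convenient.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def extract_text(html: str) -> list:
    """Return the non-empty text fragments found in the HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Made-up HTML standing in for the page source returned by the browser.
sample_html = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Title</h1><script>var x = 1;</script>"
    "<p>Hello <b>world</b></p></body></html>"
)
print(extract_text(sample_html))
```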

@Jonathan Ciapetti's approach might be (much) simpler and more robust in your case. Get OpenID credentials, then query the page or hit the API you need instead of trying to mimic human interactions. But if the page is so complex that only a real browser can display it (and there is no API available), use a real browser controlled by Selenium.

Selenium has Python bindings; WebDriver is a JSON-based W3C API specification that Selenium wraps.
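A minimal sketch of the browser-driven approach, assuming Selenium 4 and Chrome are installed. Instead of scripting the Google sign-in form (which Google may refuse from an automated browser), it opens the page, lets a human complete the login by hand, and then reads the resulting HTML. The function name, the URL and the "Sign out" marker are hypothetical; pick a marker that only appears on the page once you are logged in.

```python
def fetch_after_manual_login(url, logged_in_marker, timeout=300):
    """Open `url` in Chrome, wait for a human to sign in, return the page HTML."""
    # Imported inside the function so this module still loads
    # on machines where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Poll until the marker text shows up in the page source,
        # or raise a TimeoutException after `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            lambda d: logged_in_marker in d.page_source
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait timed out.
        driver.quit()


# Example call (opens a real browser window, so it is left commented out):
# html = fetch_after_manual_login("https://www.example.com/page", "Sign out")
# print(len(html))
```

The returned string is the rendered HTML, which you would still have to parse to get at the content you need.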

CodePudding user response:

Yes, it is generally possible to retrieve the content of a webpage using Python. One way to do this is by using the requests library, which allows you to send HTTP requests and retrieve the response from a server.

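Here is an example of how you might use the requests library to retrieve the content of a webpage behind a login form. Note that it assumes the site accepts a username and password posted directly to it; a "Continue with Google" button redirects to Google's own sign-in pages, which this simple pattern does not handle:

import requests

# Set the URL of the webpage you want to access
url = "https://www.example.com/page"

# Set the credentials for the site's own login form
# (assumption: the site accepts these fields directly; Google sign-in does not work this way)
username = "your_username"
password = "your_password"

# Use a session so that cookies set during login are kept for later requests
with requests.Session() as session:
    # Send a POST request to the login form to log in with your credentials
    login_response = session.post(url, data={'username': username, 'password': password}, timeout=30)

    # A 200 status only means the request succeeded; inspect the content to confirm the login worked
    if login_response.status_code == 200:
        webpage_content = login_response.text
        print(webpage_content)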

Keep in mind that this is just one example of how you might approach this problem, and the exact details of how to retrieve the content of a webpage using Python will depend on the specific requirements of the webpage and the login process.

It is also important to note that web scraping can sometimes be considered a violation of a website's terms of service, and you should make sure to respect any relevant terms or policies when accessing and using web content.
