Are there any other ways to connect to Google Sheets in Airflow?

Time:02-16

I'm trying to connect to Google Sheets in Airflow with a PythonOperator, as follows:

import pandas as pd
import pygsheets
from google.oauth2 import service_account
from airflow.operators.python import PythonOperator

def establish_conn_to_gs():

    creds = service_account.Credentials.from_service_account_file(
        'service_account_json_file',
        scopes=('google_api_spreadsheets_auth_link', 'google_api_gdrive_auth_link'),
        subject='client_mail'
    )

    pg = pygsheets.authorize(custom_credentials=creds)
    return pg

def get_data_from_spreadsheet(spreadsheet_link, worksheet_title):

    pg = establish_conn_to_gs()
    # Use the function arguments rather than hard-coded string literals
    doc = pg.open_by_url(spreadsheet_link)
    data = doc.worksheet_by_title(worksheet_title).get_all_values(include_tailing_empty_rows=False)
    return data

get_data_from_gs = PythonOperator(
    task_id='get_data_from_gs',
    # Pass the function itself; Airflow calls it at run time with op_args
    python_callable=get_data_from_spreadsheet,
    op_args=[link, title],
)

This works fine, but are there any alternatives that do the same thing? I've found the Google Sheets operator, but its current documentation is not very good.

Thanks for help!

Answer:

Airflow has GSheetsHook, which interacts with Google Sheets via a Google Cloud connection (if you don't have a connection defined, you can follow the Airflow documentation for the Google Cloud connection).

To get data from a Google Sheet, simply use the hook. There is no need to implement the connection yourself; if the functionality is not exactly what you need, you can subclass the hook and extend it.

To get values you can use:

get_values - gets values from a Google Sheet for a single range (API)

batch_get_values - gets values from a Google Sheet for a list of ranges (API)
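
The two methods differ in what they hand back: get_values returns the rows directly, while batch_get_values returns the API response, with one entry per requested range under "valueRanges". Here is a rough sketch of unpacking that response. Note that split_batch_response is a hypothetical helper (not part of Airflow), and the sample response is made-up data shaped like a Sheets batchGet reply, so check it against your provider version.

```python
from typing import Any


def split_batch_response(response: dict[str, Any]) -> dict[str, list[list[Any]]]:
    """Map each returned range name to its rows.

    Hypothetical helper, not part of Airflow. It assumes `response` has the
    shape of a Sheets batchGet reply: {"valueRanges": [{"range": ..., "values": ...}]}.
    """
    result: dict[str, list[list[Any]]] = {}
    for value_range in response.get("valueRanges", []):
        # An empty range may come back without a "values" key, so default to [].
        result[value_range["range"]] = value_range.get("values", [])
    return result


# Made-up sample data standing in for hook.batch_get_values(...) output.
sample_response = {
    "spreadsheetId": "hypothetical-id",
    "valueRanges": [
        {
            "range": "Sheet1!A1:B2",
            "majorDimension": "ROWS",
            "values": [["id", "name"], ["1", "Ann"]],
        },
        {"range": "Sheet2!A1:A3", "majorDimension": "ROWS"},
    ],
}

tables = split_batch_response(sample_response)
```

In a task you would pass the dictionary returned by batch_get_values instead of sample_response.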

Example:

from airflow.providers.google.suite.hooks.sheets import GSheetsHook
from airflow.operators.python import PythonOperator

def get_data_from_spreadsheet():
    hook = GSheetsHook(
        gcp_conn_id="google_conn_id",
    )
    # get_values takes the spreadsheet ID and a range in A1 notation
    values = hook.get_values(spreadsheet_id='my-spreadsheet-id', range_='my-range')
    # values is a list of rows from your spreadsheet.
    # Add the rest of your code here.
    return values


get_data_from_gs = PythonOperator(
    task_id='get_data_from_gs',
    # Pass the function itself, not the result of calling it
    python_callable=get_data_from_spreadsheet,
)
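
For the "rest of your code" step, a common follow-up is turning the list of rows returned by get_values into records keyed by the header row. This is a minimal sketch, assuming the first row of the range holds the column names. rows_to_records is a hypothetical helper name and the sample rows are made up. The padding is there because the Sheets API can omit trailing empty cells, so a row may be shorter than the header.

```python
def rows_to_records(rows):
    """Turn [[header...], [row...], ...] into a list of dicts.

    Hypothetical helper; assumes the first row contains the column names.
    """
    if not rows:
        return []
    header, *body = rows
    records = []
    for row in body:
        # Pad short rows with empty strings so every record has all header keys.
        padded = list(row) + [""] * (len(header) - len(row))
        records.append(dict(zip(header, padded)))
    return records


# Made-up sample data; the last row is missing its trailing cell.
sample_rows = [
    ["id", "name", "email"],
    ["1", "Ann", "ann@example.com"],
    ["2", "Bob"],
]

records = rows_to_records(sample_rows)
```

If a DataFrame is needed, the resulting list of dicts can be passed to pandas.DataFrame(records).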