Home > Back-end >  Downloading all tabs of a spreadsheet Google Drive API
Downloading all tabs of a spreadsheet Google Drive API

Time:07-16

I'm trying to download the full content of a spreadsheet using google Drive. Currently, my code is exporting and then writing to a file the content from the first tab from the given spreadsheet only. How can I make it download the full content of the file?

This is the function that I'm currently using:

def download_file(real_file_id, service):
    try:
        file_id = real_file_id
        request = service.files().export_media(fileId=file_id,
                                               mimeType='text/csv')
        file = io.BytesIO()
        downloader = MediaIoBaseDownload(file, request)
        done = False
        while done is False:
            status, done = downloader.next_chunk()
            print(F'Download {int(status.progress() * 100)}.')

    except HttpError as error:
        print(F'An error occurred: {error}')
        file = None


    file_object = open('test.csv', 'a')
    file_object.write(file.getvalue().decode("utf-8"))
    file_object.close()
    
    return file.getvalue()


I call the function at a later stage in my code by passing the already initialised google drive service and the file id

download_file(real_file_id='XXXXXXXXXXXXXXXXXXXXX', service=service)

CodePudding user response:

I believe your goal is as follows.

  • You want to download all sheets in a Google Spreadsheet as CSV data.
  • You want to achieve this using googleapis for python.

In this case, how about the following sample script? In this case, in order to retrieve the sheet names of each sheet in Google Spreadsheet, Sheets API is used. Using Sheets API, the sheet IDs of all sheets are retrieved. Using these sheet Ids, all sheets are downloaded as CSV data.

Sample script:

From your showing script, I guessed that service might be service = build("drive", "v3", credentials=creds). If my understanding is corret, in order to retrieve the acess token, please use creds.

spreadsheetId = "###" # Please set the Spreadsheet ID.

sheets = build("sheets", "v4", credentials=creds)

sheetObj = sheets.spreadsheets().get(spreadsheetId=spreadsheetId, fields="sheets(properties(sheetId,title))").execute()
accessToken = creds.token
for s in sheetObj.get("sheets", []):
    p = s["properties"]
    sheetName = p["title"]
    print("Download: "   sheetName)
    url = "https://docs.google.com/spreadsheets/export?id="   spreadsheetId   "&exportFormat=csv&gid="   str(p["sheetId"])
    res = requests.get(url, headers={"Authorization": "Bearer "   accessToken})
    with open(sheetName   ".csv", mode="wb") as f:
        f.write(res.content)
  • In this case, please add import requests.

  • When this script is run, all sheets in a Google Spreadsheet are downloaded as CSV data. The filename of each CSV file uses the tab name in Google Spreadsheet.

  • In this case, please add a scope of "https://www.googleapis.com/auth/spreadsheets.readonly" as follows. And, please reauthorize the scopes. Please be careful about this.

      SCOPES = [
          "https://www.googleapis.com/auth/drive.readonly", # Please use this for your actual situation.
          "https://www.googleapis.com/auth/spreadsheets.readonly",
      ]
    

Reference:

CodePudding user response:

Tanaike's answer is easier and more straightforward, but I already spent some time on this so I might as well post it as an alternative.

The problem you originally encountered is that CSV files do not support multiple tabs/sheets, so Drive's files.export will only export the first sheet, and it doesn't have a way to select specific sheets.

Another way you can approach this is to use the Sheets API copyTo() method to create temp files for each sheet and export those as single CSV files.

        # need a service for sheets and one for drive
        sheetservice = build('sheets', 'v4', credentials=creds) 
        driveservice = build('drive', 'v3', credentials=creds)

        spreadsheet = sheetservice.spreadsheets()
        result = spreadsheet.get(spreadsheetId=YOUR_SPREADSHEET).execute()
        sheets = result.get('sheets', []) # the list of sheets within your spreadsheet

        # standard metadata to create the blank spreadsheet files
        file_metadata = {
            "name":"temp",
            "mimeType":"application/vnd.google-apps.spreadsheet"
        } 

        for sheet in sheets:
        # create a blank spreadsheet and get its ID
            tempfile = driveservice.files().create(body=file_metadata).execute()
            tempid = tempfile.get('id')

            # copy the sheet to the new file
            sheetservice.spreadsheets().sheets().copyTo(spreadsheetId=YOUR_SPREADSHEET, sheetId=sheet['properties']['sheetId'], body={"destinationSpreadsheetId":tempid}).execute()
            
            # need to delete the first sheet since the copy gets added as second
            sheetservice.spreadsheets().batchUpdate(spreadsheetId=tempid, body={"requests":{"deleteSheet":{"sheetId":0}}}).execute()
            download_file(tempid, driveservice) # runs your original method to download the file
            driveservice.files().delete(fileId=tempid).execute() # to clean up the temp file

You'll also need the https://www.googleapis.com/auth/spreadsheets and https://www.googleapis.com/auth/drive scopes. This involves more API calls so I just recommend Tanaike's method, but I hope it gives you an idea of ways that you can play with the API to suit your needs.

  • Related