I'm trying to download the full content of a spreadsheet using google Drive. Currently, my code is exporting and then writing to a file the content from the first tab from the given spreadsheet only. How can I make it download the full content of the file?
This is the function that I'm currently using:
def download_file(real_file_id, service):
try:
file_id = real_file_id
request = service.files().export_media(fileId=file_id,
mimeType='text/csv')
file = io.BytesIO()
downloader = MediaIoBaseDownload(file, request)
done = False
while done is False:
status, done = downloader.next_chunk()
print(F'Download {int(status.progress() * 100)}.')
except HttpError as error:
print(F'An error occurred: {error}')
file = None
file_object = open('test.csv', 'a')
file_object.write(file.getvalue().decode("utf-8"))
file_object.close()
return file.getvalue()
I call the function at a later stage in my code by passing the already initialised google drive service and the file id
download_file(real_file_id='XXXXXXXXXXXXXXXXXXXXX', service=service)
CodePudding user response:
I believe your goal is as follows.
- You want to download all sheets in a Google Spreadsheet as CSV data.
- You want to achieve this using googleapis for python.
In this case, how about the following sample script? In this case, in order to retrieve the sheet names of each sheet in Google Spreadsheet, Sheets API is used. Using Sheets API, the sheet IDs of all sheets are retrieved. Using these sheet Ids, all sheets are downloaded as CSV data.
Sample script:
From your showing script, I guessed that service
might be service = build("drive", "v3", credentials=creds)
. If my understanding is corret, in order to retrieve the acess token, please use creds
.
spreadsheetId = "###" # Please set the Spreadsheet ID.
sheets = build("sheets", "v4", credentials=creds)
sheetObj = sheets.spreadsheets().get(spreadsheetId=spreadsheetId, fields="sheets(properties(sheetId,title))").execute()
accessToken = creds.token
for s in sheetObj.get("sheets", []):
p = s["properties"]
sheetName = p["title"]
print("Download: " sheetName)
url = "https://docs.google.com/spreadsheets/export?id=" spreadsheetId "&exportFormat=csv&gid=" str(p["sheetId"])
res = requests.get(url, headers={"Authorization": "Bearer " accessToken})
with open(sheetName ".csv", mode="wb") as f:
f.write(res.content)
In this case, please add
import requests
.When this script is run, all sheets in a Google Spreadsheet are downloaded as CSV data. The filename of each CSV file uses the tab name in Google Spreadsheet.
In this case, please add a scope of "https://www.googleapis.com/auth/spreadsheets.readonly" as follows. And, please reauthorize the scopes. Please be careful about this.
SCOPES = [ "https://www.googleapis.com/auth/drive.readonly", # Please use this for your actual situation. "https://www.googleapis.com/auth/spreadsheets.readonly", ]
Reference:
CodePudding user response:
Tanaike's answer is easier and more straightforward, but I already spent some time on this so I might as well post it as an alternative.
The problem you originally encountered is that CSV files do not support multiple tabs/sheets, so Drive's files.export
will only export the first sheet, and it doesn't have a way to select specific sheets.
Another way you can approach this is to use the Sheets API copyTo()
method to create temp files for each sheet and export those as single CSV files.
# need a service for sheets and one for drive
sheetservice = build('sheets', 'v4', credentials=creds)
driveservice = build('drive', 'v3', credentials=creds)
spreadsheet = sheetservice.spreadsheets()
result = spreadsheet.get(spreadsheetId=YOUR_SPREADSHEET).execute()
sheets = result.get('sheets', []) # the list of sheets within your spreadsheet
# standard metadata to create the blank spreadsheet files
file_metadata = {
"name":"temp",
"mimeType":"application/vnd.google-apps.spreadsheet"
}
for sheet in sheets:
# create a blank spreadsheet and get its ID
tempfile = driveservice.files().create(body=file_metadata).execute()
tempid = tempfile.get('id')
# copy the sheet to the new file
sheetservice.spreadsheets().sheets().copyTo(spreadsheetId=YOUR_SPREADSHEET, sheetId=sheet['properties']['sheetId'], body={"destinationSpreadsheetId":tempid}).execute()
# need to delete the first sheet since the copy gets added as second
sheetservice.spreadsheets().batchUpdate(spreadsheetId=tempid, body={"requests":{"deleteSheet":{"sheetId":0}}}).execute()
download_file(tempid, driveservice) # runs your original method to download the file
driveservice.files().delete(fileId=tempid).execute() # to clean up the temp file
You'll also need the https://www.googleapis.com/auth/spreadsheets
and https://www.googleapis.com/auth/drive
scopes. This involves more API calls so I just recommend Tanaike's method, but I hope it gives you an idea of ways that you can play with the API to suit your needs.