I would like to create a function that reads the files from a shared Google Drive folder and concatenates them into one DataFrame. If possible, I would prefer to do it without any authentication.
I used this code I found here:
import pandas as pd

url = 'https://drive.google.com/drive/folders/1_Hwl0tm6rfechhTF-YGVsOA0cbzu4jUd'
path = 'https://drive.google.com/uc?export=download&id=' + url.split('/')[-2]
df = pd.read_csv(path)
I want to read all the files in the folder using glob and concatenate them into one DataFrame, but I get the error HTTPError: HTTP Error 404: Not Found. Any help would be appreciated.
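Note that glob only matches paths on the local filesystem, so it cannot enumerate a Drive folder over HTTP. If the CSV files are already available locally (for example, downloaded or synced into a local directory), the glob-and-concatenate pattern could look like this minimal sketch. The function name, the folder argument, and the `*.csv` pattern are assumptions for illustration.

```python
import glob
import os

import pandas as pd


def concat_csvs(folder, pattern="*.csv"):
    """Read every CSV matching `pattern` in a local `folder` into one DataFrame."""
    # glob works on local paths only; it cannot list a Google Drive URL
    paths = sorted(glob.glob(os.path.join(folder, pattern)))
    if not paths:
        raise FileNotFoundError("no files matching {!r} in {!r}".format(pattern, folder))
    frames = [pd.read_csv(p) for p in paths]
    # ignore_index=True gives the combined frame a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)
```

Usage would be something like df = concat_csvs("drive_downloads"), where drive_downloads is a hypothetical local copy of the shared folder.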
Answer:
If I understand correctly, use index -1 instead of -2, but even with that URL I still get an error:
path = 'https://drive.google.com/uc?export=download&id=' + url.split('/')[-1]
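To see why the index matters, here is what splitting the folder link from the question produces. With [-2] the download URL ends in id=folders, which cannot be a valid ID; with [-1] it gets the folder ID, but the uc?export=download link is normally used with a file ID, which fits the error still reported above. The helper drive_id_from_url is hypothetical and only returns the last non-empty path segment.

```python
from urllib.parse import urlparse


def drive_id_from_url(url):
    """Return the last non-empty path segment of a Drive folder link (hypothetical helper)."""
    # urlparse separates any "?usp=sharing" query string from the path before splitting
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[-1]


url = "https://drive.google.com/drive/folders/1_Hwl0tm6rfechhTF-YGVsOA0cbzu4jUd"
parts = url.split("/")
print(parts[-2])  # prints: folders
print(parts[-1])  # the folder ID
print(drive_id_from_url(url + "?usp=sharing"))
```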
Answer:
You cannot download a folder directly. In the Drive API, folders are treated as files; the only difference is their MIME type, application/vnd.google-apps.folder.
As the Drive API documentation says:
A container you can use to organize other types of files on Drive. Folders are files that only contain metadata, and have the MIME type application/vnd.google-apps.folder.
Note: A single file stored on My Drive can be contained in multiple folders. A single file stored on a shared drive can only have one parent folder.
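Because a folder is just a file with that MIME type, a listing of a folder's contents can be split into sub-folders and regular files by comparing mimeType. A minimal sketch over plain metadata dicts shaped like the entries a v3 files.list response returns under "files"; the sample IDs and names are made up.

```python
FOLDER_MIME = "application/vnd.google-apps.folder"


def split_folders(items):
    """Partition Drive file metadata dicts into (folders, files) by MIME type."""
    folders = [it for it in items if it.get("mimeType") == FOLDER_MIME]
    files = [it for it in items if it.get("mimeType") != FOLDER_MIME]
    return folders, files


# Made-up metadata standing in for a real listing
sample = [
    {"id": "a1", "name": "2021.csv", "mimeType": "text/csv"},
    {"id": "b2", "name": "archive", "mimeType": FOLDER_MIME},
]
folders, files = split_folders(sample)
print([f["name"] for f in folders], [f["name"] for f in files])
```

The same filter can also be pushed into the search query itself, e.g. "'<folder_id>' in parents and mimeType != 'application/vnd.google-apps.folder'", so the server never returns the sub-folders.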
As a workaround, you can list all the files in the folder and download them one by one. I based the following example on this:
do.py
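import io

from googleapiclient.http import MediaIoBaseDownload

# drive_service() must return an authorized Drive v3 service object and
# FOLDER_ID must hold the shared folder's ID; neither is shown in this answer.


def list_and_download():
    service = drive_service()
    folder_id = FOLDER_ID
    # List all files within the folder, including files on shared drives
    # (only the first page of results is fetched here)
    results = service.files().list(
        q="'{}' in parents".format(folder_id),
        includeItemsFromAllDrives=True,
        supportsAllDrives=True,
    ).execute()
    items = results.get("files", [])
    print(items)
    for item in items:
        # Skip anything that is not a CSV file instead of stopping the loop
        if item["mimeType"] != "text/csv":
            continue
        # Download the file in chunks using MediaIoBaseDownload
        request = service.files().get_media(fileId=item["id"])
        fh = io.BytesIO()
        downloader = MediaIoBaseDownload(fh, request)
        done = False
        while done is False:
            status, done = downloader.next_chunk()
            print("Download {}%.".format(int(status.progress() * 100)))
        print("Download Complete!")
        # getvalue() returns the whole buffer regardless of the stream position
        with open(item["name"], "wb") as f:
            f.write(fh.getvalue())
        # Do whatever you want with the csv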
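Since the goal is one DataFrame rather than files on disk, the downloaded bytes could be parsed directly with pandas. A minimal sketch, assuming each download has been collected as raw CSV bytes (for example the buffer's getvalue() in the loop above) and that all files share the same columns; frames_to_df is a hypothetical name.

```python
import io

import pandas as pd


def frames_to_df(csv_blobs):
    """Parse a list of raw CSV byte strings and stack them into one DataFrame."""
    # BytesIO lets pandas read each download without writing a temporary file
    frames = [pd.read_csv(io.BytesIO(blob)) for blob in csv_blobs]
    return pd.concat(frames, ignore_index=True)


# Made-up bytes standing in for two downloaded CSV files
blobs = [b"a,b\n1,2\n", b"a,b\n3,4\n5,6\n"]
df = frames_to_df(blobs)
print(df)
```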
Answer:
You should use the Google Drive API to list the files in the shared folder: https://developers.google.com/drive/api/v2/reference/children/list
Example of using the API to list the files: https://i.ibb.co/pyx8mKG/drive-list.png
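Listing calls in the Drive API are paginated: when more results exist, the response includes a nextPageToken that is passed back as pageToken on the next request. A minimal sketch of collecting every page, written against a hypothetical fetch_page(page_token) callable so it does not depend on a particular client library; with google-api-python-client it could wrap something like service.children().list(folderId=..., pageToken=token).execute() for v2. The "items" key matches the v2 children response; v3 files.list uses "files" instead.

```python
def list_all(fetch_page, key="items"):
    """Collect every entry under `key` across all pages of a paginated listing."""
    results = []
    token = None
    while True:
        response = fetch_page(token)
        results.extend(response.get(key, []))
        token = response.get("nextPageToken")
        if not token:  # no token means this was the last page
            return results


# Stand-in for a real API call: two fake pages of child references
pages = {
    None: {"items": [{"id": "a"}, {"id": "b"}], "nextPageToken": "p2"},
    "p2": {"items": [{"id": "c"}]},
}
children = list_all(lambda token: pages[token])
print([c["id"] for c in children])  # ['a', 'b', 'c']
```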
After that, once you have the children list from the JSON response, you can read each file and concatenate the DataFrames:
import pandas as pd

response = {
    "kind": "drive#childList",
    "etag": "\"9NuiSicPg_3yRScMQO3pipPxwvs\"",
    "selfLink": "https://www.googleapis.com/drive/v2/files/1IkO_nB83mUfKLopEtYsNT7RbMOIcWDAK/children",
    "items": [
        {
            "kind": "drive#childReference",
            "id": "1YtG84A9ZJNM7A3OgD3nOQk8V9bz_mYQ1",
            "selfLink": "https://www.googleapis.com/drive/v2/files/1IkO_nB83mUfKLopEtYsNT7RbMOIcWDAK/children/1YtG84A9ZJNM7A3OgD3nOQk8V9bz_mYQ1",
            "childLink": "https://www.googleapis.com/drive/v2/files/1YtG84A9ZJNM7A3OgD3nOQk8V9bz_mYQ1"
        },
        {
            "kind": "drive#childReference",
            "id": "14P3NAdGid-iJl1JSFDHLb8U3BRb0tOQO",
            "selfLink": "https://www.googleapis.com/drive/v2/files/1IkO_nB83mUfKLopEtYsNT7RbMOIcWDAK/children/14P3NAdGid-iJl1JSFDHLb8U3BRb0tOQO",
            "childLink": "https://www.googleapis.com/drive/v2/files/14P3NAdGid-iJl1JSFDHLb8U3BRb0tOQO"
        }
    ]
}

item_arr = []
for item in response["items"]:
    print(item["id"])
    # Unauthenticated download; assumes each file is shared as "Anyone with the link"
    download_url = 'https://drive.google.com/uc?id=' + item["id"]
    item_arr.append(pd.read_csv(download_url))
df = pd.concat(item_arr, axis=0)
print(df.head())
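One detail about pd.concat(item_arr, axis=0): each CSV is read with its own 0-based index, so the stacked frame repeats index labels. Passing ignore_index=True renumbers the rows, and keys= can record which file each row came from. A small offline sketch with made-up frames standing in for the downloaded CSVs:

```python
import pandas as pd

# Two small frames standing in for two downloaded CSV files
a = pd.DataFrame({"value": [1, 2]})
b = pd.DataFrame({"value": [3, 4]})

stacked = pd.concat([a, b], axis=0)
print(stacked.index.tolist())  # [0, 1, 0, 1]: labels repeat

renumbered = pd.concat([a, b], axis=0, ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]

# keys= adds an outer index level naming the source of each row
tagged = pd.concat([a, b], keys=["file_a", "file_b"])
print(tagged.loc["file_b"]["value"].tolist())  # [3, 4]
```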