Read multiple CSV files from a shared Google Drive folder using Python

I would like to create a function that reads files from a shared Google Drive folder and concatenates them into one df. I would prefer to do it without any authentication, if that is possible.

I used this code, which I found here:

url = 'https://drive.google.com/drive/folders/1_Hwl0tm6rfechhTF-YGVsOA0cbzu4jUd'
path = 'https://drive.google.com/uc?export=download&id=' + url.split('/')[-2]
df = pd.read_csv(path)

I want to read all the files in the folder using glob and concatenate them into one df, but I get the error HTTPError: HTTP Error 404: Not Found. Any help would be appreciated.

CodePudding user response：

IIUC, use -1 instead of -2, but for me the resulting URL still raises an error:

path = 'https://drive.google.com/uc?export=download&id=' + url.split('/')[-1]
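
For illustration, this is what the two indexes return for the folder URL from the question. It is only a sketch of the string handling; the variable names `parts`, `second_to_last` and `folder_id` are mine, and the ID it extracts is still a folder ID, not a file ID.

```python
url = 'https://drive.google.com/drive/folders/1_Hwl0tm6rfechhTF-YGVsOA0cbzu4jUd'

parts = url.split('/')
# parts[-2] is the literal path segment before the ID, not an ID
second_to_last = parts[-2]
# parts[-1] is the last path segment, i.e. the folder ID
folder_id = parts[-1]

print(second_to_last)
print(folder_id)
```

So with -2 the download URL is built from a path segment rather than from any Drive ID.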

CodePudding user response：

You cannot download a folder directly. In the Drive API, folders are treated as files; the only difference is their MIME type, application/vnd.google-apps.folder.

As the Drive API documentation says:

A container you can use to organize other types of files on Drive. Folders are files that only contain metadata, and have the MIME type application/vnd.google-apps.folder.

Note: A single file stored on My Drive can be contained in multiple folders. A single file stored on a shared drive can only have one parent folder.
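
To make the MIME type point concrete, here is a minimal sketch that separates folders from CSV files in a list of metadata entries. The records in `items` are hypothetical, made up for illustration and only shaped like Drive file metadata; they are not taken from a real folder.

```python
FOLDER_MIME = "application/vnd.google-apps.folder"

# Hypothetical metadata records (id, name, mimeType), invented for this example
items = [
    {"id": "a1", "name": "reports", "mimeType": FOLDER_MIME},
    {"id": "b2", "name": "jan.csv", "mimeType": "text/csv"},
    {"id": "c3", "name": "feb.csv", "mimeType": "text/csv"},
]

# A folder is just an entry whose MIME type is the folder type
folders = [i for i in items if i["mimeType"] == FOLDER_MIME]
csv_files = [i for i in items if i["mimeType"] == "text/csv"]

print([f["name"] for f in folders])
print([f["name"] for f in csv_files])
```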

As a workaround, you can list all the files contained in a folder and download them one by one. I based the following example on this:

`do.py`

import io

from googleapiclient.http import MediaIoBaseDownload


def list_and_download():
    # drive_service() and FOLDER_ID are assumed to be defined elsewhere
    service = drive_service()
    folder_id = FOLDER_ID
    # List all files within the folder (shared drives included)
    results = service.files().list(
        q="'{}' in parents".format(folder_id),
        includeItemsFromAllDrives=True,
        supportsAllDrives=True,
    ).execute()
    items = results.get("files", [])
    print(items)
    for item in items:
        # Skip anything that is not a CSV instead of leaving the function
        if item["mimeType"] != "text/csv":
            continue
        # Download files one by one using MediaIoBaseDownload
        request = service.files().get_media(fileId=item["id"])
        fh = io.BytesIO()
        downloader = MediaIoBaseDownload(fh, request)
        done = False
        while not done:
            status, done = downloader.next_chunk()
            print("Download {}%.".format(int(status.progress() * 100)))
        print("Download Complete!")
        # getvalue() returns the whole buffer regardless of the stream position
        with open(item["name"], "wb") as f:
            f.write(fh.getvalue())

    # Do whatever you want with the csv
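
Since the question asks for glob and a single df, here is one way the last step could look once the CSV files are saved locally. This is a sketch, not part of the original example: `concat_csvs` and its `directory` parameter are names I made up, and it assumes all files share the same columns.

```python
import glob
import os

import pandas as pd


def concat_csvs(directory="."):
    """Read every *.csv file in `directory` and stack them into one DataFrame."""
    paths = sorted(glob.glob(os.path.join(directory, "*.csv")))
    if not paths:
        raise FileNotFoundError("no CSV files found in {}".format(directory))
    frames = [pd.read_csv(p) for p in paths]
    # ignore_index=True renumbers the rows instead of repeating each file's index
    return pd.concat(frames, ignore_index=True)
```

Calling `concat_csvs()` after `list_and_download()` would pick up the files written to the current directory, along with any other CSV files that already happen to be there.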

Documentation

MediaIoBaseDownload

CodePudding user response：

You should use the Google Drive API to list the files in the shared folder: https://developers.google.com/drive/api/v2/reference/children/list

Example usage of the API to list files: https://i.ibb.co/pyx8mKG/drive-list.png

After that, once you have the children list from the JSON response, you can read each file and concatenate the DataFrames:



import pandas as pd

response = {
 "kind": "drive#childList",
 "etag": "\"9NuiSicPg_3yRScMQO3pipPxwvs\"",
 "selfLink": "https://www.googleapis.com/drive/v2/files/1IkO_nB83mUfKLopEtYsNT7RbMOIcWDAK/children",
 "items": [
  {
   "kind": "drive#childReference",
   "id": "1YtG84A9ZJNM7A3OgD3nOQk8V9bz_mYQ1",
   "selfLink": "https://www.googleapis.com/drive/v2/files/1IkO_nB83mUfKLopEtYsNT7RbMOIcWDAK/children/1YtG84A9ZJNM7A3OgD3nOQk8V9bz_mYQ1",
   "childLink": "https://www.googleapis.com/drive/v2/files/1YtG84A9ZJNM7A3OgD3nOQk8V9bz_mYQ1"
  },
  {
   "kind": "drive#childReference",
   "id": "14P3NAdGid-iJl1JSFDHLb8U3BRb0tOQO",
   "selfLink": "https://www.googleapis.com/drive/v2/files/1IkO_nB83mUfKLopEtYsNT7RbMOIcWDAK/children/14P3NAdGid-iJl1JSFDHLb8U3BRb0tOQO",
   "childLink": "https://www.googleapis.com/drive/v2/files/14P3NAdGid-iJl1JSFDHLb8U3BRb0tOQO"
  }
 ]
}

item_arr = []
for item in response["items"]:
    print(item["id"])
    download_url = 'https://drive.google.com/uc?id=' + item["id"]
    item_arr.append(pd.read_csv(download_url))
df = pd.concat(item_arr, axis=0)
print(df.head())
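
If it helps, the URL-building step can be pulled out into a small function so it can be checked without touching the network. This is only a sketch: `child_download_urls` is a name I made up, `sample` is a trimmed copy of the response above, and the `uc?id=` URL pattern is the one used in the snippet above.

```python
def child_download_urls(response):
    """Return one download URL per entry in a children list response."""
    base = "https://drive.google.com/uc?id="
    # A response without an "items" key yields an empty list
    return [base + item["id"] for item in response.get("items", [])]


# Trimmed copy of the response above, keeping only the fields that are used
sample = {
    "kind": "drive#childList",
    "items": [
        {"id": "1YtG84A9ZJNM7A3OgD3nOQk8V9bz_mYQ1"},
        {"id": "14P3NAdGid-iJl1JSFDHLb8U3BRb0tOQO"},
    ],
}

for u in child_download_urls(sample):
    print(u)
```

Each URL can then be passed to pd.read_csv as in the loop above.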