How to create the folder structure from info defined in a dataframe?

Time:05-10

I created a dataframe that lists all files and folders in my Google Drive. I've been able to filter it down to only the folders. I now want to recreate that same folder structure on my local computer.

So the dataframe has a unique folder id # for each folder, the name of the folder, and the id of each folder's parent folder, like this:

[Image: dataframe info]

I suspect I'll need Python's os module to create this structure recursively. So far I've been able to create the first level of folders by selecting the rows whose parent id is '0':

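import os

parent_directory = 'test_dirs'
# iterrows() keeps the original column names; itertuples() renames
# columns like 'parent_folder_id#' that are not valid identifiers
for _, row in df.iterrows():
    # A parent id of '0' marks a top-level folder
    if row['parent_folder_id#'] == '0':
        path = os.path.join(parent_directory, row['folder_name'])
        os.makedirs(path, exist_ok=True)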

But how would I make this recursive to be able to create all of the nested folders at once?
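One way to make it recursive is sketched below, under some assumptions: ids are strings, the columns are named as in the code above, '0' marks a top-level folder, and the Drive hierarchy contains no cycles. The function name `create_tree` is hypothetical. It first groups every folder under its parent id, then walks down from the root, creating each folder before descending into its children. It reads the dataframe column by column, so it also runs on a plain dict of lists.

```python
import os
from collections import defaultdict


def create_tree(df, base_dir, root_id="0"):
    """Recreate the folder hierarchy described by df under base_dir.

    Assumes string ids and the columns 'folder_id#', 'folder_name' and
    'parent_folder_id#'; rows may appear in any order.
    """
    # Group (folder_id, folder_name) pairs by parent id, one list per level.
    children = defaultdict(list)
    for folder_id, folder_name, parent_id in zip(
        df["folder_id#"], df["folder_name"], df["parent_folder_id#"]
    ):
        children[parent_id].append((folder_id, folder_name))

    def make_level(parent_id, parent_path):
        # .get() avoids adding empty entries for leaf folders
        for folder_id, folder_name in children.get(parent_id, []):
            path = os.path.join(parent_path, folder_name)
            os.makedirs(path, exist_ok=True)
            make_level(folder_id, path)  # descend into this folder's children

    make_level(root_id, base_dir)
```

Usage: `create_tree(df, 'test_dirs')`. Because local paths are built from names, two Drive folders with the same name under the same parent end up merged into a single local folder.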

Edit

https://stackoverflow.com/a/72166494/14343826 helped me think about creating a dictionary from the dataframe where the keys are folder paths (made from the folder names) and the values are folder ids, like:

lookup = {'A': '1', 'B': '2', ..., 'A/A names/A john smith': '7'}

The loop goes through each row of the dataframe and looks for the row's parent folder id among the dictionary's values. Once it finds the parent, it prefixes the parent's path to the current folder's name and stores that new path in the dictionary. This handles the nested folder creation, as long as every parent row comes before its children in the dataframe.

Code:

import os

lookup = {'Root': '0'}

for _, row in df.iterrows():
    folder_id = row['folder_id#']
    folder_name = row['folder_name']
    folder_parents = row['parent_folder_id#']
    # Reverse lookup: find the path whose stored id equals this row's parent id
    path = list(lookup.keys())[list(lookup.values()).index(folder_parents)] + '/' + folder_name
    lookup[path] = folder_id
    os.makedirs(path, exist_ok=True)
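The dictionary can also be keyed the other way round, from folder id to path. That removes the reverse `list(...).index(...)` search, and with a memoized helper the result no longer depends on parents appearing before their children. A minimal sketch, assuming string ids, the same column names, and that every parent id is either '0' or another folder in the dataframe; `resolve_paths` and `create_folders` are hypothetical names:

```python
import os


def resolve_paths(df, root_id="0", root_path="test_dirs"):
    """Return {folder_id: local path} for every folder, regardless of row order."""
    # Index each folder's name and parent id by its own id.
    info = {
        fid: (name, parent)
        for fid, name, parent in zip(
            df["folder_id#"], df["folder_name"], df["parent_folder_id#"]
        )
    }
    paths = {root_id: root_path}

    def path_of(fid):
        # Memoized: resolve (and cache) the parent's path before this folder's.
        if fid not in paths:
            name, parent = info[fid]
            paths[fid] = os.path.join(path_of(parent), name)
        return paths[fid]

    for fid in info:
        path_of(fid)
    return paths


def create_folders(df, root_id="0", root_path="test_dirs"):
    """Create every resolved path on disk and return the id-to-path map."""
    paths = resolve_paths(df, root_id, root_path)
    for path in paths.values():
        os.makedirs(path, exist_ok=True)
    return paths
```

A parent id that is neither the root nor present in the dataframe raises a `KeyError`, which surfaces incomplete exports instead of silently placing folders in the wrong spot.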

Answer:

Just remember where you've been as you make them:

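import os

# Maps each folder id to the local path created for it; '0' is the root.
# Assumes each parent row appears before its children in df.
lookup = {'0': 'test_dirs'}
for _, row in df.iterrows():
    parent_path = lookup[row['parent_folder_id#']]
    path = os.path.join(parent_path, row['folder_name'])
    os.makedirs(path, exist_ok=True)
    lookup[row['folder_id#']] = path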