How to iterate over several directory levels and move files based on a condition

Time:09-14

I would like some help looping through some directories and subdirectories and extracting data. I have a directory with three levels, and the third level contains several .csv.gz files. The structure looks like this:

Directory structure example

I need to access level 2 (where the subfolders are) of each folder and check whether a specific folder exists (in my example, this is subfolder 3; I left the other folders empty for this example, but in real cases they will contain data). If the check returns True, I want to rename the files in the target subfolder 3 and move all of them to another folder.
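The level-2 existence check described above does not strictly need a full walk of the tree. A minimal sketch with pathlib, assuming the layout is parent/<folder>/<subfolder>/files; `find_target_dirs` is a hypothetical helper name, not something from the thread:

```python
from pathlib import Path


def find_target_dirs(parent_dir, target_name):
    """Return <parent_dir>/<folder>/<target_name> for every level-1 folder that has it.

    Hypothetical helper: only levels 1 and 2 are inspected, nothing deeper.
    """
    found = []
    for folder in sorted(Path(parent_dir).iterdir()):
        candidate = folder / target_name
        # Skip stray files at level 1 and folders that lack the target subfolder.
        if folder.is_dir() and candidate.is_dir():
            found.append(candidate)
    return found
```

Each returned path is a subfolder whose files can then be renamed and moved.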

Below is my code. It is quite cumbersome, and there are probably better ways of doing it. I tried using os.walk(), and this is the closest I got to a solution, but it won't move the files.

import os
import shutil
    
    
def organizer(parent_dir, target_dir, destination_dir):
    for root, dirs, files in os.walk(parent_dir):
        if root.endswith(target_dir):
            target = root
            for files in os.listdir(target):
                if not files.startswith("."):

                    # this is to change the name of the file
                    fullname = files.split(".")
                    just_name = fullname[0]
                    csv_extension = fullname[1]
                    gz_extension = fullname[2]
                    subject_id = target
                
                    #make a new name
                    origin = subject_id + "/" + just_name + "." + csv_extension + "." + gz_extension
                
                
                    #make a path based on this new name
                    new_name = os.path.join(destination_dir, origin)
                
                    #move file from origin folder to destination folder and rename the file
                    shutil.move(origin, new_name)

Any suggestions on how to make this work and/or make it more efficient?

Answer:

Simply enough, you can use the built-in os module: os.walk(path) yields, for each directory in the tree, the directory path (root), its subdirectories and the files found in it.

import os

for root, _, files in os.walk(path):
    pass  # your code here
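Since the tree in the question has a fixed depth, it can also help to stop os.walk() from descending further than needed. With the default topdown=True, os.walk() lets you prune the dirs list in place. A minimal sketch; the helper name `walk_limited` and the depth convention (0 = the starting folder) are assumptions for illustration:

```python
import os


def walk_limited(path, max_depth):
    """Like os.walk(path), but never descend below max_depth.

    Depth 0 is `path` itself, depth 1 its direct subfolders, and so on.
    At the deepest level the yielded dirs list is empty.
    """
    for root, dirs, files in os.walk(path):
        rel = os.path.relpath(root, path)
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        if depth >= max_depth:
            # Emptying dirs in place (topdown=True, the default) stops
            # os.walk() from descending into any of them.
            dirs[:] = []
        yield root, dirs, files
```

For the layout in the question, walk_limited(parent_dir, 2) visits the parent, the level-1 folders and the level-2 subfolders, and nothing below.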

For your problem, do this:


import os

for root, dirs, files in os.walk(parent_directory):
    for file in files:
        full_path = os.path.join(root, file)
        # extract the data from full_path here
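For the "extract the data" step, gzip.open() in text mode can be handed straight to the csv module, so the files never need to be decompressed to disk. A minimal sketch, assuming the files are plain comma-separated UTF-8 text; `read_csv_gz` is a hypothetical name:

```python
import csv
import gzip


def read_csv_gz(path):
    """Return all rows of a gzipped CSV file as lists of strings."""
    # "rt" decompresses and decodes on the fly; newline="" is what csv expects.
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        return list(csv.reader(handle))
```

For large files, iterating over csv.reader(handle) inside the with block instead of building a list keeps memory use flat.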

See the os.walk() documentation for more information.

And if you want to get the name of a file from its full path, you can use os.path.basename(path).
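Combined with a suffix check, os.path.basename() also avoids the split(".") approach from the question, which breaks as soon as a file name contains an extra dot. A small sketch; `strip_csv_gz` is a hypothetical helper:

```python
import os

SUFFIX = ".csv.gz"


def strip_csv_gz(path):
    """Return the file name without its directory and without a trailing .csv.gz."""
    name = os.path.basename(path)
    if name.endswith(SUFFIX):
        name = name[: -len(SUFFIX)]
    return name
```

For example, strip_csv_gz("/data/folder1/subfolder3/run.01.csv.gz") gives "run.01", whereas split(".")[0] would give just "run".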

You can even filter for just the gzipped CSV files you're looking for using the built-in fnmatch module:

import fnmatch
import os


def find_csv_files(path):
    result = []
    for root, _, files in os.walk(path):
        for name in files:
            if fnmatch.fnmatch(name, "*.csv.gz"):  # shell-style wildcard pattern, not a regex
                result.append(os.path.join(root, name))
    return list(set(result))  # de-duplicate, just in case
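On Python 3.4+, pathlib offers the same search in one call: Path.rglob() matches a shell-style pattern recursively. A sketch of an equivalent to find_csv_files (the name `find_csv_files_pathlib` is just for illustration):

```python
from pathlib import Path


def find_csv_files_pathlib(path):
    """Return the sorted paths (as strings) of all *.csv.gz files below `path`."""
    # rglob can also match directories, so keep regular files only.
    return sorted(str(p) for p in Path(path).rglob("*.csv.gz") if p.is_file())
```

Sorting makes the result deterministic; os.walk() and rglob() do not guarantee any particular order.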

Answer:

OK, guys, I was finally able to find a solution. Here it is. It's not the cleanest one, but it works in my case. Thanks for the help.

import os
import shutil


def organizer(parent_dir, target_dir, destination_dir):
    for root, dirs, files in os.walk(parent_dir):
        if root.endswith(target_dir):
            target = root
            for files in os.listdir(target):
                #this one because I have several .DS_Store files in the folder which I don't want to extract
                if not files.startswith("."):
                    fullname = files.split(".")
                    just_name = fullname[0]
                    csv_extension = fullname[1]
                    gz_extension = fullname[2]

                    origin = target + "/" + files

                    # index 5 below is tied to how deep parent_dir sits in the file system
                    full_folder_name = origin.split("/")

                    #make a new name
                    new_name = full_folder_name[5] + "_" + just_name + "." + csv_extension + "." + gz_extension

                    #make a path based on this new name
                    new_path = os.path.join(destination_dir, new_name)

                    #move file from origin folder to destination folder and rename the file
                    shutil.move(origin, new_path)
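For comparison, a version that does not depend on the fixed index full_folder_name[5] (which only works for one particular absolute path depth) could look like the sketch below. It assumes, as seems to be the intent, that the prefix is the name of the level-1 folder containing the target subfolder, and that hidden files such as .DS_Store should be skipped; `organize` is a hypothetical name:

```python
import shutil
from pathlib import Path


def organize(parent_dir, target_name, destination_dir):
    """Move files from <parent>/<folder>/<target_name>/ to destination_dir as <folder>_<file>.

    Assumption: the level-1 folder name (e.g. a subject ID) is the desired prefix.
    """
    destination = Path(destination_dir)
    destination.mkdir(parents=True, exist_ok=True)
    for folder in sorted(Path(parent_dir).iterdir()):
        target = folder / target_name
        if not target.is_dir():
            continue
        for item in sorted(target.iterdir()):
            # Skip hidden files (.DS_Store etc.) and anything that is not a regular file.
            if item.name.startswith(".") or not item.is_file():
                continue
            new_path = destination / f"{folder.name}_{item.name}"
            shutil.move(str(item), str(new_path))
```

Keeping the whole file name (item.name) also sidesteps splitting on dots, so names like run.01.csv.gz survive intact.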

I guess the problem was that I was passing a variable holding the renamed file path (in my example, I wrongly called this variable origin) as the source path to shutil.move(). Since that path does not exist, the files weren't moved.
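One more thing worth checking in the question's original code: os.path.join() discards every component that comes before an absolute one. If parent_dir is an absolute path, origin is absolute too, so os.path.join(destination_dir, origin) simply returns origin, and shutil.move() is asked to move each file onto itself, which would be consistent with nothing moving. A short illustration with made-up paths (the expected outputs assume Linux/macOS):

```python
import os.path

destination_dir = "/home/user/output"  # made-up example paths
origin = "/home/user/data/subj01/subfolder3/rec.csv.gz"

# Because `origin` is absolute, destination_dir is discarded entirely.
joined = os.path.join(destination_dir, origin)
print(joined)  # /home/user/data/subj01/subfolder3/rec.csv.gz

# Joining only the bare file name gives the intended target path.
fixed = os.path.join(destination_dir, os.path.basename(origin))
print(fixed)  # /home/user/output/rec.csv.gz
```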
