I would like some help to loop through some directories and subdirectories and extracting data. I have a directory with three levels, with the third level containing several .csv.gz files. The structure is like this
I need to access level 2 (where subfolders are) of each folder and check the existence of a specific folder (in my example, this will be subfolder 3; I left the other folders empty for this example, but in real cases they will have data). If checking returns True, then I want to change the name of files within the target subfolder3 and transfer all files to another folder.
Bellow is my code. It is quite cumbersome and there is probably better ways of doing it. I tried using os.walk() and this is the closest I got to a solution but it won't move the files.
import os
import shutil
def organizer(parent_dir, target_dir, destination_dir):
for root, dirs, files in os.walk(parent_dir):
if root.endswith(target_dir):
target = root
for files in os.listdir(target):
if not files.startswith("."):
# this is to change the name of the file
fullname = files.split(".")
just_name = fullname[0]
csv_extension = fullname[1]
gz_extension = fullname[2]
subject_id = target
#make a new name
origin = subject_id "/" just_name "." csv_extension "." gz_extension
#make a path based on this new name
new_name = os.path.join(destination_dir, origin)
#move file from origin folder to destination folder and rename the file
shutil.move(origin, new_name)
Any suggestions on how to make this work and / or more eficient?
CodePudding user response:
simply enough, you can use the built-in os module, with os.walk(path) returns you root directories and files found
import os
for root, _, files in os.walk(path):
#your code here
for your problem, do this
import os
for root, dirs, files in os.walk(parent_directory);
for file in files:
#exctract the data from the "file"
check this for more information os.walk()
and if you want to get the name of the file, you can use os.path.basename(path)
you can even check for just the gzipped csv files you're looking for using built-in fnmatch module
import fnmathch, os
def find_csv_files(path):
result = []
for root, _, files in os.walk(path):
for name in files:
if fnmatch.fnmatch(name, "*.csv.gz"): # find csv.gz using regex paterns
result.append(os.path.join(root, name))
return list(set(results)) #to get the unique paths if for some reason duplicated
CodePudding user response:
Ok, guys, I was finally able to find a solution. Here it is. Not the cleanest one, but it works in my case. Thanks for the help.
def organizer(parent_dir, target_dir, destination_dir):
for root, dirs, files in os.walk(parent_dir):
if root.endswith(target_dir):
target = root
for files in os.listdir(target):
#this one because I have several .DS store files in the folder which I don't want to extract
if not files.startswith("."):
fullname = files.split(".")
just_name = fullname[0]
csv_extension = fullname[1]
gz_extension = fullname[2]
origin = target "/" files
full_folder_name = origin.split("/")
#make a new name
new_name = full_folder_name[5] "_" just_name "." csv_extension "." gz_extension
#make a path based on this new name
new_path = os.path.join(destination_dir, new_name)
#move file from origin folder to destination folder and rename the file
shutil.move(origin, new_path)
The guess the problem was that was passing a variable that was a renamed file (in my example, I wrongly called this variable origin) as the origin path to shutil.move(). Since this path does not exist, then the files weren't moved.