How to iterate through a folder by subfolder in Python

Time:05-04

I have a folder containing thousands of .ai files. The top level holds one subfolder per customer, named after the customer. Inside each customer folder the layout varies: the .ai files may sit directly in the folder, in subfolders, or in subfolders nested several levels deep.

I need a program that walks this folder one customer subfolder at a time, collects every .ai filename under that customer (regardless of how deeply it is nested) and appends it to a list. I'll run OCR on the files in that list later; once that's done I'll clear the list and move on to the next customer subfolder.

This is the code I tried, but it doesn't work. It sometimes returns an empty list, or a list with just one filename in it, when it should return a list of one or more .ai filenames each time.

def folder_loop(folder):
    temp_list = []
    for root, dirs, files in os.walk(folder):
        for dir in dirs:
            for file in dir:
                if file.endswith("ai"):
                    temp_list.append(os.path.join(root, file))
        print(temp_list)
        temp_list.clear()

I'm a beginner and I hardly understand what the code is even doing, so I'm not surprised it didn't work. Any ideas?

CodePudding user response:

You could try the following:

If you want to pass the function the base folder that contains all the customer folders, and get back, for each customer folder, a list of all its .ai files (from every sublevel):

from pathlib import Path

def folder_loop(folder):
    for path in Path(folder).iterdir():
        if path.is_dir():
            yield list(path.rglob("*.ai"))

Path.rglob("*.ai") recursively globs the given path, including all of its subfolders, for .ai files.

To use it:

the_folder = "..."
for file_list in folder_loop(the_folder):
    print(file_list)
    # do whatever you want to do with the files
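
To check the behaviour without touching real data, here is a small sketch that builds a throwaway directory tree and runs folder_loop on it. The customer names ("acme", "globex") and the filenames are made up for illustration; only folder_loop itself comes from above.

```python
import tempfile
from pathlib import Path

def folder_loop(folder):
    # One list of .ai paths per direct subfolder (customer) of `folder`.
    for path in Path(folder).iterdir():
        if path.is_dir():
            yield list(path.rglob("*.ai"))

# Build a throwaway tree; "acme" and "globex" are made-up customer names.
with tempfile.TemporaryDirectory() as base:
    base = Path(base)
    (base / "acme" / "2020" / "logos").mkdir(parents=True)
    (base / "globex").mkdir()
    (base / "acme" / "2020" / "logos" / "a.ai").touch()  # nested three levels deep
    (base / "acme" / "b.ai").touch()                     # directly in the customer folder
    (base / "acme" / "notes.txt").touch()                # not an .ai file, ignored
    (base / "globex" / "c.ai").touch()

    # Keep only the filenames and sort them, because iterdir() and rglob()
    # do not guarantee any particular order.
    found = sorted(sorted(p.name for p in files) for files in folder_loop(base))

print(found)  # [['a.ai', 'b.ai'], ['c.ai']]
```

Each inner list corresponds to one customer folder, so the nested a.ai and the top-level b.ai end up together.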

If you want to give it a folder and want one list with all the .ai files in it:

def folder_loop(folder):
    return list(Path(folder).rglob("*.ai"))

The yielded/returned lists here contain Path-objects (which are pretty handy). If you want strings instead, then you could do

       ....
            yield list(map(str, path.rglob("*.ai")))

etc.
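
As for why the attempt in the question does not work: in os.walk, dirs is a list of directory names (plain strings), so "for file in dir" loops over the characters of each name, and the files list is never looked at. If you would rather stay with os.walk, a minimal sketch of a corrected version could look like the following. The name folder_loop_walk is made up here, and the check for ".ai" (with the dot) instead of "ai" is an assumption about what was intended.

```python
import os

def folder_loop_walk(folder):
    """Yield one list of .ai file paths per direct subfolder of `folder`."""
    # Sort the customer folders by name so the order is predictable.
    for entry in sorted(os.scandir(folder), key=lambda e: e.name):
        if not entry.is_dir():
            continue  # skip loose files lying next to the customer folders
        ai_files = []
        # Walk everything below this one customer folder.
        for root, dirs, files in os.walk(entry.path):
            for name in files:  # `files` holds the filenames found in `root`
                if name.endswith(".ai"):  # assumption: ".ai" rather than "ai"
                    ai_files.append(os.path.join(root, name))
        yield ai_files
```

It is used the same way as the pathlib version, but yields lists of strings instead of Path objects.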

CodePudding user response:

There's a community post here which has some stupidly complete answers.

That being said, I have the method below in my personal utilities toolbox.

import os

def get_files_from_path(path: str = ".", ext=None) -> list:
    """Find files in path and return them as a list.
    Gets all files in folders and subfolders

    See the answer on the link below for a ridiculously
    complete answer for this.
    https://stackoverflow.com/a/41447012/9267296
    Args:
        path (str, optional): Which path to start on.
                              Defaults to '.'.
        ext (str/list, optional): Optional file extension.
                                  Defaults to None.

    Returns:
        list: list of full file paths
    """
    result = []
    for subdir, dirs, files in os.walk(path):
        for fname in files:
            filepath = os.path.join(subdir, fname)
            if ext is None:
                # No filter given: keep every file.
                result.append(filepath)
            elif isinstance(ext, str):
                if fname.lower().endswith(ext.lower()):
                    result.append(filepath)
            elif any(fname.lower().endswith(item.lower()) for item in ext):
                # Append only once, even if several extensions match.
                result.append(filepath)
    return result
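
A more compact variant is possible because str.endswith also accepts a tuple of suffixes and returns True if any of them matches. The following is only a sketch of that idea, not the method above; the name find_files is made up for illustration.

```python
import os

def find_files(path=".", ext=None):
    """Hypothetical variant: list files under `path`, optionally filtered by extension(s)."""
    # Normalise `ext` to a tuple of lowercase suffixes (or None for "no filter").
    if ext is None:
        suffixes = None
    elif isinstance(ext, str):
        suffixes = (ext.lower(),)
    else:
        suffixes = tuple(item.lower() for item in ext)

    result = []
    for subdir, _dirs, files in os.walk(path):
        for fname in files:
            # endswith() with a tuple matches if any one suffix fits.
            if suffixes is None or fname.lower().endswith(suffixes):
                result.append(os.path.join(subdir, fname))
    return result
```

For the question's case, the call would be something like find_files(customer_folder, ".ai"), where customer_folder stands for the path of one customer's subfolder.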