Simple Python program that checks in each subfolder how many files there are and which extensions th

Time:12-31

I am writing a simple Python script that looks through the subfolders of a selected folder and summarizes which file extensions are used and how many files have each one.

I am not really familiar with os.walk and I am really stuck with the "for file in files" section `

for file in files:
    total_file_count += 1
    
    # Get the file extension
    extension = file.split(".")[-1]
    
    # If the extension is not in the dictionary, add it
    if extension not in file_counts[subfolder]:
        file_counts[subfolder][extension] = 1
    # If the extension is already in the dictionary, increase the count by 1
    else:
        file_counts[subfolder][extension] += 1

`

I thought a for loop was the best option for summarizing the files and extensions, but it only processes the last subfolder and only outputs the files that are in that last folder.

Does anybody have a fix or a different approach for it?

FULL CODE:

`

import os
# Set file path using / {End with /}
root_path="C:/Users/me/Documents/"
# Initialize variables to keep track of file counts
total_file_count=0
file_counts = {}
# Iterate through all subfolders and files using os.walk
for root, dirs, files in os.walk(root_path):
        # Get current subfolder name
        subfolder = root.split("/")[-1]
        print(subfolder)

# Initialize a count for each file type
file_counts[subfolder] = {}


# Iterate through all files in the subfolder
for file in files:
    total_file_count += 1
    
    # Get the file extension
    extension = file.split(".")[-1]
    
    # If the extension is not in the dictionary, add it
    if extension not in file_counts[subfolder]:
        file_counts[subfolder][extension] = 1
    # If the extension is already in the dictionary, increase the count by 1
    else:
        file_counts[subfolder][extension] += 1

# Print total file count
print(f"There are a total of {total_file_count} files.")

# Print the file counts for each subfolder
for subfolder, counts in file_counts.items():
    print(f"In the {subfolder} subfolder:")
for extension, count in counts.items():
    print(f"There are {count} .{extension} files")

` Thank you in advance :)

CodePudding user response:

If I understand correctly, you want to count the extensions in ALL subfolders of the given folder, but are only getting one folder. If that is indeed the problem, then the issue is this loop

for root, dirs, files in os.walk(root_path):
        # Get current subfolder name
        subfolder = root.split("/")[-1]
        print(subfolder)

You are iterating through os.walk, but you keep overwriting the subfolder variable. So while it will print out every subfolder, it will only remember the LAST subfolder it encounters, which is why the rest of the code reports on only one subfolder.
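To see the effect in isolation, here is a minimal sketch. The folder names are made up and a plain list stands in for os.walk; the two function names are only for illustration.

```python
# Hypothetical folder names standing in for what os.walk would yield.
folders = ["Documents", "photos", "music"]


def names_seen_after_loop(names):
    """Mimic the question's layout: the bookkeeping runs after the loop."""
    seen = {}
    subfolder = None
    for name in names:
        subfolder = name      # overwritten on every iteration
    if subfolder is not None:
        seen[subfolder] = {}  # runs once, with the last value only
    return seen


def names_seen_inside_loop(names):
    """Same bookkeeping, indented so it runs once per folder."""
    seen = {}
    for name in names:
        seen[name] = {}       # runs for every folder
    return seen


print(names_seen_after_loop(folders))   # {'music': {}}
print(names_seen_inside_loop(folders))  # {'Documents': {}, 'photos': {}, 'music': {}}
```

The only difference between the two functions is the indentation of the line that fills the dictionary.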

Solution 1: Fix the loop

If you want to stick with os.walk, you just need to fix the loop. First things first - define files as a real variable. Don't rely on using the temporary variable from the loop. You actually already have this: file_counts!

Then, you need some way to save the files. I see that you want to split this up by subfolder, so what we can do is use file_counts to map each subfolder to a list of files (you are trying to do this, but are fundamentally misunderstanding some Python code; see my note below about this).

So now, we have a dictionary mapping each subfolder to a list of files! We would just need to iterate through this and count the extensions. The final code looks something like this:


import os

root_path = "C:/Users/me/Documents/"
total_file_count = 0
file_counts = {}       # subfolder -> list of file names
extension_counts = {}  # subfolder -> {extension: count}

# Iterate through all subfolders and files using os.walk
for root, dirs, files in os.walk(root_path):
    subfolder = root.split("/")[-1]
    file_counts[subfolder] = files
    extension_counts[subfolder] = {}

# Iterate through all subfolders, and then through all files
for subfolder in file_counts:
    for file in file_counts[subfolder]:
        total_file_count += 1

        extension = file.split(".")[-1]

        # First time this extension appears in the subfolder: start at 1
        if extension not in extension_counts[subfolder]:
            extension_counts[subfolder][extension] = 1
        # Otherwise increase the existing count by 1
        else:
            extension_counts[subfolder][extension] += 1

Solution 2: Use glob

Instead of os.walk, you can use the glob module, which returns a list of all files and directories that match a search pattern. It is a powerful tool that uses wildcard matching; the glob module documentation describes the pattern syntax.
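A rough sketch of what the glob route could look like, assuming a recursive "**" pattern is acceptable for your folder tree. The function name count_extensions_with_glob is made up for this example, and the result is keyed by full folder path rather than by the last path component. Note that glob's default matching skips names that start with a dot.

```python
import glob
import os
from collections import Counter


def count_extensions_with_glob(root_path):
    """Map each folder to a Counter of the file extensions found directly in it."""
    counts = {}
    # With recursive=True, "**" matches the root folder and every nested subfolder.
    # glob.escape protects special characters such as "[" in the root path itself.
    pattern = os.path.join(glob.escape(root_path), "**", "*")
    for path in glob.glob(pattern, recursive=True):
        if not os.path.isfile(path):
            continue  # skip directories
        folder = os.path.dirname(path)
        # splitext returns (name, ".ext"); the extension is "" if there is none
        extension = os.path.splitext(path)[1]
        counts.setdefault(folder, Counter())[extension] += 1
    return counts


if __name__ == "__main__":
    for folder, ext_counts in count_extensions_with_glob("C:/Users/me/Documents/").items():
        print(f"In the {folder} subfolder:")
        for extension, count in ext_counts.items():
            print(f"There are {count} {extension or '(no extension)'} files")
```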

Note

In your code, you write

# Initialize a count for each file type
file_counts[subfolder] = {}

Which feels like a MATLAB coding scheme. First, subfolder is a variable, and not a vector, so this would only initialize a count for a single file type (and even if it were a list, you would get an unhashable type error). Second, this seems to stem from the idea that repeatedly assigning a variable in a loop builds a list instead of overwriting, which is not true. If you want to do that, you need to initialize an empty list, and use .append().
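A small illustration of both points, with made-up file and folder names: accumulating with .append(), and what happens when a list is used as a dictionary key.

```python
file_names = ["a.txt", "b.py", "c.txt"]  # made-up file names

# Accumulating values: start from an empty list and append in the loop.
collected = []
for name in file_names:
    collected.append(name)

# A string is hashable, so it works as a dictionary key.
counts = {}
counts["photos"] = {}

# A list is not hashable, so using one as a key raises TypeError.
try:
    counts[["photos", "music"]] = {}
    list_key_error = None
except TypeError as exc:
    list_key_error = str(exc)

print(collected)       # ['a.txt', 'b.py', 'c.txt']
print(counts)          # {'photos': {}}
print(list_key_error)  # the message mentions an unhashable type
```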

Note 2: Electric Boogaloo

There are two big ways to make this code good, and here are hints

  1. Look into default dictionaries. They will make your code less redundant
  2. Do you REALLY need to save the numbers and THEN count? What if you counted directly?
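Putting both hints together could look roughly like the sketch below. It counts while walking instead of storing file lists first, and uses defaultdict plus Counter from the collections module so no "is the key there yet?" checks are needed. The function name count_extensions is made up, and keying by the full folder path (rather than the last path component) is my own assumption, so that two folders with the same name do not get merged. Folders without files simply do not show up in the result.

```python
import os
from collections import Counter, defaultdict


def count_extensions(root_path):
    """Return (total_file_count, {folder_path: Counter of extensions})."""
    total_file_count = 0
    # A missing folder gets an empty Counter automatically, and a missing
    # extension inside a Counter starts at 0.
    file_counts = defaultdict(Counter)
    for root, dirs, files in os.walk(root_path):
        for file in files:
            total_file_count += 1
            # splitext gives "" for files without an extension
            extension = os.path.splitext(file)[1]
            file_counts[root][extension] += 1
    return total_file_count, file_counts


if __name__ == "__main__":
    total, per_folder = count_extensions("C:/Users/me/Documents/")
    print(f"There are a total of {total} files.")
    for folder, counts in per_folder.items():
        print(f"In the {folder} subfolder:")
        for extension, count in counts.items():
            print(f"There are {count} {extension or '(no extension)'} files")
```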

CodePudding user response:

Rather than using os.walk, you could use the rglob and glob methods of the Path object. E.g.,

from collections import Counter
from pathlib import Path

root_path = "C:/Users/me/Documents/"

# get a list of all the directories within root (and recursively within those subdirectories)
dirs = [d for d in Path(root_path).rglob("*") if d.is_dir()]
dirs.append(Path(root_path))  # append root directory

# loop through all directories
for curdir in dirs:
    # count the suffixes (i.e., extensions) of all files directly in the directory
    suffix_counts = Counter(s.suffix for s in curdir.glob("*") if s.is_file())

    print(f"In the {curdir}:")

    # loop through the suffixes found in the current directory
    for suffix, count in suffix_counts.items():
        print(f"There are {count} {suffix} files")