I am trying to count all the files in a folder and all its subfolders. For example, if my folder looks like this:
file1.txt
subfolder1/
├── file2.txt
├── subfolder2/
│ ├── file3.txt
│ ├── file4.txt
│ └── subfolder3/
│ └── file5.txt
└── file6.txt
file7.txt
I would like to get the number 7.
The first thing I tried is a recursive function that counts all files and calls itself for each subfolder:
import os

def get_file_count(directory: str) -> int:
    count = 0
    for filename in os.listdir(directory):
        file = os.path.join(directory, filename)
        if os.path.isfile(file):
            count += 1
        elif os.path.isdir(file):
            count += get_file_count(file)
    return count
This way works but takes a lot of time for big directories.
I also remembered this post, which shows a quick way to get the total size of a folder using win32com, and I wondered if this library also offered a way to do what I was looking for. But after searching, I only found this:
import win32com.client as com

fso = com.Dispatch("Scripting.FileSystemObject")
folder = fso.GetFolder(".")
size = folder.Files.Count
But this only returns the number of files in the targeted folder itself, not in its subfolders.
So, do you know if there is an efficient function in Python that returns the number of files in a folder and all its subfolders?
CodePudding user response:
IIUC, you can just do
sum(len(files) for _, _, files in os.walk('path/to/folder'))
or perhaps, to avoid the len, for probably slightly better performance:
sum(1 for _, _, files in os.walk('folder_test') for f in files)
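As a quick sanity check (a sketch, building the question's example tree in a temporary directory), the first one-liner does return 7:

```python
import os
import tempfile

# Recreate the example tree from the question in a temporary directory.
with tempfile.TemporaryDirectory() as root:
    for d in ('subfolder1', 'subfolder1/subfolder2',
              'subfolder1/subfolder2/subfolder3'):
        os.makedirs(os.path.join(root, d))
    for f in ('file1.txt', 'subfolder1/file2.txt',
              'subfolder1/subfolder2/file3.txt',
              'subfolder1/subfolder2/file4.txt',
              'subfolder1/subfolder2/subfolder3/file5.txt',
              'subfolder1/file6.txt', 'file7.txt'):
        open(os.path.join(root, f), 'w').close()

    # os.walk visits every directory once; each `files` list holds
    # only the non-directory entries of that directory.
    count = sum(len(files) for _, _, files in os.walk(root))
    print(count)  # 7
```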
CodePudding user response:
This code will reveal a count of all directory entries that are not directories (e.g., plain files, symlinks) from a specified root.
Includes timing and an actual pathname used in the test:
from glob import glob, escape
import os
import time
def get_file_count(directory: str) -> int:
    count = 0
    for filename in glob(os.path.join(escape(directory), '*')):
        if os.path.isdir(filename):
            count += get_file_count(filename)
        else:
            count += 1
    return count
start = time.perf_counter()
count = get_file_count('/Volumes/G-DRIVE Thunderbolt 3')
end = time.perf_counter()
print(count)
print(f'{end-start:.2f}s')
Output:
166231
2.38s
CodePudding user response:
I used os.walk(). Here is my sample; I hope it helps you:
import os

def file_dir():
    directories = []
    res = {}
    cwd = os.getcwd()
    for root, dirs, files in os.walk(cwd):
        for file in files:
            if file.endswith(".tsv"):
                directories.append(os.path.join(root, file))
    res['dir'] = directories
    return res
CodePudding user response:
You could also directly use the shell command:
find DIR_NAME -type f | wc -l
This returns the count of all regular files under DIR_NAME. With os.system() this can be done from Python.
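A sketch of running that command from Python, on Unix only; subprocess.run is used instead of os.system so the count can be captured rather than just printed (note that filenames containing newlines would be miscounted by this approach):

```python
import subprocess

# Unix-only: let `find` list regular files and `wc -l` count them.
result = subprocess.run(
    'find . -type f | wc -l',
    shell=True, capture_output=True, text=True, check=True,
)
count = int(result.stdout)  # wc prints the number with surrounding whitespace
print(count)
```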
CodePudding user response:
Another solution using the os and pathlib libraries:
from pathlib import Path
from os.path import isfile
len([x for x in Path('./dir1').rglob('*') if isfile(x)])
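A small variation on the same idea (a sketch): a generator with Path.is_file() avoids building the intermediate list and drops the separate os.path import:

```python
from pathlib import Path

# rglob('*') yields every entry recursively; is_file() keeps only files.
count = sum(1 for p in Path('.').rglob('*') if p.is_file())
print(count)
```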
CodePudding user response:
The proper way is to use os.walk as others have pointed out, but to give another solution that resembles your original as much as possible: you can use os.scandir to avoid the cost of constructing the entire list; it should be substantially faster:
import os

def get_file_count(directory: str) -> int:
    count = 0
    for entry in os.scandir(directory):
        if entry.is_file():
            count += 1
        elif entry.is_dir():
            count += get_file_count(os.path.join(directory, entry.name))
    return count