count files recursively but exclude directories (python)

Time:03-13

I would like to add all the files in a directory, including those in subdirectories, to a set. However, I'd like to exclude folders such as 'node_modules'.

Searching on Stack Overflow, I found the following code, which walks directories recursively while excluding some of them:

import os

foldersToExclude = ['node_modules']

for root, dirs, files in os.walk(path, topdown=True):
    dirs[:] = [d for d in dirs if d not in foldersToExclude]
    print(d)

However, if I print d, it still shows node_modules. I don't fully understand what is happening here. At what point can I add the files to a set so that files nested in folders named 'node_modules' are excluded?

Answer:

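As DeepSpace mentioned, you should print dirs, not d. Also, if you're having trouble understanding what's happening, I'd suggest writing a plain "for" loop instead of a list comprehension (the bracket notation), since it is easier for a beginner to read. The important part is that the filtered names are written back into dirs in place; that is what stops os.walk from descending into node_modules. For example: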

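import os

final_dirs_list = []  # Directory names that pass the filter, across the whole walk
for root, dirs, files in os.walk(path, topdown=True):
    kept = []  # Subdirectories of root that os.walk should still visit
    for d in dirs:
        if d != 'node_modules':  # Check if directory name is node_modules
            kept.append(d)  # If it isn't, keep it
    dirs[:] = kept  # Write back in place so os.walk skips node_modules
    final_dirs_list.extend(kept)

print(final_dirs_list)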
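Note that os.walk only honors filtering that is written back into the very list object it yielded. A slice assignment such as dirs[:] = ... changes that list in place, while dirs = ... merely points the local name at a new list that os.walk never sees. The sketch below builds a small throwaway tree with tempfile (the folder names src and node_modules/pkg, and the helper visited_dirs, are made up for illustration) and compares the two forms.

```python
import os
import tempfile


def visited_dirs(top, in_place):
    """Return the directories os.walk visits (relative to top), sorted.

    in_place=True prunes 'node_modules' with slice assignment (dirs[:] = ...);
    in_place=False only rebinds the local name (dirs = ...).
    """
    visited = []
    for root, dirs, files in os.walk(top, topdown=True):
        visited.append(os.path.relpath(root, top))
        kept = [d for d in dirs if d != 'node_modules']
        if in_place:
            dirs[:] = kept  # os.walk sees the shortened list and skips node_modules
        else:
            dirs = kept  # new list object; os.walk still holds the original one
    return sorted(visited)


with tempfile.TemporaryDirectory() as top:
    # Hypothetical layout: top/src and top/node_modules/pkg
    os.makedirs(os.path.join(top, 'src'))
    os.makedirs(os.path.join(top, 'node_modules', 'pkg'))

    print(visited_dirs(top, in_place=True))
    print(visited_dirs(top, in_place=False))
```

With the in-place version only the top directory and src are visited; with the rebinding version the walk still enters node_modules and node_modules/pkg.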

If you prefer a list comprehension, the same logic can be written more compactly as:

import os

foldersToExclude = ['node_modules']
all_dirs = []
for root, dirs, files in os.walk(path, topdown=True):
    dirs[:] = [d for d in dirs if d not in foldersToExclude]  # Prune in place
    all_dirs.extend(dirs)  # Collect the names that were kept

print(all_dirs)
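To get back to the original question of where to add the files: once dirs has been pruned in place, every files list that os.walk yields comes from a directory outside the excluded folders, so the files can go into the set inside the same loop. Below is a minimal sketch, assuming you want full paths rather than bare file names; the function name collect_files and its exclude parameter are hypothetical, chosen for this example.

```python
import os


def collect_files(top, exclude=('node_modules',)):
    """Return a set of file paths under top, skipping excluded folder names.

    Folders are matched by name at any depth, so both top/node_modules and
    top/a/b/node_modules are skipped together with everything inside them.
    """
    excluded = set(exclude)
    found = set()
    for root, dirs, files in os.walk(top, topdown=True):
        # Prune first, in place, so os.walk never descends into excluded folders
        dirs[:] = [d for d in dirs if d not in excluded]
        # The names in files live directly in root, which was not excluded
        for name in files:
            found.add(os.path.join(root, name))
    return found
```

Calling collect_files('my_project') (a placeholder path) returns the set of paths. If you only need the file names, add name instead of os.path.join(root, name), keeping in mind that files with the same name in different folders then collapse into one entry.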

Answer:

I think your problem is not with how you're using dirs, but with printing d. Rather than printing the d used in the list comprehension, you should check whether the walk actually descends into the right folders. To test iterators, I like to call next() on them so I can see exactly what happens after each step, instead of running a for loop.

Here I first check that the walk descends into the t1 folder, and then walk again while excluding every folder whose name contains "1" (which removes t1), just as you did with node_modules.

➜  /tmp find test
test
test/t2
test/t2/ff2
test/t1
test/t1/t11
test/t1/t11/f11
test/t1/t12
test/t1/t12/f12
➜  /tmp python   
Python 3.9.10 (main, Jan 15 2022, 12:21:28) 
[Clang 13.0.0 (clang-1300.0.29.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> w = os.walk("test",topdown=True)
>>> root, dirs, files = next(w)
>>> dirs
['t2', 't1']
>>> root, dirs, files = next(w)
>>> root
'test/t2'
>>> root, dirs, files = next(w)
>>> root
'test/t1'
>>> root, dirs, files = next(w)
>>> root
'test/t1/t11'
>>> root, dirs, files = next(w)
>>> root
'test/t1/t12'
>>> root, dirs, files = next(w)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> w = os.walk("test",topdown=True)
>>> root, dirs, files = next(w)
>>> dirs[:] = [d for d in dirs if "1" not in d]
>>> dirs
['t2']
>>> root, dirs, files = next(w)
>>> root
'test/t2'
>>> root, dirs, files = next(w)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> 
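The same experiment can also be scripted rather than typed into the REPL. This sketch recreates the tree from the find listing in a temporary directory (treating ff2, f11 and f12 as empty files, which is an assumption) and drives os.walk with next() as in the session above. Because the order in which t1 and t2 come back depends on the filesystem, the script sorts dirs in place first; that sort and the helper name walk_roots are additions for illustration, not part of the original session.

```python
import os
import tempfile


def walk_roots(top, skip=None):
    """Step through os.walk with next() and return every root it visits.

    skip is an optional predicate on a directory name; matching names are
    removed from dirs in place, so os.walk never descends into them.
    """
    w = os.walk(top, topdown=True)
    roots = []
    while True:
        try:
            root, dirs, files = next(w)
        except StopIteration:  # the walk is exhausted
            break
        dirs.sort()  # fix the visiting order, which otherwise depends on the filesystem
        if skip is not None:
            dirs[:] = [d for d in dirs if not skip(d)]
        roots.append(os.path.relpath(root, top))
    return roots


with tempfile.TemporaryDirectory() as tmp:
    test = os.path.join(tmp, 'test')
    # Rebuild the tree shown by `find test`; ff2, f11 and f12 are assumed to be files
    for sub, name in [('t2', 'ff2'),
                      (os.path.join('t1', 't11'), 'f11'),
                      (os.path.join('t1', 't12'), 'f12')]:
        os.makedirs(os.path.join(test, sub))
        open(os.path.join(test, sub, name), 'w').close()

    print(walk_roots(test))                           # full walk
    print(walk_roots(test, skip=lambda d: '1' in d))  # same filter as the session
```

As in the REPL session, the filtered walk visits only the top directory and t2.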

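Printing d doesn't even work in the version of Python I'm using (3.9), because in Python 3 the loop variable of a list comprehension is local to the comprehension and is not visible outside it. (In Python 2 the variable leaks into the enclosing scope and keeps the last name it was bound to, so print(d) there can show node_modules even though that folder was filtered out.)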

>>> import sys
>>> sys.version
'3.9.10 (main, Jan 15 2022, 12:21:28) \n[Clang 13.0.0 (clang-1300.0.29.3)]'
>>> w = os.walk("test",topdown=True)
>>> root, dirs, files = next(w)
>>> dirs[:] = [d for d in dirs if "1" not in d]
>>> print(d)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'd' is not defined
>>> 