I would like to add all the files of a directory to a set, including those in subdirectories. However, I'd like to exclude folders such as 'node_modules'.
Searching Stack Overflow, I found the following code, which walks directories recursively while excluding some:
foldersToExclude = ['node_modules']
for root, dirs, files in os.walk(path, topdown=True):
    dirs[:] = [d for d in dirs if d not in foldersToExclude]
    print(d)
However, if I print d, node_modules still shows up. I don't fully understand what is happening here. At what point can I add the files to a set so that files nested in folders named 'node_modules' are excluded?
CodePudding user response:
As DeepSpace mentioned, you should print dirs and not d. Also, if you're having trouble understanding what's happening, I'd suggest writing a "for" loop rather than using a list comprehension (the bracket notation), as it is easier to read for a beginner. For example:
import os

for root, dirs, files in os.walk(path, topdown=True):
    final_dirs_list = []  # Create an empty list for this level
    for d in dirs:
        if d != 'node_modules':  # Check if directory name is node_modules
            final_dirs_list.append(d)  # If it isn't, append directory name to list
    dirs[:] = final_dirs_list  # Update dirs in place so os.walk skips node_modules
    print(final_dirs_list)
If you want to use a list comprehension, your code can be cleaned up to the following. Note that the assignment must be to dirs[:] (in place) for os.walk to skip the excluded folders:
import os

foldersToExclude = ['node_modules']
for root, dirs, files in os.walk(path, topdown=True):
    dirs[:] = [d for d in dirs if d not in foldersToExclude]
    print(dirs)
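To come back to the original question of where to add the files to a set: one possible sketch is to prune dirs first and then add the files of the current root. The function name collect_files and its parameter folders_to_exclude are made up for this illustration, so adapt them as needed.

```python
import os


def collect_files(path, folders_to_exclude=("node_modules",)):
    """Return a set of file paths under path, skipping excluded folder names.

    collect_files and folders_to_exclude are hypothetical names for this sketch.
    """
    found = set()
    for root, dirs, files in os.walk(path, topdown=True):
        # Prune in place so os.walk never descends into excluded folders.
        dirs[:] = [d for d in dirs if d not in folders_to_exclude]
        # Files listed here belong to root, which is never an excluded folder
        # (unless the starting path itself is one).
        for name in files:
            found.add(os.path.join(root, name))
    return found
```

Because the pruning happens before os.walk descends, files anywhere below a 'node_modules' folder never reach the set, at any depth.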
CodePudding user response:
I think your problem is not with how you're using dirs, but in the printing of d. Rather than print the d used in the list comprehension, you should test whether you iterate into the right folders. To test iterators, I like to use next to see exactly what's happening after each step rather than in a for loop.
Here I first test that I iterate into the t1 folder and then do it once more trying to exclude that folder just as you did.
➜ /tmp find test
test
test/t2
test/t2/ff2
test/t1
test/t1/t11
test/t1/t11/f11
test/t1/t12
test/t1/t12/f12
➜ /tmp python
Python 3.9.10 (main, Jan 15 2022, 12:21:28)
[Clang 13.0.0 (clang-1300.0.29.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> w = os.walk("test",topdown=True)
>>> root, dirs, files = next(w)
>>> dirs
['t2', 't1']
>>> root, dirs, files = next(w)
>>> root
'test/t2'
>>> root, dirs, files = next(w)
>>> root
'test/t1'
>>> root, dirs, files = next(w)
>>> root
'test/t1/t11'
>>> root, dirs, files = next(w)
>>> root
'test/t1/t12'
>>> root, dirs, files = next(w)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> w = os.walk("test",topdown=True)
>>> root, dirs, files = next(w)
>>> dirs[:] = [d for d in dirs if "1" not in d]
>>> dirs
['t2']
>>> root, dirs, files = next(w)
>>> root
'test/t2'
>>> root, dirs, files = next(w)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>
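The same experiment can be written as a script. This is only a sketch: it rebuilds a tree like the one in the session inside a temporary directory and compares the in-place dirs[:] assignment with a plain dirs = ... rebinding. The helper visited_roots and its in_place flag are names I made up for this example.

```python
import os
import tempfile


def visited_roots(top, in_place):
    """Return the directories os.walk visits, relative to top (hypothetical helper)."""
    visited = []
    for root, dirs, files in os.walk(top, topdown=True):
        visited.append(os.path.relpath(root, top))
        kept = [d for d in dirs if "1" not in d]
        if in_place:
            dirs[:] = kept  # mutates the list os.walk will recurse into
        else:
            dirs = kept     # only rebinds the local name; os.walk is unaffected
    return sorted(visited)


with tempfile.TemporaryDirectory() as top:
    # Same directory layout as the test tree in the session (files omitted).
    for sub in ("t2", os.path.join("t1", "t11"), os.path.join("t1", "t12")):
        os.makedirs(os.path.join(top, sub))
    print(visited_roots(top, in_place=True))
    print(visited_roots(top, in_place=False))
```

With in_place=True only the top directory and t2 are visited; with in_place=False all five directories are, because os.walk still holds the original, unmodified list.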
Printing d doesn't even work with the version of Python I'm using because the variable is local to the list comprehension and not visible outside of it.
>>> import sys
>>> sys.version
'3.9.10 (main, Jan 15 2022, 12:21:28) \n[Clang 13.0.0 (clang-1300.0.29.3)]'
>>> w = os.walk("test",topdown=True)
>>> root, dirs, files = next(w)
>>> dirs[:] = [d for d in dirs if "1" not in d]
>>> print(d)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'd' is not defined
>>>
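The scoping rule can also be checked in a short script, sketched here for Python 3. The variable names are only for illustration. In Python 2 the loop variable of a list comprehension leaked into the enclosing scope and kept the last value it iterated over, filtered out or not; that might be why print(d) showed node_modules in the question, though that is a guess about the asker's setup.

```python
names = ["t2", "t1"]
kept = [d for d in names if "1" not in d]

try:
    d  # In Python 3 the comprehension's loop variable does not exist out here.
    leaked = True
except NameError:
    leaked = False

print(kept, leaked)  # ['t2'] False on Python 3
```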