I have used "os.walk()" to list all subfolders and files in a directory tree, but I heard that "os.scandir()" does the job 2x to 20x faster. So I tried this code:
def tree2list(directory: str) -> list:
    import os
    tree = []
    counter = 0
    for i in os.scandir(directory):
        if i.is_dir():
            counter += 1
            tree.append([counter, 'Folder', i.name, i.path])  ## doesn't list the whole tree
            tree2list(i.path)
            #print(i.path) ## this line prints all subfolders in the tree
        else:
            counter += 1
            tree.append([counter, 'File', i.name, i.path])
            #print(i.path) ## this line prints all files in the tree
    return tree
And when I tested it:
## tester
folder = 'E:/Test'
print(tree2list(folder))
I got only the contents of the root directory and nothing from the subdirectories further down the tree, even though all the print statements in the code above work fine:
[[1, 'Folder', 'Archive', 'E:/Test\\Archive'], [2, 'Folder', 'Source', 'E:/Test\\Source']]
What have I done wrong, and how can I fix it?
Answer:
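Your code almost works; it just throws away the result of the recursive call. Each call to tree2list() builds its own local tree list, so the list returned by tree2list(i.path) is discarded. A minor modification fixes that, extending the parent's list with what the recursive call returns: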
def tree2list(directory: str) -> list:
    import os
    tree = []
    counter = 0
    for i in os.scandir(directory):
        if i.is_dir():
            counter += 1
            tree.append([counter, 'Folder', i.name, i.path])
            tree.extend(tree2list(i.path))
            # print(i.path) ## this line prints all subfolders in the tree
        else:
            counter += 1
            tree.append([counter, 'File', i.name, i.path])
            # print(i.path) ## this line prints all files in the tree
    return tree
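To check the fix without an E:/Test folder, one option is to build a small throwaway tree with tempfile and run the corrected function on it. The folder and file names (Archive/old.txt, Source/main.py) are made up for this demo; the function body is the corrected version above, with the comments adjusted.

```python
import os
import tempfile


def tree2list(directory: str) -> list:
    """Corrected version: merges each subfolder's result into the parent list."""
    tree = []
    counter = 0
    for i in os.scandir(directory):
        if i.is_dir():
            counter += 1
            tree.append([counter, 'Folder', i.name, i.path])
            # Keep what the recursive call returns instead of discarding it.
            tree.extend(tree2list(i.path))
        else:
            counter += 1
            tree.append([counter, 'File', i.name, i.path])
    return tree


# Hypothetical demo tree: root/Archive/old.txt and root/Source/main.py
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'Archive'))
    os.makedirs(os.path.join(root, 'Source'))
    for rel in ('Archive/old.txt', 'Source/main.py'):
        with open(os.path.join(root, rel), 'w') as f:
            f.write('x')
    result = tree2list(root)
    names = sorted(entry[2] for entry in result)  # entry = [counter, kind, name, path]
    print(names)  # ['Archive', 'Source', 'main.py', 'old.txt']
```

The files inside the subfolders now appear in the result, which is exactly what the original version was missing.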
I don't understand the purpose of the counter variable, though (it restarts at 1 in every subfolder, because each recursive call has its own counter), so I'd probably remove it.
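Further, I agree with @Gelineau that your approach relies heavily on list copies: tree.extend() copies every entry of a subfolder's list into its parent's list, so an entry is copied once for every folder level above it, which is likely slow for deep trees. An iterator-based approach, as in his answer, is better suited to a large number of files.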
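To illustrate the copying point, one alternative (not taken from either answer) is to pass a single list down through the recursion, so each entry is appended exactly once and nothing is copied when a subfolder finishes. This is a minimal sketch assuming Python 3.6+ for the with os.scandir(...) form. It drops the counter, as suggested above, and it does not follow symlinked folders (follow_symlinks=False); that is a choice made here to avoid symlink loops, not something the question asks for.

```python
import os
import tempfile
from typing import List, Optional


def tree2list(directory: str, tree: Optional[List[list]] = None) -> List[list]:
    """Walk `directory` recursively, appending into ONE shared list.

    Every entry is appended exactly once, so nothing is copied when a
    subfolder's results are merged back into its parent.
    """
    if tree is None:  # top-level call: create the single output list
        tree = []
    with os.scandir(directory) as entries:  # closes the directory handle
        for entry in entries:
            # Assumption: symlinked folders are listed as files, not entered.
            if entry.is_dir(follow_symlinks=False):
                tree.append(['Folder', entry.name, entry.path])
                tree2list(entry.path, tree)  # recursion fills the same list
            else:
                tree.append(['File', entry.name, entry.path])
    return tree


# Hypothetical demo tree: root/Source/lib/util.py
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'Source', 'lib'))
    open(os.path.join(root, 'Source', 'lib', 'util.py'), 'w').close()
    listing = tree2list(root)

print(sorted(name for _, name, _ in listing))  # ['Source', 'lib', 'util.py']
```

The None default avoids Python's shared-mutable-default pitfall, so separate top-level calls never reuse one list.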
Answer:
Using generators (yield, yield from) lets you manage the recursion with concise code:
import os
from pprint import pprint
from typing import Iterator, Tuple

def tree2list(directory: str) -> Iterator[Tuple[str, str, str]]:
    for i in os.scandir(directory):
        if i.is_dir():
            yield ("Folder", i.name, i.path)
            yield from tree2list(i.path)
        else:
            yield ("File", i.name, i.path)

folder = "/home/yfgy6415/dev/tmp"
pprint(list(tree2list(folder)))
Or, if you want the counter, number the entries with enumerate (each item then becomes (n, (kind, name, path))):
pprint(list(enumerate(tree2list(folder), start=1)))
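Recursion can also be avoided entirely by keeping an explicit stack of directories still to scan. The sketch below is a variant of the generator approach, not the answerer's code, and the name iter_tree is hypothetical. It assumes Python 3.6+ (for with os.scandir(...), which closes each directory handle promptly), does not follow symlinked folders, and yields entries in a different order than the recursive version. The demo checks the collected paths against os.walk on a throwaway tree.

```python
import os
import tempfile
from typing import Iterator, Tuple


def iter_tree(directory: str) -> Iterator[Tuple[str, str, str]]:
    """Yield ("Folder" | "File", name, path) for everything under directory.

    Uses an explicit stack instead of recursion. Each folder's own entries
    are yielded before any of its subfolders are scanned, so the order
    differs from the recursive generator.
    """
    stack = [directory]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                # Assumption: symlinked folders are not entered (avoids loops).
                if entry.is_dir(follow_symlinks=False):
                    yield ("Folder", entry.name, entry.path)
                    stack.append(entry.path)  # scan its contents later
                else:
                    yield ("File", entry.name, entry.path)


# Hypothetical demo tree: root/readme.txt and root/Archive/2020/a.txt
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'Archive', '2020'))
    open(os.path.join(root, 'Archive', '2020', 'a.txt'), 'w').close()
    open(os.path.join(root, 'readme.txt'), 'w').close()

    ours = {path for _, _, path in iter_tree(root)}
    # os.walk yields (dirpath, dirnames, filenames); rebuild the full paths.
    walked = {
        os.path.join(dirpath, name)
        for dirpath, dirnames, filenames in os.walk(root)
        for name in dirnames + filenames
    }
    print(ours == walked)  # True
```

If you need the counter here too, enumerate(iter_tree(folder), start=1) works the same way as with the recursive generator.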