I have a main folder, e:\\PLUS, which contains four subfolders (A, B, C, D). Right now my Python code saves all the HTML files from the main folder (PLUS) but doesn't save the files from the four subfolders.
Can anyone update my code so that it also saves the files from the subfolders?
def check_links_for_all_files(directory_name):
    for file in os.listdir(directory_name):
        filename = str(file)
        print(filename)
        if filename.endswith(".html"):  # or filename.endswith(".php"):
            file_path = os.path.join(directory_name, filename)
            check_link(file_path)
        else:
            continue

if __name__ == '__main__':
    check_links_for_all_files("e:\\Plus")
CodePudding user response:
You are iterating over the main directory but not descending into its sub-directories.
Use os.path.isdir to detect sub-directories and recurse into them.
You could do something like this:
import os

def check_links_for_all_files(directory_name):
    for file in os.listdir(directory_name):
        path = os.path.join(directory_name, str(file))
        if os.path.isdir(path):
            # Recurse into the sub-directory.
            check_links_for_all_files(path)
        elif path.endswith(".html"):  # or path.endswith(".php"):
            check_link(path)
Note that this handles the entire directory tree, not just the first level of sub-directories.
CodePudding user response:
os.walk is an efficient way to iterate over all files and subfolders. Here is an example:
import os

def check_links_for_all_files(directory_name):
    for root, dirs, files in os.walk(directory_name):
        for file in files:
            if file.endswith(".html"):  # or file.endswith(".php"):
                file_path = os.path.join(root, file)
                check_link(file_path)
            else:
                continue

if __name__ == '__main__':
    check_links_for_all_files("/Users/hbohra/Downloads/")
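If you also want the .php files hinted at in the commented-out condition, str.endswith accepts a tuple of suffixes. Here is a minimal sketch, assuming a hypothetical helper named iter_matching_files (not part of the original code) that yields paths instead of calling check_link itself:

```python
import os


def iter_matching_files(directory_name, extensions=(".html", ".php")):
    """Yield paths of files under directory_name ending with one of `extensions`.

    iter_matching_files is a hypothetical helper used only for illustration.
    """
    suffixes = tuple(extensions)
    for root, _dirs, files in os.walk(directory_name):
        for name in files:
            # str.endswith returns True if the name ends with any suffix in the tuple.
            if name.endswith(suffixes):
                yield os.path.join(root, name)
```

You would then loop over it, for example `for file_path in iter_matching_files("e:\\Plus"): check_link(file_path)`. Keeping the traversal separate from check_link also makes it easy to try the traversal on a small test folder first.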
CodePudding user response:
import glob
import os

def check_links_for_all_files(directory_name):
    for file_path in glob.glob(
            os.path.join(directory_name, '**', '*.html'), recursive=True):
        check_link(file_path)
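With recursive=True, the '**' part of the pattern matches zero or more directories, so the files directly inside the main folder are found as well as the ones in subfolders. Below is a small sketch of the same idea that returns the paths instead of calling check_link. The name find_html_with_glob is a hypothetical helper, and the glob.escape call is an addition of mine in case the folder name itself contains characters such as '[' or '*':

```python
import glob
import os


def find_html_with_glob(directory_name):
    """Return a sorted list of .html paths in directory_name and its subfolders.

    find_html_with_glob is a hypothetical helper used only for illustration.
    """
    # glob.escape protects special characters in the folder name itself,
    # so only the '**' and '*.html' parts act as wildcards.
    pattern = os.path.join(glob.escape(directory_name), "**", "*.html")
    # recursive=True lets '**' match zero or more directory levels.
    return sorted(glob.glob(pattern, recursive=True))
```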
CodePudding user response:
You can also use the pathlib module. It is part of the Python standard library as well and, in my opinion, it can be a bit more intuitive to use than the os module.
import pathlib

def check_links_for_all_files(directory_name):
    directories = [pathlib.Path(directory_name)]
    for directory in directories:
        for file in directory.iterdir():
            if file.is_dir():
                directories.append(file)
                continue
            print(file.name)
            if file.suffix == '.html':
                check_link(file)
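pathlib can also do the recursion for you: Path.rglob walks the folder and everything below it, so the manual list of directories is not strictly needed. A minimal sketch, assuming a hypothetical helper named find_html_with_rglob that returns the matches rather than calling check_link:

```python
from pathlib import Path


def find_html_with_rglob(directory_name):
    """Return the .html files under directory_name, at any depth, as sorted Path objects.

    find_html_with_rglob is a hypothetical helper used only for illustration.
    """
    # rglob("*.html") searches directory_name itself and every subfolder below it.
    # The is_file() check skips any directory whose name happens to end in .html.
    return sorted(p for p in Path(directory_name).rglob("*.html") if p.is_file())
```

Each result is a Path object, just like the file values in the loop above, so it can be passed to check_link in the same way.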