Home > Software engineering >  Python: How to save files from folders and subfolders?
Python: How to save files from folders and subfolders?

Time:11-30

I have a main folder: e:\\PLUS, which contains another 4 subfolders (A,B,C,D). Now, my python code saves all the html files from the main folder (PLUS) but doesn't save the files from the other 4 subfolders.

Can anyone update my code a little bit, so as to save the files also from the subfolders?

def check_links_for_all_files(directory_name):
    for file in os.listdir(directory_name):
        filename = str(file)
        print(filename)
        
        if filename.endswith(".html"): #or filename.endswith(".php"):
            file_path = os.path.join(directory_name, filename)
            
            check_link(file_path)
        else:
            continue

if __name__ == '__main__':
    check_links_for_all_files("e:\\Plus")

CodePudding user response:

You are iterating over the main directory, but not going into sub-directories. Try using os.path.isdir to handle the sub-directories.

Could do something like this:

def check_links_for_all_files(directory_name):
    for file in os.listdir(directory_name):
        path = os.path.join(directory_name, str(file))
        if os.path.isdir(path):
            check_links_for_all_files(path)
        
        if path.endswith(".html"): #or filename.endswith(".php"):
            check_link(path)
        else:
            continue

Notice this will handle the entire directory tree and not just the first hop into the sub-directories.

CodePudding user response:

os.walk is very efficient to iterate over all files and subfolders, here is an example:

import os
    
def check_links_for_all_files(directory_name):
    for root, dirs, files in os.walk(directory_name):
        for file in files:
            if file.endswith(".html"):  # or file.endswith(".php"):
                file_path = os.path.join(root, file)
                check_link(file_path)
            else:
                continue

if __name__ == '__main__':
    check_links_for_all_files("/Users/hbohra/Downloads/")

CodePudding user response:

import glob
import os

def check_links_for_all_files(directory_name):
  for file_path in glob.glob(
      os.path.join(directory_name, '**', '*.html'),recursive=True):
    check_link(file_path)

CodePudding user response:

You can also use the pathlib module. It belongs to the python standard library as well, and, in my opinion, it may be a bit more intuitive to use than the os module.

import pathlib

def check_links_for_all_files(directory_name):
    directories = [pathlib.Path(directory_name)]
    for directory in directories:
        for file in directory.iterdir():
            if file.is_dir():
                directories.append(file)
                continue
            print(file.name)
            if file.suffix == '.html':
                check_link(file)
  • Related