Detecting Multiple Files from List within Subdirectories Python

Time:04-22

I have a list of file names, and I want to detect whether each one is present somewhere in a directory and its subdirectories. I've gotten quite close, but I'm stuck at the last step (number 5).

Steps Taken

  1. Get File Names From Provided Text File
  2. Save file names as a list
  3. Loop through the previously saved file name list
  4. Loop through directories and sub-directories to identify if files are present or not
  5. Save file names in the second list that are found

The provided text file contains a list of file names, for example:

  • testfile1.txt
  • testfile2.txt
  • testfile3.txt
  • testfile4.txt
  • testfile5.txt

Only testfile1.txt through testfile4.txt are actually present within the (sub)directories.

The expected output is a list such as ['testfile1.txt', 'testfile2.txt', 'testfile3.txt', 'testfile4.txt'].

Code

import os
import sys

file = sys.argv[1]
#top_dir = sys.argv[2]
cwd = os.getcwd()

with open(file, "r") as f: #Step 1
    file_list = []
    for line in f:
        file_name = line.strip()
        file_list.append(file_name) #Step 2
    print(file_list)
    for file in file_list: #Step 3
        detected_files = []
        for dir, sub_dirs, files in os.walk(cwd): #Step 4
            if file in files:
                print(file)
                print("Files Found")
                detected_files.append(file) #Step 5
                print(detected_files)

What it prints out:

Files Found
testfile1.txt
['testfile1.txt']
Files Found
testfile2.txt
['testfile2.txt']
Files Found
testfile3.txt
['testfile3.txt']
Files Found
testfile4.txt
['testfile4.txt']

Answer:

Your current process looks like this:

with open(file, "r") as f: #Step 1
    ...
    for file in file_list: #Step 3
        detected_files = []
        ...
        for dir, sub_dirs, files in os.walk(cwd): #Step 4
            ...

You can see that every iteration of the for file in file_list: loop creates a new, empty detected_files list, discarding anything that was saved in earlier iterations.

detected_files should be created once, before the loops:

detected_files = []
with open(file, "r") as f: #Step 1
    ...
    for file in file_list: #Step 3
        ...
        for dir, sub_dirs, files in os.walk(cwd): #Step 4
            ...
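
To make the fix concrete, here is a minimal, self-contained sketch of the loop-based version with detected_files created once. The function name find_listed_files and its parameters list_path and top_dir are hypothetical, chosen only for illustration; the inner break and the skipping of blank lines are small additions that are not in the original code.

```python
import os


def find_listed_files(list_path, top_dir):
    """Return the names listed in list_path that exist somewhere under top_dir.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    with open(list_path, "r") as f:  # Steps 1 and 2
        # Skip blank lines so an empty name is never searched for.
        file_list = [line.strip() for line in f if line.strip()]

    detected_files = []  # created once, outside both loops
    for name in file_list:  # Step 3
        for _dirpath, _dirnames, filenames in os.walk(top_dir):  # Step 4
            if name in filenames:
                detected_files.append(name)  # Step 5
                break  # stop walking once this name has been found
    return detected_files
```

Called as find_listed_files(sys.argv[1], os.getcwd()), this should return the found names in the same order as they appear in the text file. The break also keeps a name from being appended twice when it exists in more than one subdirectory.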

I would use a set for membership testing and also keep the found filenames in a set, so that a name present in several directories is recorded only once:

detected_files = set()
with open(file, "r") as f: #Step 1
    file_list = set(line.strip() for line in f)
for dir, sub_dirs, files in os.walk(cwd): #Step 4
    found = file_list.intersection(files)
    detected_files.update(found)
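
As a rough sketch, the set-based approach can be wrapped in a function that walks the tree only once, however many names are wanted. The name detect_files and its parameters are hypothetical. Since sets are unordered, the usage note sorts the result to get a list like the one the question asks for.

```python
import os


def detect_files(wanted, top_dir):
    """Return the subset of wanted names found anywhere under top_dir.

    Hypothetical helper: wanted is any iterable of file names.
    """
    wanted = set(wanted)
    detected = set()
    for _dirpath, _dirnames, filenames in os.walk(top_dir):
        # Names in this directory that are also in the wanted set.
        detected.update(wanted.intersection(filenames))
    return detected
```

A list can then be produced with sorted(detect_files(file_list, cwd)). Note that this gives alphabetical order rather than the order of the text file.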

If you want, you can short-circuit the walk as soon as all files have been found:

for dir, sub_dirs, files in os.walk(cwd): #Step 4
    found = file_list.intersection(files)
    detected_files.update(found)
    if detected_files == file_list:
        break
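
To illustrate the effect of the early exit, the sketch below also counts how many directories were visited. The function name detect_files_early_exit and the dirs_visited counter are assumptions added for demonstration; they are not part of the answer's code.

```python
import os


def detect_files_early_exit(wanted, top_dir):
    """Return (detected, dirs_visited), stopping once every name is found.

    Hypothetical helper: dirs_visited exists only to show the early exit.
    """
    wanted = set(wanted)
    detected = set()
    dirs_visited = 0
    for _dirpath, _dirnames, filenames in os.walk(top_dir):
        dirs_visited += 1
        detected.update(wanted.intersection(filenames))
        if detected == wanted:
            break  # everything has been found; no need to walk further
    return detected, dirs_visited
```

The early exit only helps when every listed name exists. In the question's example, testfile5.txt is never found, so the whole tree is still walked; the missing names can then be obtained as wanted - detected.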