I have a list of file names and want to detect which of them are present somewhere in a directory tree. I've gotten quite close, but I'm stuck at the last step (number 5).
Steps Taken
- Get File Names From Provided Text File
- Save file names as a list
- Loop through the previously saved file name list
- Loop through directories and sub-directories to identify if files are present or not
- Save file names in the second list that are found
The provided text file contains a list of file names, for example:
- testfile1.txt
- testfile2.txt
- testfile3.txt
- testfile4.txt
- testfile5.txt
where only testfile1.txt through testfile4.txt are actually present within the (sub)directories.
The expected output is a single list, for example ['testfile1.txt', 'testfile2.txt', 'testfile3.txt', 'testfile4.txt'].
Code
import os.path
from os import path
import sys

file = sys.argv[1]
#top_dir = sys.argv[2]
cwd = os.getcwd()

with open(file, "r") as f: #Step 1
    file_list = []
    for line in f:
        file_name = line.strip()
        file_list.append(file_name) #Step 2
print(file_list)

for file in file_list: #Step 3
    detected_files = []
    for dir, sub_dirs, files in os.walk(cwd): #Step 4
        if file in files:
            print(file)
            print("Files Found")
            detected_files.append(file) #Step 5
            print(detected_files)
What it prints out:
Files Found
testfile1.txt
['testfile1.txt']
Files Found
testfile2.txt
['testfile2.txt']
Files Found
testfile3.txt
['testfile3.txt']
Files Found
testfile4.txt
['testfile4.txt']
Answer
Your current process looks like this:

with open(file, "r") as f: #Step 1
    ...
for file in file_list: #Step 3
    detected_files = []
    ...
    for dir, sub_dirs, files in os.walk(cwd): #Step 4
        ...
You can see that on every iteration of the for file in file_list: loop you create a new, empty detected_files list, losing everything that was saved on earlier iterations. detected_files should be created once, before the loop:
detected_files = []
with open(file, "r") as f: #Step 1
    ...
for file in file_list: #Step 3
    ...
    for dir, sub_dirs, files in os.walk(cwd): #Step 4
        ...
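Filled in, the corrected loop might look like the sketch below. The function name find_listed_files and its parameters are hypothetical, not from the question. It keeps the list-based approach and only moves the initialisation out of the loop; the break is an extra assumption of mine so that a name found in several directories is recorded only once.

```python
import os

def find_listed_files(file_list, top_dir):
    """Return the names from file_list that exist anywhere under top_dir.

    Hypothetical helper: the name and signature are illustrative only.
    """
    detected_files = []  # created once, before any looping
    for name in file_list:  # Step 3
        for _dir, _sub_dirs, files in os.walk(top_dir):  # Step 4
            if name in files:
                detected_files.append(name)  # Step 5
                break  # assumption: stop walking once this name is found
    return detected_files
```

Because the tree is walked once per name, this does more work than necessary when the list is long.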
I would use a set for membership testing and keep all found filenames in a set (to avoid duplicates).
detected_files = set()
with open(file, "r") as f: #Step 1
    file_list = set(line.strip() for line in f)
for dir, sub_dirs, files in os.walk(cwd): #Step 4
    found = file_list.intersection(files)
    detected_files.update(found)
If you want, you can short-circuit the walk as soon as all files have been found.
for dir, sub_dirs, files in os.walk(cwd): #Step 4
    found = file_list.intersection(files)
    detected_files.update(found)
    if detected_files == file_list:
        break
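Putting the pieces together, a self-contained version of the set-based approach might look like the sketch below. The function name detect_files, its parameters, the skipping of blank lines, and sorting the result to get a list are all assumptions of this sketch rather than anything stated in the question.

```python
import os

def detect_files(list_path, top_dir):
    """Return a sorted list of the names in list_path found under top_dir.

    Hypothetical helper: the name and signature are illustrative only.
    """
    with open(list_path, "r") as f:  # Steps 1 and 2
        # Assumption: blank lines are skipped so "" is never a wanted name.
        wanted = {line.strip() for line in f if line.strip()}

    detected = set()
    for _dir, _sub_dirs, files in os.walk(top_dir):  # Step 4
        detected.update(wanted.intersection(files))  # Step 5
        if detected == wanted:
            break  # every listed name has been found; stop walking
    return sorted(detected)
```

It could be called as detect_files(sys.argv[1], os.getcwd()) to mirror the original script. The tree is walked only once, and each directory is checked with a single set intersection instead of one membership test per name.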