I have a list of the xlsx files in my directory and subdirectories and I want to loop through this list with certain conditions. Now it seems that the code works for the main directory, but it has troubles opening the files within the subdirectories.. I used the os.walk method but I still get the error ''[Errno 2] No such file or directory: 'file name''' . The error occurs at the last piece of the code, the part that starts with 'for f in files: if f.endswith('.xlsx'): and so on..
How to fix this problem?
path = os.getcwd()
files = os.listdir(path)
directories = ['2018', '2017', '2016', '2015']
for directory in directories:
directory_path = os.path.join(path, directory)
files_in_directory = os.listdir(directory_path)
for file in files_in_directory:
files.append(file)
filtered_files_list = []
for f in files:
if f.endswith('.xlsx'):
wb = openpyxl.load_workbook(f)
if "2014" in wb.sheetnames:
filtered_files_list.append(f)
for root, dirs, files in os.walk(path):
if root.endswith("2018") or root.endswith("2017") or root.endswith("2016") or root.endswith("2015"):
for f in files:
if f.endswith('.xlsx'):
wb = openpyxl.load_workbook(os.path.join(root, f))
if "2014" in wb.sheetnames:
filtered_files_list.append(f)
print(filtered_files_list)
CodePudding user response:
Your listdir
walk
combination sounds like it could be simplified with a pathlib.Path.glob
, which will also give you full paths without the need to join
.
from pathlib import Path
from openpyxl import load_workbook
filtered_files_list = []
filter_directories = {"2015", "2016", "2017", "2018"} # set for fast search
p = Path(".") # equivalent to getcwd
for xlsx in p.glob("./**/*.xlsx"): # recursive search for all XLSX under CWD
if str(xlsx.parents[-2]) not in filter_directories: # skip if not in filter_directories
continue
wb = openpyxl.load_workbook(xlsx)
if "2014" in wb.sheetnames:
filtered_files_list.append(xlsx)
In the following hierarchy it finds:
.
├── 2015
│ ├── has-2014-sheet.xlsx
│ └── no-2014-sheet.xlsx
├── 2016
│ └── sub
│ ├── has-2014-sheet.xlsx
│ └── no-2014-sheet.xlsx
├── 2020
│ ├── has-2014-sheet.xlsx
│ └── no-2014-sheet.xlsx
└── other
├── has-2014-sheet.xlsx
└── no-2014-sheet.xlsx
[PosixPath('2015/has-2014-sheet.xlsx'),
PosixPath('2016/sub/has-2014-sheet.xlsx')]