Why do I get a "FileNotFoundError" in Python?

Time:01-23

I have a list of the .xlsx files in my directory and its subdirectories, and I want to loop through this list with certain conditions. The code works for the main directory, but it has trouble opening the files in the subdirectories. I used os.walk, but I still get the error "[Errno 2] No such file or directory: 'file name'". The error occurs in the last part of the code, the part that starts with 'for f in files: if f.endswith('.xlsx'):' and so on.

How can I fix this problem?

import os
import openpyxl

path = os.getcwd()
files = os.listdir(path)

directories = ['2018', '2017', '2016', '2015']

for directory in directories:
    directory_path = os.path.join(path, directory)
    files_in_directory = os.listdir(directory_path)
    for file in files_in_directory:
        files.append(file)

filtered_files_list = []

for f in files:
    if f.endswith('.xlsx'):
        wb = openpyxl.load_workbook(f)
        if "2014" in wb.sheetnames:
            filtered_files_list.append(f)

for root, dirs, files in os.walk(path):
    if root.endswith("2018") or root.endswith("2017") or root.endswith("2016") or root.endswith("2015"):
        for f in files:
            if f.endswith('.xlsx'):
                wb = openpyxl.load_workbook(os.path.join(root, f))
                if "2014" in wb.sheetnames:
                    filtered_files_list.append(f)

print(filtered_files_list)

CodePudding user response:

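os.listdir returns bare file names without their directory, so openpyxl.load_workbook(f) looks for files from the subdirectories in the current working directory and fails. Your listdir/walk combination can be simplified with pathlib.Path.glob, which also yields paths that already include their directories, so no join is needed.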

from pathlib import Path
from openpyxl import load_workbook

filtered_files_list = []

filter_directories = {"2015", "2016", "2017", "2018"}  # set for fast search

p = Path(".")  # equivalent to getcwd
for xlsx in p.glob("**/*.xlsx"):  # recursive search for all XLSX under CWD
    # parts[0] is the top-level folder (or the file name itself for files directly in CWD)
    if xlsx.parts[0] not in filter_directories:  # skip if not in filter_directories
        continue
    wb = load_workbook(xlsx)  # matches the name imported above
    if "2014" in wb.sheetnames:
        filtered_files_list.append(xlsx)

Given the following hierarchy, it finds the two workbooks listed after the tree:

.
├── 2015
│   ├── has-2014-sheet.xlsx
│   └── no-2014-sheet.xlsx
├── 2016
│   └── sub
│       ├── has-2014-sheet.xlsx
│       └── no-2014-sheet.xlsx
├── 2020
│   ├── has-2014-sheet.xlsx
│   └── no-2014-sheet.xlsx
└── other
    ├── has-2014-sheet.xlsx
    └── no-2014-sheet.xlsx
[PosixPath('2015/has-2014-sheet.xlsx'),
 PosixPath('2016/sub/has-2014-sheet.xlsx')]
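
If you would rather keep the os-based approach from the question, a minimal sketch of the direct fix is to store every name joined with the folder it came from. The helper name list_xlsx_paths and the file names in the demonstration are made up for illustration, and the workbook check is left out so the snippet runs without openpyxl.

```python
import os
import tempfile


def list_xlsx_paths(base, directories):
    """Return full paths of .xlsx files in base and in the named subdirectories."""
    paths = []
    for folder in [base] + [os.path.join(base, d) for d in directories]:
        if not os.path.isdir(folder):
            continue  # skip year folders that do not exist
        for name in sorted(os.listdir(folder)):
            if name.endswith(".xlsx"):
                # os.listdir returns bare names, so join them with their folder
                paths.append(os.path.join(folder, name))
    return paths


# Demonstration in a throwaway directory (hypothetical file names).
with tempfile.TemporaryDirectory() as base:
    os.mkdir(os.path.join(base, "2018"))
    for rel in ("top.xlsx", os.path.join("2018", "report.xlsx")):
        open(os.path.join(base, rel), "w").close()

    bare_names = os.listdir(os.path.join(base, "2018"))  # names only, no folder
    found = list_xlsx_paths(base, ["2018", "2017"])
    relative = [os.path.relpath(p, base) for p in found]
    all_exist = all(os.path.isfile(p) for p in found)
```

Each returned path can then be passed to openpyxl.load_workbook regardless of the current working directory. Appending the full path rather than f to filtered_files_list also keeps the results usable later.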