How to iterate through folder and get certain files from the subfolders grouped?

Time:09-24

I have a folder that contains several subfolders, each containing 3-4 files that I need. I am trying to iterate through that folder and put all the files from each subfolder into a dictionary that is later dumped to a JSON file.

So far I've managed to do this for a single file and the json file looks like this:

[Screenshot: current JSON output]

and this is the code:

import os
import json
myDir = r"\\iads011n\ContinuousTesting\DailyTesting\REPORTS"  # raw string avoids invalid escape sequences in the UNC path
filelist = []
for path, subdirs, files in os.walk(myDir):
    for file in files:
        if (file.endswith('.xlsx') or file.endswith('.xls') or file.endswith('.XLS')) and "Release" in file and "Integrated" not in file:
            filelist.append(file)  # os.path.join(file) with a single argument just returns file

myDict = dict(zip(range(len(filelist)), filelist))

result=[]
for k,v in myDict.items():
    result.append({'id' : k, 'name' : v})

with open('XLList.json', 'w') as json_file:
    json.dump(result, json_file)

But what I'm trying to achieve is this:

[Screenshot: desired JSON output]

And here is what one of the subfolders contains: [Screenshot: subfolder contents]

So basically I need all the .xls/.xlsx files in each subfolder grouped together. The main problem is that the subfolders don't all contain the same number of files: some have only one .xlsx file, others have 3 or 4, etc.
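To pin down the target, here is a minimal sketch (not from the thread) that looks only at the immediate subfolders of the root folder, keeps the Excel files that pass the same "Release"/"Integrated" filter, and produces one record per subfolder with an `id` plus `name0`, `name1`, ... keys. The function name `group_by_subfolder` is hypothetical, and the output shape is an assumption based on the key layout the answers use. Subfolders with no matching files are skipped, so a folder with one file gets one `name` key and a folder with four gets four.

```python
from pathlib import Path


def group_by_subfolder(root):
    """Return one dict per immediate subfolder of `root` that holds matching Excel files."""
    result = []
    # sorted() gives a stable order; iterdir() order depends on the filesystem
    subfolders = sorted(p for p in Path(root).iterdir() if p.is_dir())
    for folder in subfolders:
        names = sorted(
            f.name
            for f in folder.iterdir()
            if f.is_file()
            and f.suffix.lower() in (".xlsx", ".xls")  # covers .xlsx, .XLSX, .xls, .XLS
            and "Release" in f.name
            and "Integrated" not in f.name
        )
        if not names:
            continue  # no record for subfolders without matching files
        record = {"id": len(result)}  # consecutive ids over the kept subfolders only
        for j, name in enumerate(names):
            record[f"name{j}"] = name
        result.append(record)
    return result
```

You would then write it out with something like `json.dump(group_by_subfolder(myDir), json_file, indent=2)` inside the existing `with open(...)` block.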

Answer:

The problem is that you don't record which folder each file belongs to. One solution would be the following:

import os

result = []
for i, (path, subdirs, files) in enumerate(os.walk(myDir)):  # use enumerate to track a folder id
    subdir = {"id": i}
    j = 0  # file counter within this subfolder
    for file in files:
        if (file.endswith('.xlsx') or file.endswith('.xls') or file.endswith('.XLS')) and "Release" in file and "Integrated" not in file:
            subdir[f"name{j}"] = file
            j += 1
    result.append(subdir)
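One thing to watch with this approach: os.walk also yields the root folder itself and every nested folder, so any folder without a matching file still produces a record containing only "id". A minimal variant, assuming you only want folders that actually contain matches, filters first and assigns ids afterwards. The helper names `is_wanted` and `walk_grouped` and the extra "folder" key (the path relative to the root) are hypothetical additions, not something the question asked for.

```python
import os


def is_wanted(name):
    # Same filter as the question: Excel extension, "Release" in the name, no "Integrated"
    return (name.endswith((".xlsx", ".xls", ".XLS"))
            and "Release" in name and "Integrated" not in name)


def walk_grouped(root):
    result = []
    for path, subdirs, files in os.walk(root):
        subdirs.sort()  # in-place sort makes the (top-down) walk order deterministic
        matches = sorted(f for f in files if is_wanted(f))
        if not matches:
            continue  # skip folders with no matching files, including the root
        record = {"id": len(result), "folder": os.path.relpath(path, root)}
        for j, name in enumerate(matches):
            record[f"name{j}"] = name
        result.append(record)
    return result
```

Unlike a scan of only the top-level subfolders, this keeps nested folders too, which may or may not be what you want.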

Answer:

Assuming id = 0:

result = {'id': 0}
for i, filename in enumerate(filelist):
    result[f'name{i}'] = filename

The json output of result will be:

{
  "id": 0,
  "name0": "some-filename.xlsx",
  "name1": "some-filename.xlsx",
  "name2": "some-filename.xlsx",
  "name3": "some-filename.xlsx",
  ...
}

enumerate is a Python built-in function. You can also start counting from 1 if you don't want a name0 key:

result = {'id': 0}
for i, filename in enumerate(filelist, 1):
    ...
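Putting the two points together, a small sketch with hypothetical filenames: enumerate(filelist, 1) starts the keys at name1, and passing indent=2 to json.dumps (json.dump accepts the same argument) gives the multi-line layout shown above instead of one long line.

```python
import json

filelist = ["report-Release-a.xlsx", "report-Release-b.xls"]  # hypothetical names

result = {"id": 0}
for i, filename in enumerate(filelist, 1):  # start counting at 1, so keys are name1, name2
    result[f"name{i}"] = filename

# indent=2 pretty-prints one key per line
print(json.dumps(result, indent=2))
```

This prints:

```
{
  "id": 0,
  "name1": "report-Release-a.xlsx",
  "name2": "report-Release-b.xls"
}
```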

A suggestion for your code:

for path, subdirs, files in os.walk(myDir):
    for file in files:
        if (file.endswith('.xlsx') or file.endswith('.xls') or file.endswith('.XLS')) and "Release" in file and "Integrated" not in file:
            filelist.append(os.path.join(file))

I suggest matching filenames with the regex r"\.xlsx?$" and the re.I (ignore-case) flag, so a single condition handles all your cases:

import re

test_filenames = ["sample-name.XLSX", "sample-name.xlsx", "sample-name.xls", "sample-name.XLS"]
for filename in test_filenames:
    if re.search(r"\.xlsx?$", filename, re.I):
        print(filename, "matches")  # all four names match
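If you'd rather not use a regex, str.endswith also accepts a tuple of suffixes, so lowercasing the name first gives the same case-insensitive extension check. A minimal sketch; the helper name is_excel is hypothetical:

```python
def is_excel(filename):
    # str.endswith accepts a tuple; lower() makes the check case-insensitive
    return filename.lower().endswith((".xlsx", ".xls"))


test_filenames = ["sample-name.XLSX", "sample-name.xlsx", "sample-name.xls",
                  "sample-name.XLS", "sample-name.csv"]
print([name for name in test_filenames if is_excel(name)])
```

This prints the first four names; the .csv file is filtered out. You can then combine it with the other conditions, e.g. `is_excel(file) and "Release" in file and "Integrated" not in file`.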