I have a folder that contains several subfolders, each containing 3-4 files that I need. I am trying to iterate through that folder and put all the files from each subfolder in a dictionary that is later dumped in a json file.
So far I've managed to do this for a single file and the json file looks like this:
and this is the code:
import os
import json

# Raw string so the backslashes of the UNC path are taken literally
myDir = r"\\iads011n\ContinuousTesting\DailyTesting\REPORTS"
filelist = []
for path, subdirs, files in os.walk(myDir):
    for file in files:
        if (file.endswith('.xlsx') or file.endswith('.xls') or file.endswith('.XLS')) and "Release" in file and "Integrated" not in file:
            filelist.append(os.path.join(file))

myDict = dict(zip(range(len(filelist)), filelist))
result = []
for k, v in myDict.items():
    result.append({'id': k, 'name': v})

with open('XLList.json', 'w') as json_file:
    json.dump(result, json_file)
But what I'm trying to achieve is this:
And one of the subfolder contents looks like this:
So basically what I need is all the xls/xlsx files under the same subfolder grouped together. The major problem is that not all the subfolders contain the same items: some may have only one xlsx file, while others may have 3 or 4, etc.
CodePudding user response:
The problem is that you do not store which folder each file belongs to. One solution would be the following:
result = []
for i, (path, subdirs, files) in enumerate(os.walk(myDir)):  # use enumerate to track folder id
    subdir = {"id": i}
    j = 0  # file counter in subfolder
    for file in files:
        if (file.endswith('.xlsx') or file.endswith('.xls') or file.endswith('.XLS')) and "Release" in file and "Integrated" not in file:
            subdir[f"name{j}"] = file
            j += 1  # advance the counter so the next file gets its own key
    result.append(subdir)
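Note that os.walk also yields the top-level folder itself and any subfolder without a matching file, so the loop above appends an entry for those as well. A minimal sketch that skips them, assuming the hypothetical helper names is_wanted and group_reports (neither is from the question) and a case-insensitive extension check, which is slightly broader than the original condition:

```python
import os

EXCEL_SUFFIXES = (".xlsx", ".xls")


def is_wanted(filename):
    """Filter from the question, with a case-insensitive extension check (assumption)."""
    return (
        filename.lower().endswith(EXCEL_SUFFIXES)
        and "Release" in filename
        and "Integrated" not in filename
    )


def group_reports(root):
    """Return one dict per folder that holds at least one matching file."""
    result = []
    for path, subdirs, files in os.walk(root):
        subdirs.sort()  # sort in place so folders are visited in a stable order
        matches = sorted(f for f in files if is_wanted(f))
        if not matches:
            continue  # skip the root and any folder without matching files
        entry = {"id": len(result)}  # ids count only the folders that are kept
        for j, name in enumerate(matches):
            entry[f"name{j}"] = name
        result.append(entry)
    return result
```

The returned list can be passed to json.dump exactly like result in the question, for example json.dump(group_reports(myDir), json_file).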
CodePudding user response:
Assuming id = 0:
result = {'id': 0}
for i, filename in enumerate(filelist):
    result[f'name{i}'] = filename
The JSON output of result will be:
{
    "id": 0,
    "name0": "some-filename.xlsx",
    "name1": "some-filename.xlsx",
    "name2": "some-filename.xlsx",
    "name3": "some-filename.xlsx",
    ...
}
enumerate is a Python built-in function. You can also start counting from 1 if you don't want a name0 key:
result = {'id': 0}
for i, filename in enumerate(filelist, 1):
    ...
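As a small illustration of the start argument, here is a sketch with two placeholder file names (made up for this example, not taken from the question):

```python
import json

# Placeholder file names, used only for illustration.
filelist = ["a_Release.xlsx", "b_Release.xls"]

result = {"id": 0}
for i, filename in enumerate(filelist, 1):  # count from 1 instead of 0
    result[f"name{i}"] = filename

print(json.dumps(result))
# {"id": 0, "name1": "a_Release.xlsx", "name2": "b_Release.xls"}
```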
A suggestion for your code:
for path, subdirs, files in os.walk(myDir):
    for file in files:
        if (file.endswith('.xlsx') or file.endswith('.xls') or file.endswith('.XLS')) and "Release" in file and "Integrated" not in file:
            filelist.append(os.path.join(file))
I suggest using the regex r"xlsx?$" with the re.IGNORECASE flag (re.I) to match the filenames, so that a single condition handles all of your extensions:
import re

test_filenames = ["sample-name.XLSX", "sample-name.xlsx", "sample-name.xls", "sample-name.XLS"]
for filename in test_filenames:
    if re.search(r"xlsx?$", filename, re.I):
        # it's matching
        print(filename, "matches")
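One caveat: r"xlsx?$" does not include the dot, so it also matches a name that merely ends in the letters "xls". A possible variant with the dot escaped, as a sketch in which the helper name is_excel is hypothetical:

```python
import re

# Escaped dot plus end anchor: the name must end in ".xls" or ".xlsx".
EXCEL_RE = re.compile(r"\.xlsx?$", re.IGNORECASE)


def is_excel(filename):
    """Return True if filename ends in .xls or .xlsx, in any letter case."""
    return EXCEL_RE.search(filename) is not None


for name in ["report.XLSX", "report.xls", "notes_xls", "report.xlsx.bak"]:
    print(name, is_excel(name))
# report.XLSX True
# report.xls True
# notes_xls False
# report.xlsx.bak False
```

The "Release" and "Integrated" checks from the original condition would still be combined with is_excel using and, as before.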