I have a directory of files that follow this naming pattern:
alice_01.mov
alice_01.mp4
alice_02.mp4
bob_01.avi
My goal is to find all files at a given path and build a "multidimensional" list of them, where each sublist holds a unique file name (without extension) followed by a list of its extensions, like so:
resulting_list = [
    ['alice_01', ['mov', 'mp4']],
    ['alice_02', ['mp4']],
    ['bob_01', ['avi']]
]
I have gotten this far:
import os

path = "user_files/"

def user_files(path):
    files = []
    for file in os.listdir(path):
        files.append(file)
    return files

file_array = []
for file in user_files(path):
    file_name = file.split(".")[0]
    file_ext = file.split(".")[1]
    if file_name not in (sublist[0] for sublist in file_array):
        file_array.append([file_name,[file_ext]])
    else:
        file_array[file_array.index(file_name)].append([file_name,[file_ext]])
print(file_array)
My problem is in the else branch, but I'm struggling to get it right.
Any help is appreciated.
CodePudding user response:
Here's how you can do it using a dict to store the results:
filenames = [
    "alice_01.mov",
    "alice_01.mp4",
    "alice_02.mp4",
    "bob_01.avi",
]

file_dict = {}
for file in filenames:
    file_name, file_ext = file.split(".")[0:2]
    file_dict.setdefault(file_name, []).append(file_ext)
print(file_dict)
Result:
{'alice_01': ['mov', 'mp4'], 'alice_02': ['mp4'], 'bob_01': ['avi']}
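If you still need the nested list from the question, here is a minimal sketch of two ways to get it. The first converts the dict (this relies on dicts keeping insertion order, which holds on Python 3.7+). The second keeps your list-only loop and fixes the else branch: file_array.index(file_name) looks for a string among sublists, so it raises ValueError, and the append targets the sublist rather than its extension list. The helper list names is my own addition, not something from your code.

```python
filenames = ["alice_01.mov", "alice_01.mp4", "alice_02.mp4", "bob_01.avi"]

# Option 1: build the dict as above, then convert it to [name, [exts]] pairs.
file_dict = {}
for file in filenames:
    file_name, file_ext = file.split(".")[0:2]
    file_dict.setdefault(file_name, []).append(file_ext)
resulting_list = [[name, exts] for name, exts in file_dict.items()]

# Option 2: keep the list-only approach and repair the else branch.
file_array = []
for file in filenames:
    file_name, file_ext = file.split(".")[0:2]
    # Names already stored, in the same order as file_array (illustrative helper).
    names = [sublist[0] for sublist in file_array]
    if file_name not in names:
        file_array.append([file_name, [file_ext]])
    else:
        # Find the matching sublist by name, then extend its extension list.
        file_array[names.index(file_name)][1].append(file_ext)

print(resulting_list)
print(file_array)
```

Both give the same nested list for the sample names. The dict version does one lookup per file, while the list version rescans file_array on every iteration, so I'd prefer the dict for large directories.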
UPDATE: The code above doesn't handle special cases such as file names with several dots or with no extension at all, so here's a slightly more robust version:
from pprint import pprint

filenames = [
    "alice_01.mov",
    "alice_01.mp4",
    "alice_02.mp4",
    "bob_01.avi",
    "john_007.json.xz",
    "john_007.json.txt.xz",
    "john_007.json.txt.zip",
    "tom_and_jerry",
    "tom_and_jerry.dat",
]

file_dict = {}
for file in filenames:
    parts = file.split(".")
    if len(parts) > 1:
        file_name = ".".join(parts[0:-1])
        file_ext = parts[-1]
    else:
        file_name = parts[0]
        file_ext = ""
    file_dict.setdefault(file_name, []).append(file_ext)
pprint(file_dict)
Result:
{'alice_01': ['mov', 'mp4'],
 'alice_02': ['mp4'],
 'bob_01': ['avi'],
 'john_007.json': ['xz'],
 'john_007.json.txt': ['xz', 'zip'],
 'tom_and_jerry': ['', 'dat']}
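As an alternative to splitting on "." by hand, the standard library's os.path.splitext splits at the last dot for you. Below is a rough sketch of the same grouping built on it. The function name group_extensions is hypothetical, and one behaviour differs from the manual split: a leading-dot name such as .bashrc is treated as having no extension.

```python
import os
from collections import defaultdict


def group_extensions(filenames):
    """Map each base name to the list of extensions seen for it.

    Hypothetical helper for illustration; not part of the original answer.
    """
    grouped = defaultdict(list)
    for filename in filenames:
        # splitext splits at the last dot: "a.json.xz" -> ("a.json", ".xz")
        base, ext = os.path.splitext(filename)
        # Drop the leading dot so the values match the earlier results.
        grouped[base].append(ext.lstrip("."))
    return dict(grouped)


sample = [
    "alice_01.mov",
    "alice_01.mp4",
    "alice_02.mp4",
    "bob_01.avi",
    "john_007.json.xz",
    "john_007.json.txt.xz",
    "john_007.json.txt.zip",
    "tom_and_jerry",
    "tom_and_jerry.dat",
]
grouped = group_extensions(sample)
print(grouped)
```

To run it against the directory from the question, pass in the listing, e.g. group_extensions(os.listdir("user_files/")). Keep in mind that os.listdir also returns subdirectory names, so you may want to filter those out with os.path.isfile first.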