I am struggling to group the files in a directory and return, for each group, only the file with the highest id.
The directory contains the following files:
FileA_212456.txt
FileA_234567.txt
FileB_88912.txt
FileB_891234.txt
FileC_829103.txt
FileC_821234.txt
...
The expected result is:
FileA_234567.txt
FileB_891234.txt
FileC_829103.txt ...
I tried the code below, splitting each file name on "_" and using element [1] as the id so that I could pick the max id per group, but I'm not sure how to group the files in a dictionary. Is there a better way to accomplish this?
import os
directory = '/directory'
dictionary = {}
for file in os.listdir(directory):
    id = file.split('_')[1].split('.')[0]
    file_name = file.split('_')[0]
    dictionary[id] = file_name
print([max(k) for k in dictionary.items()])
CodePudding user response:
The dictionary should be organized the other way round:
- the key should be the file name (without the id)
- the value should be the id, created if the key doesn't exist yet or updated when a greater id is found
Like this (with a hardcoded list so the example is self-contained):
files = """FileA_212456.txt
FileA_234567.txt
FileB_88912.txt
FileB_891234.txt
FileC_829103.txt
FileC_821234.txt""".splitlines()
dictionary = {}
for file in files:
    ident = int(file.split('_')[1].split('.')[0])
    file_name = file.split('_')[0]
    if file_name not in dictionary:
        dictionary[file_name] = ident  # first time
    else:
        dictionary[file_name] = max(dictionary[file_name], ident)
for k, v in dictionary.items():
    print("{}_{}.txt".format(k, v))
The result is:
FileA_234567.txt
FileB_891234.txt
FileC_829103.txt
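If you would rather answer the "how do I group them" part literally, here is a minimal sketch that collects the full file names per prefix and then picks the one with the largest numeric id. It assumes every name has the form prefix_digits.extension; the helper names numeric_id and max_id_per_group are made up for this example. Because it returns the original name, it also does not depend on the extension being ".txt".

```python
from collections import defaultdict


def numeric_id(name):
    """Return the number between the first '_' and the following '.'."""
    return int(name.split("_")[1].split(".")[0])


def max_id_per_group(names):
    """Group names by the text before the first '_' and keep the max id of each group."""
    groups = defaultdict(list)  # prefix -> list of original file names
    for name in names:
        groups[name.split("_")[0]].append(name)
    # max(..., key=numeric_id) compares the ids as integers, not as strings
    return [max(members, key=numeric_id) for members in groups.values()]


files = [
    "FileA_212456.txt",
    "FileA_234567.txt",
    "FileB_88912.txt",
    "FileB_891234.txt",
    "FileC_829103.txt",
    "FileC_821234.txt",
]
print(max_id_per_group(files))
# ['FileA_234567.txt', 'FileB_891234.txt', 'FileC_829103.txt']
```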
CodePudding user response:
I would use an if/elif to check whether the current file has a bigger number than the one already stored. A few other changes:
- "id" is a built-in function in Python, so I would name the variable something else
- Make sure to convert the "id" to an int so that it compares correctly; otherwise you're just comparing strings
- This one is an extra, but I imported collections to be able to easily sort the dictionary by "file_name"
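To illustrate the point about strings, here is a small sketch using three of the ids from the question. Strings are compared character by character, so "88912" beats "234567" simply because "8" sorts after "2".

```python
ids = ["212456", "234567", "88912"]

# String comparison goes character by character: '8' > '2', so "88912" wins.
print(max(ids))           # 88912

# Comparing by integer value gives the numerically largest id.
print(max(ids, key=int))  # 234567
```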
Here is the code:
import os
import collections
directory = './directory'
dictionary = {}
for file in os.listdir(directory):
    fileID = int(file.split('_')[1].split('.')[0])
    file_name = file.split('_')[0]
    if file_name not in dictionary:
        dictionary[file_name] = fileID
    elif dictionary[file_name] < fileID:
        dictionary[file_name] = fileID
dictionary = collections.OrderedDict(sorted(dictionary.items()))
print(dictionary)
for x in dictionary.keys():
    print(f"{x}_{dictionary[x]}.txt")
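One caveat: a real directory may contain names that do not follow the pattern (no "_", or a non-numeric id), and the split-based parsing would then raise an IndexError or ValueError. A possible variation, sketched below, matches each name against a regular expression and skips anything that does not fit. The pattern, the function name latest_files, and the choice to return the original file names sorted alphabetically are all assumptions for illustration.

```python
import os
import re

# Assumed naming scheme: <prefix>_<digits>.<extension>
NAME_PATTERN = re.compile(r"^(?P<prefix>.+)_(?P<ident>\d+)\.[^.]+$")


def latest_files(directory):
    """Return, for each prefix, the original file name with the largest id."""
    best = {}  # prefix -> (numeric id, original file name)
    for name in os.listdir(directory):
        match = NAME_PATTERN.match(name)
        if match is None:
            continue  # skip names that do not follow the assumed scheme
        prefix = match.group("prefix")
        ident = int(match.group("ident"))
        if prefix not in best or ident > best[prefix][0]:
            best[prefix] = (ident, name)
    return sorted(name for _, name in best.values())


if __name__ == "__main__":
    print(latest_files("."))
```

Storing the original name alongside the id means the output is always a name that really exists in the directory, rather than one rebuilt from its parts.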