Python: Group files by name and return with max ID

Time:10-16

I am struggling to group the files in a directory by name and return only the file with the max ID from each group.

The directory contains the following files:

FileA_212456.txt
FileA_234567.txt
FileB_88912.txt
FileB_891234.txt
FileC_829103.txt
FileC_821234.txt ...

The expected result is:

FileA_234567.txt
FileB_891234.txt
FileC_829103.txt ...

I tried the code below, splitting each file name on "_" and using element [1] as the ID so that I can return the file with max(id), but I am not sure how to group the files in a dictionary. Is there a better way to accomplish this?

import os

directory = '/directory'
dictionary = {}

for file in os.listdir(directory):
    id = file.split('_')[1].split('.')[0]
    file_name = file.split('_')[0]
    dictionary[id] = file_name

print([max(k) for k in dictionary.items()])

Answer:

The dictionary should be organized the other way round:

  • the key should be the file name (without the ID)
  • the value should be the ID: created when the file name key doesn't exist yet, or updated when a greater ID is found

Like this (with a hardcoded list so the example is self-contained):

files ="""FileA_212456.txt
FileA_234567.txt
FileB_88912.txt
FileB_891234.txt
FileC_829103.txt
FileC_821234.txt""".splitlines()

dictionary = {}

for file in files:
    ident = int(file.split('_')[1].split('.')[0])
    file_name = file.split('_')[0]
    if file_name not in dictionary:
        dictionary[file_name] = ident  # first time
    else:
        dictionary[file_name] = max(dictionary[file_name], ident)

for k, v in dictionary.items():
    print("{}_{}.txt".format(k, v))

The result is:

FileA_234567.txt
FileB_891234.txt
FileC_829103.txt
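Since the question asks whether there is a better way, here is one alternative sketch (not part of either original answer): sort the names, group them with `itertools.groupby`, and keep the file with the largest ID in each group. Like the code above, it assumes every name has the form NAME_ID.ext; the function name `max_id_per_name` is hypothetical, and splitting on the last "_" (via `rsplit`) is a small change of mine so that base names may themselves contain underscores.

```python
from itertools import groupby


def max_id_per_name(files):
    """Return, for each base name, the full file name with the largest numeric ID.

    Assumes names look like NAME_ID.ext, with the ID after the last "_".
    """
    def name_of(f):
        # Everything before the last "_" is the base name
        return f.rsplit('_', 1)[0]

    def id_of(f):
        # The part after the last "_", minus the extension, as an int
        return int(f.rsplit('_', 1)[1].split('.')[0])

    result = []
    # groupby only groups *consecutive* items, so sort by base name first
    for _, group in groupby(sorted(files, key=name_of), key=name_of):
        result.append(max(group, key=id_of))
    return result


files = """FileA_212456.txt
FileA_234567.txt
FileB_88912.txt
FileB_891234.txt
FileC_829103.txt
FileC_821234.txt""".splitlines()

for f in max_id_per_name(files):
    print(f)
```

Because `max` is applied to the whole file names (with `key=id_of`), the original names come back unchanged, so the ".txt" extension is not hardcoded and any leading zeros in an ID are preserved.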

Answer:

I would use an if/elif statement to check whether the current file's ID is bigger than the one already stored. I also made a few other changes:

  1. "id" is a Python built-in, so I would name the variable something else
  2. Make sure to convert the ID to an int so the comparison is numeric; otherwise you are just comparing strings
  3. This one is an extra, but I imported collections to easily sort the dictionary by "file_name"
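A quick illustration of point 2 (the two values are made up for the demo): strings compare character by character, so a shorter number can come out as the "max".

```python
# String comparison is lexicographic: "9" > "1", so "99" beats "100".
ids = ["99", "100"]

print(max(ids))           # compares as strings  → 99
print(max(ids, key=int))  # compares as integers → 100
```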

Here is the code:

import os
import collections

directory = './directory'
dictionary = {}

for file in os.listdir(directory):
    fileID = int(file.split('_')[1].split('.')[0])
    file_name = file.split('_')[0]

    if file_name not in dictionary:
        dictionary[file_name] = fileID
    elif dictionary[file_name] < fileID:
        dictionary[file_name] = fileID

dictionary = collections.OrderedDict(sorted(dictionary.items()))

print(dictionary)

for x in dictionary.keys():
    print(f"{x}_{dictionary[x]}.txt")
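A variant of the same loop, sketched under a few assumptions that are not in the original answer: files are named NAME_ID.txt (the regular expression `PATTERN` and the function `latest_files` are hypothetical names), anything else in the directory, including subdirectories, should be skipped, and Python 3.7+ is available, so a plain `dict` keeps its sorted insertion order and `collections.OrderedDict` is not needed.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical naming pattern: base name, "_", numeric ID, ".txt" extension.
PATTERN = re.compile(r"^(?P<name>.+)_(?P<id>\d+)\.txt$")


def latest_files(directory):
    """Map each base name to the file in `directory` with the largest numeric ID."""
    best = {}  # base name -> (ID as int, full file name)
    for path in Path(directory).iterdir():
        match = PATTERN.match(path.name)
        if not path.is_file() or match is None:
            continue  # skip subdirectories and names that don't fit the pattern
        name, file_id = match["name"], int(match["id"])
        if name not in best or file_id > best[name][0]:
            best[name] = (file_id, path.name)
    # Plain dicts keep insertion order (Python 3.7+), so building a new dict
    # from the sorted items gives the same ordering OrderedDict would.
    return {name: fname for name, (_, fname) in sorted(best.items())}


if __name__ == "__main__":
    # Demo on a temporary directory holding the file names from the question.
    with tempfile.TemporaryDirectory() as tmp:
        for n in ["FileA_212456.txt", "FileA_234567.txt", "FileB_88912.txt",
                  "FileB_891234.txt", "FileC_829103.txt", "FileC_821234.txt"]:
            (Path(tmp) / n).touch()
        for fname in latest_files(tmp).values():
            print(fname)
```

Keeping the full file name next to the ID means the result is taken from the directory listing itself rather than rebuilt with a hardcoded ".txt".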