Remove file name duplicates in a list

Time:12-10

I have a list l:

l = ['Abc.xlsx', 'Wqe.csv', 'Abc.csv', 'Xyz.xlsx']

In this list, I need to remove duplicates without considering the extension. The expected output is below.

l = ['Wqe.csv', 'Abc.csv', 'Xyz.xlsx']

I tried:

l = list(set(x.split('.')[0] for x in l))

But this gives me only the unique file names, without their extensions.

How could I achieve it?

CodePudding user response:

You can use a dictionary comprehension that uses the name part as key and the full file name as the value, exploiting the fact that dict keys must be unique:

>>> list({x.split(".")[0]: x for x in l}.values())
['Abc.csv', 'Wqe.csv', 'Xyz.xlsx']
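To see why 'Abc.csv' wins but still comes first, here is a rough sketch of the same logic written as a loop. It assumes Python 3.7+, where dicts keep insertion order. The second loop is my own variant, not part of the answer above: it removes the old key before re-inserting, which happens to reproduce the exact order asked for in the question.

```python
l = ['Abc.xlsx', 'Wqe.csv', 'Abc.csv', 'Xyz.xlsx']

# Same logic as the comprehension, written as a loop.
by_name = {}
for x in l:
    # Assigning to an existing key replaces the value but keeps the key's position.
    by_name[x.split(".")[0]] = x
print(list(by_name.values()))  # ['Abc.csv', 'Wqe.csv', 'Xyz.xlsx']

# Variant (an assumption about what the asker wants): drop the old key first,
# so the latest file also moves to the position of its last occurrence.
last_seen = {}
for x in l:
    key = x.split(".")[0]
    last_seen.pop(key, None)
    last_seen[key] = x
print(list(last_seen.values()))  # ['Wqe.csv', 'Abc.csv', 'Xyz.xlsx']
```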

If the file names can be in more sophisticated formats (such as with directory names, or in the foo.bar.xls format) you should use os.path.splitext:

>>> import os
>>> list({os.path.splitext(x)[0]: x for x in l}.values())
['Abc.csv', 'Wqe.csv', 'Xyz.xlsx']
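A small comparison of the two keys may help here. The file names below are made up for illustration; the point is only that split(".")[0] cuts at the first period while os.path.splitext cuts at the last one in the final path component.

```python
import os

# Hypothetical file names with more than one period.
files = ['foo.bar.xls', 'foo.baz.xls', 'foo.bar.csv']

by_split = list({x.split(".")[0]: x for x in files}.values())
by_splitext = list({os.path.splitext(x)[0]: x for x in files}.values())

print(by_split)     # ['foo.bar.csv']  (everything collapses onto the key 'foo')
print(by_splitext)  # ['foo.bar.csv', 'foo.baz.xls']

# A period inside a directory name is left alone by splitext.
print(os.path.splitext('reports.v2/summary.csv')[0])  # reports.v2/summary
print('reports.v2/summary.csv'.split(".")[0])         # reports
```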

CodePudding user response:

If it doesn't matter which of the duplicates is kept, we can split each item on the period, use the first part as the key, and keep an item only if its key has not been seen before. This keeps the first occurrence of each name, so 'Abc.xlsx' stays rather than 'Abc.csv'.

oldList = l
setKeys = set()
l = []
for item in oldList:
    itemKey = item.split(".")[0]
    # Keep only the first item seen for each key
    if itemKey not in setKeys:
        setKeys.add(itemKey)
        l.append(item)
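For what it's worth, the same "first one wins" behaviour can probably also be had from a dict. This is only a sketch, assuming Python 3.7+ for ordered dicts; the name first_seen is mine. dict.setdefault stores a value only when the key is not present yet, so later duplicates are ignored.

```python
l = ['Abc.xlsx', 'Wqe.csv', 'Abc.csv', 'Xyz.xlsx']

first_seen = {}
for item in l:
    # setdefault leaves an existing entry untouched, so the first file wins.
    first_seen.setdefault(item.split(".")[0], item)

print(list(first_seen.values()))  # ['Abc.xlsx', 'Wqe.csv', 'Xyz.xlsx']
```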

CodePudding user response:

Try this

l = ['Abc.xlsx', 'Wqe.csv', 'Abc.csv', 'Xyz.xlsx']
i = 0
while i < len(l):
    name = l[i].split('.')[0]
    # Scan the rest of the list backwards so pop() does not shift unvisited indexes
    for j in range(len(l) - 1, i, -1):
        if l[j].split('.')[0] == name:
            l.pop(j)
    i += 1
print(l)  # ['Abc.xlsx', 'Wqe.csv', 'Xyz.xlsx']

CodePudding user response:

@Selcuk Definitely the best solution. Unfortunately, I don't have enough reputation to upvote your answer,

but I would rather use x[:x.rfind('.')] as my dictionary key than os.path.splitext(x)[0] in order to handle more sophisticated formats in the name. That gives something like this: list({x[:x.rfind('.')]: x for x in l}.values())
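Before picking one, it may be worth checking how the two keys behave on edge cases. The sketch below uses made-up names and two hypothetical helpers, stem_rfind and stem_splitext. Both agree on 'foo.bar.xls', but str.rfind returns -1 when there is no period, so the slice silently drops the last character, and a period in a directory name is treated as the extension separator.

```python
import os

def stem_rfind(x):
    # Slice up to the last period, as in the comment above.
    return x[:x.rfind('.')]

def stem_splitext(x):
    # Standard-library split of the extension from the last path component.
    return os.path.splitext(x)[0]

for name in ['foo.bar.xls', 'README', 'reports.v2/summary']:
    print(name, '->', stem_rfind(name), '|', stem_splitext(name))
# foo.bar.xls -> foo.bar | foo.bar
# README -> READM | README
# reports.v2/summary -> reports | reports.v2/summary
```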
