Behavior of a list comprehension that references itself

I'm retrieving a list of (name, id) pairs and I need to make sure no name appears more than once, regardless of the id.

# Sample data
filesID = [{'name': 'file1', 'id': '353'},{'name': 'file2', 'id': '154'},{'name': 'file3', 'id': '1874'},{'name': 'file1', 'id': '14'}]

I managed to get the desired output with nested loops:

uniqueFilesIDLoops = []
for pair in filesID:
    found = False
    for d in uniqueFilesIDLoops:
        if d['name'] == pair['name']:
            found = True
    if not found:
        uniqueFilesIDLoops.append(pair)

But I can't get it to work with a list comprehension. Here's what I've tried so far:

uniqueFilesIDComprehension = []
uniqueFilesIDComprehension = [pair for pair in filesID if pair['name'] not in [d['name'] for d in uniqueFilesIDComprehension]]

Outputs:

# Original data
[{'name': 'file1', 'id': '353'}, {'name': 'file2', 'id': '154'}, {'name': 'file3', 'id': '1874'}, {'name': 'file1', 'id': '14'}]
# Data obtained with list comprehension
[{'name': 'file1', 'id': '353'}, {'name': 'file2', 'id': '154'}, {'name': 'file3', 'id': '1874'}, {'name': 'file1', 'id': '14'}]
# Data obtained with loops (and desired output)
[{'name': 'file1', 'id': '353'}, {'name': 'file2', 'id': '154'}, {'name': 'file3', 'id': '1874'}]

I was thinking that maybe the reference to uniqueFilesIDComprehension inside the list comprehension is not updated at each iteration, so it always sees [] and never finds a matching name...

CodePudding user response:

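You cannot access the contents of a list comprehension while it is being built: the result is bound to the name only after the whole expression has been evaluated. Until then, uniqueFilesIDComprehension still refers to the empty list you assigned on the previous line, so the filter never finds a duplicate.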
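A minimal sketch of that evaluation order, reusing the sample data from the question. The names `result`, `seen_lengths` and `names_so_far` are hypothetical, added only for illustration: the helper records how long the list bound to `result` is each time the filter runs.

```python
filesID = [
    {'name': 'file1', 'id': '353'},
    {'name': 'file2', 'id': '154'},
    {'name': 'file3', 'id': '1874'},
    {'name': 'file1', 'id': '14'},
]

result = []        # the name is bound to an empty list before the comprehension runs
seen_lengths = []  # hypothetical log: length of `result` observed at each filter check

def names_so_far():
    # Looks up the name `result` at call time, just as the filter in the question does.
    seen_lengths.append(len(result))
    return [d['name'] for d in result]

result = [pair for pair in filesID if pair['name'] not in names_so_far()]

print(seen_lengths)  # [0, 0, 0, 0]
print(len(result))   # 4
```

Every check sees a list of length 0, so all four elements pass the filter, which matches the output reported in the question.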

The simplest way to remove duplicates would be:

list({el['name']: el for el in filesID}.values())

This builds a dictionary keyed by each element's name, so whenever a duplicate name is encountered the new element overwrites the one already stored. Once the dict is built, all you need to do is take its values and convert them to a list. If you want to keep the first element with each name rather than the last, build the dictionary in a for loop instead:

out = {}
for el in filesID:
    # Store only the first element seen for each name
    if el['name'] not in out:
        out[el['name']] = el
print(list(out.values()))
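To make the difference between the two variants concrete, here is a small sketch run against the sample data from the question. The names `last_wins` and `first_wins` are illustrative only, and the ordering shown assumes Python 3.7 or later, where dicts preserve insertion order.

```python
filesID = [
    {'name': 'file1', 'id': '353'},
    {'name': 'file2', 'id': '154'},
    {'name': 'file3', 'id': '1874'},
    {'name': 'file1', 'id': '14'},
]

# Dict comprehension: a later duplicate overwrites the stored value,
# but the key keeps the position of its first occurrence.
last_wins = list({el['name']: el for el in filesID}.values())

# Explicit loop: only the first element seen for each name is stored.
out = {}
for el in filesID:
    if el['name'] not in out:
        out[el['name']] = el
first_wins = list(out.values())

print([el['id'] for el in last_wins])   # ['14', '154', '1874']
print([el['id'] for el in first_wins])  # ['353', '154', '1874']
```

Only the loop version reproduces the output the question asks for, because it keeps id '353' for file1.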

Finally, one thing to consider whichever solution you implement: since you do not care about the id part, do you really need to keep it?

If not, collecting just the names in a set may be a valid choice as well.

out = {el['name'] for el in filesID}
print(out)

Output (sets are unordered, so the order may differ): {'file1', 'file3', 'file2'}

CodePudding user response:

I would stick with your original loop, although it can be made a little cleaner. Namely, you don't need the found flag: a for loop's else clause runs only if the loop finishes without hitting break.

uniqueFilesIDLoops = []
for pair in filesID:
    for d in uniqueFilesIDLoops:
        if d['name'] == pair['name']:
            break
    else:
        uniqueFilesIDLoops.append(pair)

You can also use an auxiliary set to simplify detecting duplicate names (they are str values and therefore hashable). The := assignment expression used here requires Python 3.8 or later.

seen = set()
uniqueFilesIDLoops = []
for pair in filesID:
    if (name := pair['name']) not in seen:
        seen.add(name)
        uniqueFilesIDLoops.append(pair)

Because we've now decoupled the result from the data structure we perform lookups in, the above could be turned into a list comprehension by writing an expression that both returns True when the name is not in the set and adds the name to the set. Something iffy like

seen = set()
uniqueFilesIDLoops = [pair 
                      for pair in filesID
                      if (pair['name'] not in seen
                          and (seen.add(pair['name']) or True))]

(seen.add always returns None, which is a falsey value, so seen.add(...) or True is always True.)
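As a quick sanity check, the sketch below runs that comprehension against the sample data and compares it with the plain set-based loop. The names `expected` and `unique` are introduced here for illustration and do not come from the answer.

```python
filesID = [
    {'name': 'file1', 'id': '353'},
    {'name': 'file2', 'id': '154'},
    {'name': 'file3', 'id': '1874'},
    {'name': 'file1', 'id': '14'},
]

# Reference behaviour: plain loop with an auxiliary set.
seen = set()
expected = []
for pair in filesID:
    if pair['name'] not in seen:
        seen.add(pair['name'])
        expected.append(pair)

# Comprehension with the side-effecting filter; `seen` must start empty again.
seen = set()
unique = [pair
          for pair in filesID
          if pair['name'] not in seen
          and (seen.add(pair['name']) or True)]

print(unique == expected)         # True
print([p['id'] for p in unique])  # ['353', '154', '1874']
print(set().add('x') is None)     # True
```

Because the filter mutates `seen`, the set has to be reset before the comprehension is run a second time; otherwise every name is already present and the result is empty.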

CodePudding user response:

A list comprehension always creates a new list, so the list the name originally referred to is never updated; the assignment rebinds the variable to the new list only once it has been fully built.