Function that removes duplicates in a list of values assigned to each key in a dictionary?

Time:04-11

I want to write a function that takes a list of dictionaries, first merges the entries that share a key, and then removes the duplicate values under each key. The duplicates should be removed only relative to the other values in the same list, not relative to the value lists of the whole dictionary. If possible, could this be done using only for-loops, without list comprehensions?

An example input would look like

remove_value_duplicates(Stores)

where

Stores = [{'deli': ['beef', 'chicken', 'beef'], 'bakery': ['chicken']},
{'deli': ['chicken', 'chicken', 'beef'], 'bakery': ['chicken'],'meat_store':['beef']}]

and the output of this would be

{'deli': ['beef', 'chicken'], 'bakery': ['chicken'],'meat_store':['beef']}

This is what the process of the function should look like when broken down into steps.

  1. The function should first look for duplicate keys across the dictionaries. When it finds them, it merges them into a single key that carries over the values from every duplicate. If there are no duplicates, this step can be skipped.

(For instance, the key ‘deli’ appears in both dictionaries, so its entries are merged into ‘deli’:[‘beef’,‘chicken’,‘beef’,‘chicken’,‘chicken’,‘beef’]. The same happens for ‘bakery’, which becomes ‘bakery’:[‘chicken’,‘chicken’]. ‘meat_store’ appears only once, so nothing is merged and its values stay the same.)

  2. Afterwards, the function checks for duplicates in the list of values for each key. This shortens ‘deli’:[‘beef’,‘chicken’,‘beef’,‘chicken’,‘chicken’,‘beef’] to ‘deli’:[‘beef’,‘chicken’] and ‘bakery’:[‘chicken’,‘chicken’] to ‘bakery’:[‘chicken’]. ‘meat_store’ has no duplicate values, so nothing changes for that key. Finally, the new dictionary is returned.
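Step 1 on its own can be written with plain for-loops, as the question asks. This is a minimal sketch. The helper name merge_stores is hypothetical, and the sketch assumes the input is a list of dictionaries whose values are lists, as in Stores above.

```python
def merge_stores(stores):
    """Hypothetical helper for step 1: concatenate the value lists of keys that repeat across dictionaries."""
    merged = {}
    for store in stores:
        for key, values in store.items():
            if key not in merged:
                merged[key] = []  # first time this key is seen: start a new list
            for value in values:
                merged[key].append(value)  # carry every value over, duplicates included
    return merged


Stores = [{'deli': ['beef', 'chicken', 'beef'], 'bakery': ['chicken']},
          {'deli': ['chicken', 'chicken', 'beef'], 'bakery': ['chicken'], 'meat_store': ['beef']}]
print(merge_stores(Stores))
# {'deli': ['beef', 'chicken', 'beef', 'chicken', 'chicken', 'beef'], 'bakery': ['chicken', 'chicken'], 'meat_store': ['beef']}
```

Because each merged list is a new list, the original Stores dictionaries are not modified.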

What I have tried doing

To test the idea, I tried isolating and removing the duplicates in a list. The code below is not the function I’m trying to define. It assumes that the values from a dictionary have already been extracted into a variable called wordlists. I wanted to see whether I could remove the duplicates from each sublist and then append the modified sublists to a larger list.

wordlists = [['meat', 'meat', 'cheese'],['onions']]

new_lists = []
new_sublists = []
for sublists in wordlists:
    for values in sublists:
        if values not in new_sublists:
            new_sublists.append(values)
new_sublists
new_lists.append(new_sublists)
new_lists

output:
[['meat', 'cheese', 'onions']]

While this does remove the duplicate strings, it does not append the modified sublists separately as intended. The output I want is [['meat','cheese'],['onions']]. I planned to use code similar to this when defining the function, but I’m not sure whether it would work.
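The attempt collapses everything into one list because new_sublists is created only once, before the loop. Every sublist therefore appends into the same list, and that list is appended to new_lists only once, at the end. A sketch of the fix, keeping the question's variable names, is to create a fresh list for each sublist and append it inside the outer loop:

```python
wordlists = [['meat', 'meat', 'cheese'], ['onions']]

new_lists = []
for sublists in wordlists:
    new_sublists = []  # reset for every sublist so duplicates are only checked within it
    for values in sublists:
        if values not in new_sublists:
            new_sublists.append(values)
    new_lists.append(new_sublists)  # append once per sublist, inside the outer loop

print(new_lists)  # [['meat', 'cheese'], ['onions']]
```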

Answer:

Consider utilizing collections.defaultdict:

from collections import defaultdict

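def remove_value_duplicates(stores: list[dict[str, list[str]]]) -> dict[str, list[str]]:
    # A dict's keys act as an insertion-ordered set: the first occurrence of
    # each value wins, so the output order is stable (a plain set would not
    # guarantee ['beef', 'chicken'] rather than ['chicken', 'beef']).
    merged_d = defaultdict(dict)
    for d in stores:
        for key, vals in d.items():
            merged_d[key].update(dict.fromkeys(vals))
    return {k: list(v) for k, v in merged_d.items()}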

def main() -> None:
    stores = [{'deli': ['beef', 'chicken', 'beef'], 'bakery': ['chicken']}, {'deli': ['chicken', 'chicken', 'beef'], 'bakery': ['chicken'],'meat_store':['beef']}]
    new_stores = remove_value_duplicates(stores)
    print(new_stores)

if __name__ == '__main__':
    main()

Output:

{'deli': ['beef', 'chicken'], 'bakery': ['chicken'], 'meat_store': ['beef']}

Without any imports:

def remove_value_duplicates(stores: list[dict[str, list[str]]]) -> dict[str, list[str]]:
    merged_d = {}
    for d in stores:
        for key, vals in d.items():
            # Dict keys keep insertion order, so this removes duplicates without reordering
            merged_d.setdefault(key, {}).update(dict.fromkeys(vals))
    return {k: list(v) for k, v in merged_d.items()}

Answer:

The ideal data structure for your dictionary values would be sets rather than lists, since you don't want duplicates (keep in mind that sets do not preserve insertion order). So if lists aren't a requirement now or in the future, I'd suggest making that change.
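If switching to sets is acceptable, duplicate handling becomes automatic, because adding an element that is already in a set does nothing. The following is a minimal sketch that converts the values to sets as the dictionaries are merged. The helper name merge_as_sets is hypothetical.

```python
def merge_as_sets(stores):
    """Hypothetical helper: merge dictionaries by key, storing each key's values as a set."""
    merged = {}
    for store in stores:
        for key, values in store.items():
            if key not in merged:
                merged[key] = set()
            for value in values:
                merged[key].add(value)  # no-op when the value is already present
    return merged


Stores = [{'deli': ['beef', 'chicken', 'beef'], 'bakery': ['chicken']},
          {'deli': ['chicken', 'chicken', 'beef'], 'bakery': ['chicken'], 'meat_store': ['beef']}]
merged = merge_as_sets(Stores)
```

The order of elements inside each set is arbitrary, so compare the results as sets (for example merged['deli'] == {'beef', 'chicken'}) rather than relying on the printed order.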

But if lists are a requirement, the following code removes the duplicates from each list in place:

def remove_value_duplicates(Stores):
    # Remove repeated values from each list in place, keeping first occurrences
    for Store in Stores:
        for v in Store.values():
            v_set = set()  # values already seen in this list
            i = 0
            while i < len(v):
                item = v[i]
                if item in v_set:
                    v.pop(i)  # drop the repeat; don't advance i, the next item shifted into place
                else:
                    v_set.add(item)
                    i += 1

Stores = [{'deli': ['beef', 'chicken', 'beef'], 'bakery': ['chicken']}, {'deli': ['chicken', 'chicken', 'beef'], 'bakery': ['chicken'],'meat_store':['beef']}]
remove_value_duplicates(Stores)

print(Stores)

For your input, the code above prints the following. Note that it removes duplicates within each dictionary separately and does not merge the dictionaries:

[{'deli': ['beef', 'chicken'], 'bakery': ['chicken']}, {'deli': ['chicken', 'beef'], 'bakery': ['chicken'], 'meat_store': ['beef']}]
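Because the code above keeps the two dictionaries separate, it does not produce the single merged dictionary the question asks for. The sketch below combines both steps using only for-loops and `not in` checks (no comprehensions, no imports), following the approach from the question's own attempt. Unlike the in-place version above, it builds new lists and leaves Stores unchanged.

```python
def remove_value_duplicates(stores):
    """Merge dictionaries by key, then drop repeated values within each key's list.

    Sketch using only for-loops; the first occurrence of each value is kept.
    """
    result = {}
    for store in stores:
        for key, values in store.items():
            if key not in result:
                result[key] = []  # step 1: a key seen for the first time starts an empty list
            for value in values:
                # step 2 folded in: only add a value not already in this key's list
                if value not in result[key]:
                    result[key].append(value)
    return result


Stores = [{'deli': ['beef', 'chicken', 'beef'], 'bakery': ['chicken']},
          {'deli': ['chicken', 'chicken', 'beef'], 'bakery': ['chicken'], 'meat_store': ['beef']}]
print(remove_value_duplicates(Stores))
# {'deli': ['beef', 'chicken'], 'bakery': ['chicken'], 'meat_store': ['beef']}
```

Merging and deduplicating in a single pass gives the same result as the two separate steps, because both keep the first occurrence of each value. The `value not in result[key]` check scans the list, so for very long lists a per-key set of seen values would be faster.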