Finding double entries in a list of tuples

I pull data from multiple Excel files and write it back to an aggregated Excel file.

So I have a list of tuples, and each tuple consists of two values like this:

tuple = (entity-ID, debitor-name)
list = [tuple1, tuple2, ..., tupleN]

It can happen that there are multiple entries with the same debitor-name but different entity-IDs. I want to find all entries where the debitor-name is equal and then merge the different entity-IDs into a list. The context is that multiple entities within my company can all have a credit relation to the same debitor-name. I hope this is understandable.

agg_debitor_list = []
for deb in debitor_list:
    # keep only the first occurrence of each (entity-ID, debitor-name) pair
    if deb not in agg_debitor_list:
        agg_debitor_list.append(deb)

This already filters out duplicate entries within a given entity. For example, if my debitor_list has the following entries: [("1", "X AG"), ("1", "X AG"), ("1", "Z AG"), ("2", "X AG"), ("2", "X AG")], it gives me [("1", "X AG"), ("1", "Z AG"), ("2", "X AG")]. As a result I need something like this: [(["1", "2"], "X AG"), (["1"], "Z AG")], so that I can write it back to the aggregated Excel file.

CodePudding user response:

You can use a dict in which the key is the debitor-name and the value is a list of entity-IDs:

agg_debitor_list = [("1", "X AG"), ("1", "Z AG"), ("2", "X AG")]
debitor_to_ids = {}
for entity_id, name in agg_debitor_list:
    # create an empty list the first time a debitor-name is seen, then append
    debitor_to_ids.setdefault(name, []).append(entity_id)

print(debitor_to_ids)

> {'X AG': ['1', '2'], 'Z AG': ['1']}
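The dict above is keyed by debitor-name, while the question asks for a list of (entity-IDs, debitor-name) tuples. A minimal sketch of that last step, assuming Python 3.7+ (where a dict keeps insertion order) and starting directly from the unfiltered debitor_list; the names entity_id, name, ids and result are only illustrative:

```python
debitor_list = [("1", "X AG"), ("1", "X AG"), ("1", "Z AG"), ("2", "X AG"), ("2", "X AG")]

debitor_to_ids = {}
for entity_id, name in debitor_list:
    # fetch the list for this debitor-name, creating it on first sight
    ids = debitor_to_ids.setdefault(name, [])
    # skip entity-IDs that were already recorded for this debitor-name
    if entity_id not in ids:
        ids.append(entity_id)

# flip each (name, ids) pair into the (ids, name) shape from the question
result = [(ids, name) for name, ids in debitor_to_ids.items()]
print(result)
# [(['1', '2'], 'X AG'), (['1'], 'Z AG')]
```

Because the membership check runs on a list, this stays simple for a handful of entities per debitor; for very long ID lists a set would be faster.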

CodePudding user response:

This does what you want. There is no need to create agg_debitor_list first, because duplicates are removed along the way. Note that for "Z AG" it returns ['1'] (a one-element list) rather than a bare '1'; this can be changed if necessary.

debitor_list = [("1", "X AG"), ("1", "X AG"), ("1", "Z AG"), ("2", "X AG"), ("2", "X AG")]

def crunch(dlist):
    ddict = {}
    for entity_id, name in dlist:
        # if the key is not present in ddict, create it with an empty set as its value
        if name not in ddict:
            ddict[name] = set()
        # add the value to the set of values for that key (duplicates are "merged")
        ddict[name].add(entity_id)
    # return the result in the desired format; sets are unordered, so sort the IDs (as strings)
    return [(sorted(ids), name) for name, ids in ddict.items()]

crunch(debitor_list)
# [(['1', '2'], 'X AG'), (['1'], 'Z AG')]
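Since the result is meant to go back into an Excel file, the list of IDs probably has to become a single cell value at some point. A rough sketch of one way to do that, assuming the IDs should be joined into one comma-separated string per row; the helper name to_rows and the separator are hypothetical choices, not something the question specifies:

```python
def to_rows(aggregated, sep=", "):
    """Turn [(["1", "2"], "X AG"), ...] into [("1, 2", "X AG"), ...]."""
    # join each list of entity-IDs into one string so it fits in a single cell
    return [(sep.join(ids), name) for ids, name in aggregated]

aggregated = [(["1", "2"], "X AG"), (["1"], "Z AG")]
rows = to_rows(aggregated)
print(rows)
# [('1, 2', 'X AG'), ('1', 'Z AG')]
```

Each tuple in rows can then be written as one row with whatever Excel library is already used for the export.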