I pull data from multiple Excel files and write it back to an aggregated Excel file.
So I have a list of tuples, and each tuple consists of two values, like this:
tuple = (entity-ID, debitor-name)
list = [tuple1, tuple2, ..., tupleN]
So there can be multiple entries with the same debitor-name but different entity-IDs. I want to find all entries with the same debitor-name and merge their different entity-IDs into a list. The context is that several entities within my company can each have a credit relationship with the same debitor-name. I hope this is understandable.
agg_debitor_list = []
for deb in debitor_list:
    # keep only the first occurrence of each (entity-ID, debitor-name) pair
    if deb not in agg_debitor_list:
        agg_debitor_list.append(deb)
This already filters out duplicate entries within a given entity. For example, if my debitor_list has the following entries:
[("1", "X AG"), ("1", "X AG"), ("1", "Z AG"), ("2", "X AG"), ("2", "X AG")]
it gives me [("1", "X AG"), ("1", "Z AG"), ("2", "X AG")]
What I need as the result is something like this: [(["1", "2"], "X AG"), (["1"], "Z AG")]
which I can then write back to the aggregated Excel file.
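The `deb not in agg_debitor_list` test in the loop scans the whole result list on every iteration, so it gets slow (quadratic) on large sheets. A minimal sketch of an order-preserving alternative, assuming the tuples are hashable (they are here, since both fields are strings), relies on dict keys being unique and keeping insertion order in Python 3.7+:

```python
debitor_list = [("1", "X AG"), ("1", "X AG"), ("1", "Z AG"), ("2", "X AG"), ("2", "X AG")]

# dict keys are unique and keep first-insertion order (Python 3.7+),
# so this drops exact duplicates while preserving the original order
agg_debitor_list = list(dict.fromkeys(debitor_list))

print(agg_debitor_list)
# [('1', 'X AG'), ('1', 'Z AG'), ('2', 'X AG')]
```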
Answer:
You can use a dict whose keys are the debitor names and whose values are lists of entity IDs:
agg_debitor_list = [("1", "X AG"), ("1", "Z AG"), ("2", "X AG")]
debitor_to_ids = dict()
for val, key in agg_debitor_list:
    # start with an empty list the first time a debitor name is seen
    debitor_to_ids[key] = debitor_to_ids.get(key, [])
    debitor_to_ids[key].append(val)
print(debitor_to_ids)
# {'X AG': ['1', '2'], 'Z AG': ['1']}
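The `dict.get(key, [])` step can also be written with `collections.defaultdict`, which creates the empty list on first access. The sketch below additionally converts the dict into the `(ids, name)` tuple format from the question and skips IDs already recorded for a debitor, so it should also work on the raw list with duplicates. The function name `group_by_debitor` is hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def group_by_debitor(rows):
    """Group (entity_id, debitor_name) pairs into ([entity_ids], debitor_name)."""
    debitor_to_ids = defaultdict(list)  # missing keys start as an empty list
    for entity_id, name in rows:
        # skip an ID already recorded for this debitor, so duplicate rows are harmless
        if entity_id not in debitor_to_ids[name]:
            debitor_to_ids[name].append(entity_id)
    # dicts preserve insertion order, so debitors appear in first-seen order
    return [(ids, name) for name, ids in debitor_to_ids.items()]


debitor_list = [("1", "X AG"), ("1", "X AG"), ("1", "Z AG"), ("2", "X AG"), ("2", "X AG")]
print(group_by_debitor(debitor_list))
# [(['1', '2'], 'X AG'), (['1'], 'Z AG')]
```

The `entity_id not in ...` check is linear in the number of IDs per debitor, which is assumed to be small here (a handful of company entities).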
Answer:
This does what you want (there is no need to build agg_debitor_list first, because duplicates are handled here). Note that for "Z AG" it returns ['1'], a one-element list, rather than a bare '1'; this can be changed if necessary.
debitor_list = [("1", "X AG"), ("1", "X AG"), ("1", "Z AG"), ("2", "X AG"), ("2", "X AG")]
def crunch(dlist):
    ddict = {}
    for tup in dlist:
        # if the key is not present in ddict, create it with an empty set as its value
        if tup[1] not in ddict:
            ddict[tup[1]] = set()
        # add the entity ID to the set for that key (duplicate IDs are "merged")
        ddict[tup[1]].add(tup[0])
    # return the dictionary in the desired format; sets are unordered, so sort the IDs
    return [(sorted(b), a) for a, b in ddict.items()]

print(crunch(debitor_list))
# [(['1', '2'], 'X AG'), (['1'], 'Z AG')]
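To write the result back to the aggregated Excel file, the ID lists usually have to be flattened into cell values first. A minimal sketch, assuming one row per debitor with the name in the first column and the entity IDs joined into a single comma-separated cell (the question does not specify the sheet layout, so this layout and the helper name `to_excel_rows` are hypothetical):

```python
def to_excel_rows(grouped, sep=", "):
    """Turn [(ids, name), ...] into [[name, "id1, id2"], ...] rows (hypothetical layout)."""
    # one row per debitor; the entity IDs are joined into a single cell value
    return [[name, sep.join(ids)] for ids, name in grouped]


grouped = [(["1", "2"], "X AG"), (["1"], "Z AG")]
for row in to_excel_rows(grouped):
    print(row)
# ['X AG', '1, 2']
# ['Z AG', '1']
```

Each row could then be written with whatever spreadsheet library the aggregation script already uses; with openpyxl, for example, `Worksheet.append(row)` adds one row to a sheet.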