Merging ranked lists of tuples based on common id

Time:09-17

I have the following sorted lists of tuples:

list1 = [(0.2, 'a'), (0.4, 'b'), (0.5,'d')]
list2 = [(0.1, 'a'), (0.3, 'c'), (0.7, 'x')]
list3 = [(0.5, 'c'), (0.6, 'a'), (0.5, 'b')]

I want to create an overall ranked list based on the common letters as follows:

  1. If the letter appears in all three lists, add the three individual values
  2. If the letter appears in only two lists, add the two individual values plus 1
  3. If the letter appears in only one list, add 2 to its value

Expected result:

[(0.9, 'a'), (1.8, 'c'), (1.9, 'b'), (2.5, 'd'), (2.7, 'x')]

What is working:

I am able to get the expected result when the item is common to all three lists, but I am unable to get correct results in the other cases.

Code snippet

list1 = [(0.2, 'a'), (0.4, 'b'), (0.5, 'd')]
list2 = [(0.1, 'a'), (0.3, 'c'), (0.7, 'x')]
list3 = [(0.5, 'c'), (0.6, 'a'), (0.5, 'b')]
priority_result = [] # when element is common in all 3 lists
twos_array = [] #when element is common in only two lists

result = [(s1, l1 + l1) for (l1, s1), (l1, s2) in zip(list1, list2)]
print(result)
for (score, resultID) in list1:
    for (score1, resultID1) in list2:
        for (score2, resultID2) in list3:
            if(resultID == resultID1 or resultID == resultID2):
                result = [(score + score1 + score2, resultID)]
                priority_result.extend(result)
            elif(resultID == resultID1 and resultID != resultID2):
                result = [(score + score1 + 1, resultID)]
                twos_array.extend(result)

How can I work on this to produce the desired outcome?

CodePudding user response:

list1 = [(0.2, 'a'), (0.4, 'b'), (0.5, 'd')]
list2 = [(0.1, 'a'), (0.3, 'c'), (0.7, 'x')]
list3 = [(0.5, 'c'), (0.6, 'a'), (0.5, 'b')]

d = {}
for t in list1 + list2 + list3:
    d.setdefault(t[1], []).append(t[0])
lst = [(sum(v, 3 - len(v)), k) for k, v in d.items()]  # each sum starts at the penalty 3 - len(v)
print(lst)  # [(0.9, 'a'), (1.9, 'b'), (2.5, 'd'), (1.8, 'c'), (2.7, 'x')], up to floating-point rounding
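
The list above comes out in dictionary insertion order rather than ranked order. Here is a minimal sketch of the same idea with a final sort, assuming the penalty is always the number of lists minus the number of occurrences. The function name `merge_ranked` and the rounding to 10 decimals are my own additions for illustration, not part of the answer above:

```python
def merge_ranked(*lists):
    """Sum scores per id, adding 1 for every list the id is missing from."""
    scores = {}
    for lst in lists:
        for score, key in lst:
            scores.setdefault(key, []).append(score)
    # sum(v, start) begins the total at `start`, here the missing-list penalty.
    # round(..., 10) only hides floating-point noise; drop it if not wanted.
    merged = [(round(sum(v, len(lists) - len(v)), 10), k) for k, v in scores.items()]
    return sorted(merged)


list1 = [(0.2, 'a'), (0.4, 'b'), (0.5, 'd')]
list2 = [(0.1, 'a'), (0.3, 'c'), (0.7, 'x')]
list3 = [(0.5, 'c'), (0.6, 'a'), (0.5, 'b')]
ranked = merge_ranked(list1, list2, list3)
print(ranked)
```

Because the penalty is derived from `len(lists)`, the same function should also work for more than three lists, as long as each id occurs at most once per list.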

CodePudding user response:

You can swap the order of the tuples to create a mapping:

d1 = dict(x[::-1] for x in list1)
d2 = dict(x[::-1] for x in list2)
d3 = dict(x[::-1] for x in list3)

Now you can make a union of the keys, since dict.keys returns a set-like object:

keys = d1.keys() | d2.keys() | d3.keys()

The rest can be done with dict.get:

result = {k: d1.get(k, 1) + d2.get(k, 1) + d3.get(k, 1) for k in keys}

Turning this into a sorted list is straightforward:

sorted(x[::-1] for x in result.items())
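
Putting the steps above together as one runnable sketch (the list values are copied from the question; the name `totals` is mine and only there to keep the intermediate dict separate from the final list):

```python
list1 = [(0.2, 'a'), (0.4, 'b'), (0.5, 'd')]
list2 = [(0.1, 'a'), (0.3, 'c'), (0.7, 'x')]
list3 = [(0.5, 'c'), (0.6, 'a'), (0.5, 'b')]

# Reverse each (score, id) tuple so the id becomes the dictionary key.
d1, d2, d3 = (dict(x[::-1] for x in lst) for lst in (list1, list2, list3))

# A missing id contributes the default of 1, which is exactly the penalty
# asked for: 1 if absent from one list, 2 if absent from two.
keys = d1.keys() | d2.keys() | d3.keys()
totals = {k: d1.get(k, 1) + d2.get(k, 1) + d3.get(k, 1) for k in keys}

ranked = sorted(x[::-1] for x in totals.items())
print([k for _, k in ranked])  # ['a', 'c', 'b', 'd', 'x']
```

The scores themselves may show small floating-point noise, so compare them with a tolerance rather than with `==`.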

Now suppose your lists are stored in a list of lists:

lists = [list1, list2, list3]
dicts = [dict(x[::-1] for x in l) for l in lists]
keys = set().union(*dicts)  # union of the dict keys, not of the raw tuples
result = {k: sum(d.get(k, 1) for d in dicts) for k in keys}
result = sorted(x[::-1] for x in result.items())

Here is a slightly simpler solution:

import itertools

mapping = dict.fromkeys((k for _, k in itertools.chain.from_iterable(lists)), len(lists))
for v, k in itertools.chain.from_iterable(lists):
    mapping[k] += v - 1
result = sorted(x[::-1] for x in mapping.items())

You can use collections.Counter to do most of the math for you:

from collections import Counter

c = Counter()
for lst in lists:
    c.update({k: v - 1 for v, k in lst})
result = [(v + len(lists), k) for k, v in c.items()]

The same thing with a regular collections.defaultdict is:

from collections import defaultdict
import itertools

d = defaultdict(int)
for v, k in itertools.chain.from_iterable(lists):
    d[k] += v - 1
result = [(v + len(lists), k) for k, v in d.items()]
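
Both variants rely on the same bookkeeping: accumulating `v - 1` per occurrence and adding `len(lists)` at the end should equal the sum of the scores plus 1 for every list the id is missing from. A small sketch that checks this equivalence on the question's data (the names `offset_totals` and `direct_totals` are hypothetical, introduced only for the comparison):

```python
from collections import defaultdict
from itertools import chain

lists = [
    [(0.2, 'a'), (0.4, 'b'), (0.5, 'd')],
    [(0.1, 'a'), (0.3, 'c'), (0.7, 'x')],
    [(0.5, 'c'), (0.6, 'a'), (0.5, 'b')],
]

# Offset form: accumulate (score - 1) per occurrence, add len(lists) at the end.
offset = defaultdict(float)
for v, k in chain.from_iterable(lists):
    offset[k] += v - 1
offset_totals = {k: v + len(lists) for k, v in offset.items()}

# Direct form: sum of scores plus 1 for each list the id is missing from.
direct_totals = {}
for k in offset:
    found = [v for v, key in chain.from_iterable(lists) if key == k]
    direct_totals[k] = sum(found) + (len(lists) - len(found))

# The two forms agree up to floating-point rounding.
agree = all(abs(offset_totals[k] - direct_totals[k]) < 1e-9 for k in offset)
print(agree)  # True
```

The offset form needs only one pass over the data and no per-id occurrence count, which is why it works with a plain accumulator such as `Counter` or `defaultdict`.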

CodePudding user response:

You could try using itertools.groupby with operator.itemgetter:

from itertools import groupby
from operator import itemgetter as ig
x = list1 + list2 + list3
y = [l[1] for l in x]
print(sorted([((3 - y.count(key)) + sum(next(zip(*l))), key) for key, l in groupby(sorted(x, key=ig(1)), key=ig(1))], key=ig(0)))

[(0.9, 'a'), (1.8, 'c'), (1.9, 'b'), (2.5, 'd'), (2.7, 'x')]

This code concatenates the three lists, builds a second list containing only the keys, groups the tuples by key, and sums the values in each group. It then adds the penalty (3 minus the number of times the key occurs) and finally sorts by the adjusted totals.
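
The one-liner is fairly dense, so here is an unrolled sketch of the same groupby idea. It assumes exactly three input lists and that each id occurs at most once per list; the variable names are mine:

```python
from itertools import groupby
from operator import itemgetter

list1 = [(0.2, 'a'), (0.4, 'b'), (0.5, 'd')]
list2 = [(0.1, 'a'), (0.3, 'c'), (0.7, 'x')]
list3 = [(0.5, 'c'), (0.6, 'a'), (0.5, 'b')]

# groupby only merges adjacent items, so sort by id first.
combined = sorted(list1 + list2 + list3, key=itemgetter(1))

ranked = []
for key, group in groupby(combined, key=itemgetter(1)):
    scores = [score for score, _ in group]
    # Three lists in total; each list the id is missing from costs 1.
    ranked.append((sum(scores) + (3 - len(scores)), key))

ranked.sort()
print([k for _, k in ranked])  # ['a', 'c', 'b', 'd', 'x']
```

Counting the scores inside each group replaces the separate `y.count(key)` lookup, so the key list `y` is no longer needed.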
