How to group and aggregate two lists in Python or Scala?

Time:11-09

Given input lists:

L1 = [("A","p1",20), ("B","p2",30)]
L2 = [("A","p1",100), ("c","p3",35)]

Expected output:

[("A","p1",20,100), ("B","p2",30,"not in L2"), ("c","p3",35,"not in L1")]

I tried two for loops, one over L1 and the other over L2, but this does not handle every element correctly and produces repeated output, which I don't want.

CodePudding user response:

L1 = [("A","p1",20), ("B","p2",30)]
L2 = [("A","p1",100), ("c","p3",35)]

# transform L1, L2 into dicts keyed by the first field
D1 = dict((_[0], _) for _ in L1)
D2 = dict((_[0], _) for _ in L2)
out_dict = dict()

for k, v in D1.items():
    if k in D2:
        # use set to avoid duplicated values
        new_tuple = D2[k] + v
        # keep the original order
        sorted_list = sorted(set(new_tuple), key=new_tuple.index)
        out_dict[k] = tuple(sorted_list)
    else:
        out_dict[k] = v + ("not in L2",)

for k, v in D2.items():
    if k in D1:
        # overwrites the entry from the first loop, so the L1 value comes first
        new_tuple = D1[k] + v
        sorted_list = sorted(set(new_tuple), key=new_tuple.index)
        out_dict[k] = tuple(sorted_list)
    else:
        out_dict[k] = v + ("not in L1",)

output = list(out_dict.values())

Output:

[('A', 'p1', 20, 100),
 ('B', 'p2', 30, 'not in L2'),
 ('c', 'p3', 35, 'not in L1')]
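
One caveat with the set-based merge above: when the value from L1 equals the value from L2 (say both are 20), set() collapses them into a single entry. Here is a minimal sketch that sidesteps this. It assumes the first two fields together form the grouping key and that each key appears at most once per list. The helper name merge_by_key is hypothetical:

```python
def merge_by_key(l1, l2):
    """Merge two lists of (key, sub_key, value) tuples on (key, sub_key)."""
    # Assumption: each (key, sub_key) pair appears at most once per list.
    d1 = {(a, b): v for a, b, v in l1}
    d2 = {(a, b): v for a, b, v in l2}
    merged = []
    # dict.fromkeys keeps first-seen order while dropping repeated keys.
    for key in dict.fromkeys(list(d1) + list(d2)):
        if key in d1 and key in d2:
            merged.append((*key, d1[key], d2[key]))
        elif key in d1:
            merged.append((*key, d1[key], "not in L2"))
        else:
            merged.append((*key, d2[key], "not in L1"))
    return merged


L1 = [("A", "p1", 20), ("B", "p2", 30)]
L2 = [("A", "p1", 20), ("c", "p3", 35)]  # same value for ("A", "p1") in both lists
print(merge_by_key(L1, L2))
# [('A', 'p1', 20, 20), ('B', 'p2', 30, 'not in L2'), ('c', 'p3', 35, 'not in L1')]
```

Because the values are read by position instead of being deduplicated, equal values from the two lists both survive.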

CodePudding user response:

First, create a dictionary holding all possible grouping keys (("A","p1"), ("B","p2"), etc.) with the values collected for each key. Then loop through the keys of this dictionary and check whether each one is missing from either list.

L1 = [("A","p1",20), ("B","p2",30)]
L2 = [("A","p1",100), ("c","p3",35)]


d = {}
for x, y, z in L1 + L2:
    # collect every value seen for the (x, y) key
    d[(x, y)] = [z] if not d.get((x, y)) else d[(x, y)] + [z]
for k in d:
    if k not in {(x, y) for x, y, z in L1}:
        d[k] += ["not in L1"]
    if k not in {(x, y) for x, y, z in L2}:
        d[k] += ["not in L2"]
L = [(*k, *v) for k, v in d.items()]

print(L)
# [('A', 'p1', 20, 100), ('B', 'p2', 30, 'not in L2'), ('c', 'p3', 35, 'not in L1')]

This also works when a key is duplicated within one list:

L1 = [("A","p1",20), ("B","p2",30)]
L2 = [("A","p1",100), ("A","p1",200), ("c","p3",35)]

Then the result would be:

# [('A', 'p1', 20, 100, 200), ('B', 'p2', 30, 'not in L2'), ('c', 'p3', 35, 'not in L1')]
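
The same grouping can be wrapped in a reusable function. The following is a sketch using collections.defaultdict, under the assumption that every tuple has exactly three fields. The name group_and_flag is hypothetical:

```python
from collections import defaultdict


def group_and_flag(l1, l2):
    """Group values by (key, sub_key) and flag keys missing from either list."""
    grouped = defaultdict(list)
    for a, b, value in l1 + l2:
        grouped[(a, b)].append(value)  # repeated keys accumulate all their values

    # Build the key sets once instead of on every loop iteration.
    keys1 = {(a, b) for a, b, _ in l1}
    keys2 = {(a, b) for a, b, _ in l2}

    result = []
    for key, values in grouped.items():
        flags = []
        if key not in keys1:
            flags.append("not in L1")
        if key not in keys2:
            flags.append("not in L2")
        result.append((*key, *values, *flags))
    return result


L1 = [("A", "p1", 20), ("B", "p2", 30)]
L2 = [("A", "p1", 100), ("A", "p1", 200), ("c", "p3", 35)]
print(group_and_flag(L1, L2))
# [('A', 'p1', 20, 100, 200), ('B', 'p2', 30, 'not in L2'), ('c', 'p3', 35, 'not in L1')]
```

Precomputing keys1 and keys2 avoids rebuilding the sets for every key, which may matter for long lists.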