Combine two list of tuples depending on name

Time:11-08

I have two lists of tuples:

a=[(name_2,array_2),(name_7,array_7),...,(name_n,array_n)]
b=[(name_3,arr_3),(name_12,arr_12),...,(name_n,arr_n)]

I want to combine them by name, which is the first element of each tuple in each list. The lists are not sorted by name. The result should look like:

combined=[(name_1,array_1,arr_1),(name_2,array_2,arr_2),...,(name_n,array_n,arr_n)]

Is there a more efficient solution than iterating with two pointers?

CodePudding user response:

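# Sort both lists by name so that matching entries line up by index.
sorted_a = sorted(a, key=lambda x: x[0])
sorted_b = sorted(b, key=lambda x: x[0])

# Assumes both lists contain exactly the same names.
combined = [(sorted_a[idx][0], sorted_a[idx][1], sorted_b[idx][1]) for idx in range(len(a))]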

CodePudding user response:

I assume the two lists have the same number of elements and the names are in the same order (as your example suggests). In that case:

[(x[0], x[1], y[1]) for x, y in zip(a, b)]

CodePudding user response:

Since you say they are pairs, most of the answers here are perfectly fine; you just need to sort both lists on the first item of each tuple before joining them together.

[(x[0], x[1], y[1]) for x, y in zip(sorted(a, key=lambda x: x[0]),
                                    sorted(b, key=lambda x: x[0]))]
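
The sort-and-zip approach only pairs entries correctly when both lists contain exactly the same names. A minimal sketch that adds this check is shown below; the function name `combine_sorted` is hypothetical and the sample data is made up for illustration.

```python
from operator import itemgetter


def combine_sorted(a, b):
    """Sort both lists by name, verify the names match, then zip them."""
    first = itemgetter(0)
    # Sorting with a key compares only the names, never the arrays.
    sorted_a = sorted(a, key=first)
    sorted_b = sorted(b, key=first)
    # Without this check, mismatched names would be paired silently.
    if [first(t) for t in sorted_a] != [first(t) for t in sorted_b]:
        raise ValueError("a and b do not contain the same names")
    return [(name, x, y) for (name, x), (_, y) in zip(sorted_a, sorted_b)]


# Illustrative data only.
a = [("name_2", "array_2"), ("name_1", "array_1")]
b = [("name_1", "arr_1"), ("name_2", "arr_2")]
combined = combine_sorted(a, b)
print(combined)
```

Note that string names sort lexicographically, so `name_10` comes before `name_2`. The pairing is still correct because both lists are sorted the same way, but the result is not in numeric order.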

CodePudding user response:

One way to avoid sorting is to go through dictionaries:

names = [x[0] for x in a]
ad = dict(a)
bd = dict(b)
combined = [(k, ad[k], bd[k]) for k in names]

On my machine, with lists a and b of size 1000, this takes 384 µs against 667 µs when sorting the two lists. NB: The final combined list follows the order of a; it is not sorted by name.
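
If some names might appear in only one of the lists, the dictionary lookup can be guarded. The following is a rough sketch under that assumption; `combine_by_name` is a hypothetical helper, names are assumed to be unique within `b`, and the sample data is illustrative.

```python
def combine_by_name(a, b):
    """Join two lists of (name, value) tuples on the name."""
    # name -> value from b; assumes each name occurs at most once in b.
    b_lookup = dict(b)
    # Keep the order of a and skip names that are missing from b.
    return [(name, value, b_lookup[name]) for name, value in a if name in b_lookup]


# Illustrative data only: "name_3" has no partner in b.
a = [("name_2", "array_2"), ("name_1", "array_1"), ("name_3", "array_3")]
b = [("name_1", "arr_1"), ("name_2", "arr_2")]
combined = combine_by_name(a, b)
print(combined)
```

Only the names are hashed, so the values can be unhashable objects such as NumPy arrays. Building the dictionary and scanning `a` each take linear time on average, which is why this avoids the cost of sorting.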

CodePudding user response:

With pandas

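import pandas as pd

# Build one DataFrame per list, index both by name, and align them column-wise.
list(pd.concat([pd.DataFrame(x).set_index(0) for x in (a, b)], axis=1).reset_index().itertuples(index=False, name=None))
# With a = [('name_1', 'array_1'), ('name_2', 'array_2')] and b = [('name_1', 'arr_1'), ('name_2', 'arr_2')]:
[('name_1', 'array_1', 'arr_1'), ('name_2', 'array_2', 'arr_2')]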