How to sort a list of tuples by the first element in each tuple, and pick the tuple with the largest

Here I have a list of n k-tuples (here n = 4 and k = 5):

A = [(1, 3, 5, 6, 6), (0, 1, 2, 4, 5), (1, 9, 8, 3, 5), (0, 2, 3, 5, 7)]

I want to group these tuples by their first element, which in this case gives 2 groups. In each group, I want to select only the one tuple whose last element is the largest. So here I expect the output of the function to be a list of tuples such as

[(1, 3, 5, 6, 6),
 (0, 2, 3, 5, 7)]

Below is my attempt, but it does not seem to work:

import pandas as pd
import numpy as np

def f (sample):

    data = pd.DataFrame(sample)
    grouped_data = data.groupby(0)
    maximums = grouped_data.max(4)
    result = list(maximums.to_records(index = False))
    
    return result

Could this be accomplished with a dict? If so, how? Any hint or help is welcome.

Answer:

You can use itertools.groupby for this:

import itertools


def by_first_element(t):
    return t[0]


def by_last_element(t):
    return t[-1]


sorted_A = sorted(A, key=by_first_element)
groups = [[*g] for _, g in itertools.groupby(sorted_A, key=by_first_element)]
max_of_each_group = [max(g, key=by_last_element) for g in groups]

Output:

[(0, 2, 3, 5, 7), (1, 3, 5, 6, 6)]
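A minimal variant of the same idea, assuming you do not need the intermediate `groups` list: `max` can consume each group iterator directly, so the groups never have to be materialized. `operator.itemgetter` is swapped in for the two helper functions; it does the same job here.

```python
import itertools
from operator import itemgetter

A = [(1, 3, 5, 6, 6), (0, 1, 2, 4, 5), (1, 9, 8, 3, 5), (0, 2, 3, 5, 7)]

first = itemgetter(0)   # grouping key: the first element
last = itemgetter(-1)   # comparison key: the last element

# groupby only merges *consecutive* equal keys, so sort by the first element first
max_of_each_group = [
    max(group, key=last)
    for _, group in itertools.groupby(sorted(A, key=first), key=first)
]
print(max_of_each_group)
```

Because the input is sorted before grouping, the result comes out ordered by first element.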

Alternatively, yes, you can use a dictionary:

groups = {}
for t in A:
    groups[t[0]] = groups.get(t[0], []) + [t]

max_of_each_group = [max(g, key=lambda t: t[-1]) for g in groups.values()]

If you want max_of_each_group sorted by first element:

>>> sorted(max_of_each_group, key=lambda t: t[0])
[(0, 2, 3, 5, 7), (1, 3, 5, 6, 6)]
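The same dictionary grouping can also be written with `collections.defaultdict(list)`, which creates the empty list the first time a key is seen and appends in place instead of building a fresh list for every tuple. A sketch of that variant, with the sort folded in:

```python
from collections import defaultdict

A = [(1, 3, 5, 6, 6), (0, 1, 2, 4, 5), (1, 9, 8, 3, 5), (0, 2, 3, 5, 7)]

groups = defaultdict(list)
for t in A:
    groups[t[0]].append(t)  # a missing key starts out as an empty list

# One winner per group (largest last element), then order winners by first element
max_of_each_group = sorted(
    (max(g, key=lambda t: t[-1]) for g in groups.values()),
    key=lambda t: t[0],
)
print(max_of_each_group)
```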

Answer:

This is trivial to accomplish with a dict. In fact, since you are going to do a reduction operation on each group, you can do this quite space-efficiently by doing the reduction at each step, so the dict only ever holds the current best tuple for each key:

>>> A = [(1, 3, 5, 6, 6), (0, 1, 2, 4, 5), (1, 9, 8, 3, 5), (0, 2, 3, 5, 7)]
>>> result = {}
>>> for tup in A:
...     first = tup[0]
...     result[first] = max(tup, result.get(first, tup), key=lambda x:x[-1])
...
>>> result
{1: (1, 3, 5, 6, 6), 0: (0, 2, 3, 5, 7)}
>>> list(result.values())
[(1, 3, 5, 6, 6), (0, 2, 3, 5, 7)]
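A sketch that wraps the running reduction in a reusable function. The name `pick_max_by_last` and the `key_index`/`value_index` parameters are hypothetical, added only so the grouping and comparison positions can be changed:

```python
def pick_max_by_last(tuples, key_index=0, value_index=-1):
    """Return one tuple per distinct tup[key_index]: the one with the largest
    tup[value_index]. Groups appear in first-seen order."""
    best = {}
    for tup in tuples:
        k = tup[key_index]
        current = best.get(k)
        # Replace only on a strictly larger value, so the earliest tuple wins ties
        if current is None or tup[value_index] > current[value_index]:
            best[k] = tup
    return list(best.values())


A = [(1, 3, 5, 6, 6), (0, 1, 2, 4, 5), (1, 9, 8, 3, 5), (0, 2, 3, 5, 7)]
result = pick_max_by_last(A)
print(result)
```

One design difference: `max(tup, result.get(first, tup), key=...)` returns `tup` on a tie (`max` returns the first maximal argument), so it keeps the later tuple, whereas this version keeps the earliest one. Pick whichever tie rule you need.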

Another valid approach is to do the grouping step first and then a reduction step. This is probably more generalizable, since the same groups can be fed to any reduction:

>>> result = {}
>>> grouper = {}
>>> for tup in A:
...     grouper.setdefault(tup[0],[]).append(tup)
...
>>> grouper
{1: [(1, 3, 5, 6, 6), (1, 9, 8, 3, 5)], 0: [(0, 1, 2, 4, 5), (0, 2, 3, 5, 7)]}

And to reduce:

>>> {k: max(v, key=lambda x:x[-1]) for k,v in grouper.items()}
{1: (1, 3, 5, 6, 6), 0: (0, 2, 3, 5, 7)}
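To illustrate why splitting grouping from reduction generalizes: once `grouper` is built, any per-group reduction can be applied to it without regrouping. The extra reductions below (smallest last element, group size) are illustrative examples, not part of the question:

```python
A = [(1, 3, 5, 6, 6), (0, 1, 2, 4, 5), (1, 9, 8, 3, 5), (0, 2, 3, 5, 7)]

# Grouping step, done once
grouper = {}
for tup in A:
    grouper.setdefault(tup[0], []).append(tup)

# Several reduction steps reusing the same groups
largest_last = {k: max(v, key=lambda x: x[-1]) for k, v in grouper.items()}
smallest_last = {k: min(v, key=lambda x: x[-1]) for k, v in grouper.items()}
group_sizes = {k: len(v) for k, v in grouper.items()}

print(largest_last)
print(smallest_last)
print(group_sizes)
```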

Answer:

You can also use pandas as below:

import pandas as pd
A = [(1, 3, 5, 6, 6), (0, 1, 2, 4, 5), (1, 9, 8, 3, 5), (0, 2, 3, 5, 7)]

df = pd.DataFrame(A)
# Max of column 4 (the last element) within each column-0 group, aligned to every row
group_max = df.groupby(0)[4].transform('max')
l = df[df[4] == group_max].values.tolist()
ans = [tuple(lst) for lst in l]
print(ans)
#[(1, 3, 5, 6, 6), (0, 2, 3, 5, 7)]

Here, each row in df is one tuple and each column is one position within the tuples. Grouping df by column 0 groups the tuples by their first element, and transform('max') on column 4 (the fifth column, i.e. the last element) gives every row the maximum of its own group.

The next line keeps only the rows whose last element equals its own group's maximum. Comparing per group matters: testing membership in the set of all group maxima (e.g. with isin) would also keep a tuple whose last element happens to equal another group's maximum. If several tuples in a group tie for the largest last element, all of them are kept.

Finally, we convert the selected rows back to a list of tuples; .tolist() turns the NumPy values into plain Python ints.
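If exactly one tuple per group is wanted even when values tie, `GroupBy.idxmax` returns the index label of the first maximum in each group, and `DataFrame.loc` can select those rows. A sketch of that alternative; the result is ordered by group key because `groupby` sorts keys by default:

```python
import pandas as pd

A = [(1, 3, 5, 6, 6), (0, 1, 2, 4, 5), (1, 9, 8, 3, 5), (0, 2, 3, 5, 7)]
df = pd.DataFrame(A)

# Row label of the largest last element (column 4) within each column-0 group
winners = df.groupby(0)[4].idxmax()
# Select those rows and convert them to plain Python tuples
ans = [tuple(row) for row in df.loc[winners].values.tolist()]
print(ans)
```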