Home > Enterprise >  Efficiently extracting the indices of the first two largest numbers from a python list/array with po
Efficiently extracting the indices of the first two largest numbers from a python list/array with po

Time:02-25

So I have the following case. I have a python list of 4 members: [A1,A2,B1,B2] and I have a function that calculates their value and returns a Numpy array of integers, e.g. [8,5,9,7]. I need to extract the two members with the highest value, where members of type A have a higher importance than type B (in case of a tie-break). In other words, I need the indices of the first two maximum values in the array. In the above example that would be [0,2]. I need these indices to select the members in the original list, in this case A1 and B1.

In the case of value_array = [8,8,4,8], the function should return [0,1]. The problem is trivial and could be solved easily (I also saw the same problem here: Write a function to find indices of first two largest numbers) but I am looking for the most computationally efficient implementation. Since I know that python is quite slow, I figured using built in Numpy functions is the way to go, to avoid slow loops. I used np.argmax(value_array) but it just returns the index of the first maximum value. What I could do is remove this element from the array and call the same function again, but I did not know if this was the fastest way to go, since I am then calling and calculating the maximum value twice as well as removing elements from the array of values.

Another idea I had was to create a class for the members to give them properties type and value. One could then check for the type property, but I felt like this would only make it slower and unnecessarily complicated (I don't know how fast property checking is in Python).

Does anyone have a clever solution to get the indices of the first two highest numbers in a Numpy integer array?

CodePudding user response:

You could use pandas for sorting and slicing

a = ['A1', 'A2', 'B1', 'B2']
b = [8, 5, 9, 2]
df = pd.DataFrame({"a": a, "b": b})
df.sort_values(['b', 'a'], ascending=False, inplace=True)
df.iloc[:2].index

CodePudding user response:

You can use np.argpartition to find the kth smalest value in an array (and respectively the (n-k)th biggest values). The resulting values are unordered. If you want them ordered, then you need to sort the resulting array of the biggest values. This solution is much faster than sorting the whole array (a sort runs in O(n log n) time while a partition is done in O(n) time).

a = np.array(['A1', 'A2', 'B1', 'B2'])
b = np.array([8, 5, 9, 7])

biggest_b_pos = np.argpartition(b, len(b)-2)[-2:]
result = a[biggest_b_pos]
  • Related