Is there a faster way to get the index of the maximum value and keep count?-CodePudding

Is there a faster way to get the count of each team having the highest score?

Input

# Random scores per player per round (each player corresponds to an integer from 0 - 100) 
scores_per_round = np.random.rand(10_000,100)

# 100,000 random teams of 8
teams = np.array([random.sample(list(range(100)), 8) for _ in range(100_000)])

Desired Output

# Count of top scores per team key being the index of the team in the teams array and the value being the amount of wins.
{
  0: 20,
  1: 12,
  ...
}

Currently I loop through the rounds, add up each teams score and then grab the maximum values index using np.argmax and store the count in a dictionary.

import random
from collections import defaultdict

win_count = defaultdict(int)

# Random scores
scores_per_round = np.random.rand(10_000,100)

# 100,000 random teams of 8
teams = np.array([random.sample(list(range(100)), 8) for _ in range(100_000)])

# Loop through and keep track of teams wins
for round in range(10_000):
    win_count[np.argmax(np.sum(np.take(scores_per_round[round], teams), axis=1))]  = 1

CodePudding user response：

The initial code is slow because it allocates pretty-big temporary arrays. Iterating over them 10_000 is expensive because the RAM or the last-level cache are relatively slow (compared to the L1 cache or registers). Using Numba can fix this problem by computing the arrays on-the-fly in a more cache-friendly way.

Here is a simple parallel implementation:

import numpy as np
import numba as nb

@nb.njit('int32[::1](float64[:,::1], int32[:,::1])', parallel=True, fastmath=True)
def computeTeamWins(scores_per_round, teams):
    roundCount = scores_per_round.shape[0]
    result = np.empty(roundCount, dtype=np.int32)
    n, m = teams.shape
    assert m == 8 # See the comment below

    for r in nb.prange(roundCount):
        iMax, sMax = -1, -1.0
        for i in range(n):
            s = 0.0
            # Faster if the size is known as the loop can be unrolled
            for j in range(8):
                s  = scores_per_round[r, teams[i, j]]
            if s > sMax:
                iMax, sMax = i, s
        result[r] = iMax

    return result

win_count = defaultdict(int)
for v in computeTeamWins(scores_per_round, teams):
    win_count[v]  = 1

On my 6-core machine, it takes 0.8 second while the initial code takes 54.3 seconds. This means the Numba implementation is about 68 times faster. If you cast teams to an np.uint8 array, then the computation takes only 0.57 seconds resulting in a 95 times faster execution (because of the caches). Note that this means the maximum integer value is bounded to 255 (included). Note that the final loop takes only few milliseconds.