Is there a faster way to get the count of each team having the highest score?
Input
# Random scores per player per round (each player corresponds to an integer from 0 - 100)
scores_per_round = np.random.rand(10_000,100)
# 100,000 random teams of 8
teams = np.array([random.sample(list(range(100)), 8) for _ in range(100_000)])
Desired Output
# Count of top scores per team key being the index of the team in the teams array and the value being the amount of wins.
{
0: 20,
1: 12,
...
}
Currently I loop through the rounds, add up each teams score and then grab the maximum values index using np.argmax
and store the count in a dictionary.
import random
from collections import defaultdict
win_count = defaultdict(int)
# Random scores
scores_per_round = np.random.rand(10_000,100)
# 100,000 random teams of 8
teams = np.array([random.sample(list(range(100)), 8) for _ in range(100_000)])
# Loop through and keep track of teams wins
for round in range(10_000):
win_count[np.argmax(np.sum(np.take(scores_per_round[round], teams), axis=1))] = 1
CodePudding user response:
The initial code is slow because it allocates pretty-big temporary arrays. Iterating over them 10_000 is expensive because the RAM or the last-level cache are relatively slow (compared to the L1 cache or registers). Using Numba can fix this problem by computing the arrays on-the-fly in a more cache-friendly way.
Here is a simple parallel implementation:
import numpy as np
import numba as nb
@nb.njit('int32[::1](float64[:,::1], int32[:,::1])', parallel=True, fastmath=True)
def computeTeamWins(scores_per_round, teams):
roundCount = scores_per_round.shape[0]
result = np.empty(roundCount, dtype=np.int32)
n, m = teams.shape
assert m == 8 # See the comment below
for r in nb.prange(roundCount):
iMax, sMax = -1, -1.0
for i in range(n):
s = 0.0
# Faster if the size is known as the loop can be unrolled
for j in range(8):
s = scores_per_round[r, teams[i, j]]
if s > sMax:
iMax, sMax = i, s
result[r] = iMax
return result
win_count = defaultdict(int)
for v in computeTeamWins(scores_per_round, teams):
win_count[v] = 1
On my 6-core machine, it takes 0.8 second while the initial code takes 54.3 seconds. This means the Numba implementation is about 68 times faster. If you cast teams
to an np.uint8
array, then the computation takes only 0.57 seconds resulting in a 95 times faster execution (because of the caches). Note that this means the maximum integer value is bounded to 255 (included). Note that the final loop takes only few milliseconds.