Currently, I am working with video data, so I perform statistical operations on multiple frames at once. During a debugging session I noticed that computing a numpy statistic (in this case the mean) over multiple axes takes longer when done directly over the desired axes than when done over each axis separately, one after the other. I created a simple example to illustrate my observations:
from timeit import default_timer as timer
import numpy as np
rnd_frames = np.random.randn(100, 128, 128, 3)
n_reps = 1000
# -----------------------------------
# mean computation over multiple axes
# -----------------------------------
# all axes at once
ts = timer()
for i in range(n_reps):
    mean_1 = np.mean(rnd_frames, axis=(1, 2))
print('Mean all at once: ', (timer()-ts)/n_reps)
# one after the other
ts = timer()
for i in range(n_reps):
    mean_2 = np.mean(rnd_frames, axis=1)
    mean_2 = np.mean(mean_2, axis=1)
print('Mean one after the other: ', (timer()-ts)/n_reps)
print('Difference in means: ', np.sum(np.abs(mean_1-mean_2)))
The difference is very small and results from float64 precision.
Does someone have an explanation for this? The time difference is quite significant: computing one axis after the other is about 10x faster.
I wonder if this is some kind of bug. Can anyone explain it?
CodePudding user response:
The time for the 2-axis mean is the same as for a single-axis mean on the equivalent reshape:
In [7]: timeit mean_1 = np.mean(rnd_frames, axis=(1, 2))
54.2 ms ± 202 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [11]: timeit mean_3 = np.mean(rnd_frames.reshape(100,-1,3), axis=1)
54.5 ms ± 142 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [12]: rnd_frames.reshape(100,-1,3).shape
Out[12]: (100, 16384, 3)
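As a quick sanity check that the reshape really is an equivalent reduction (reusing the rnd_frames array from the question), both calls produce the same shape and essentially the same values:

import numpy as np

rnd_frames = np.random.randn(100, 128, 128, 3)

mean_1 = np.mean(rnd_frames, axis=(1, 2))                  # reduce both axes at once
mean_3 = np.mean(rnd_frames.reshape(100, -1, 3), axis=1)   # same reduction after merging the two axes

print(mean_1.shape, mean_3.shape)   # (100, 3) (100, 3)
print(np.allclose(mean_1, mean_3))  # True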
As you note, this is quite a bit slower than the sequential calculation:
In [13]: %%timeit
...: mean_2 = np.mean(rnd_frames, axis=1)
...: mean_2 = np.mean(mean_2, axis=1)
7.63 ms ± 49.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Without getting deep into the woods of the compiled code, it's hard to say why there is this difference. While "avoiding loops" is a common performance strategy in numpy, that applies mostly to "many loops on a simple task". A few loops on a complex task can be faster. I'm not sure that applies here, but I'm not surprised that there are differences like this.
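To make the "few loops on a complex task" idea concrete, one variant worth timing (a sketch, not something benchmarked here) is an explicit Python loop over the small size-3 axis, so each np.mean call reduces one full (100, 128, 128) slice at a time:

import numpy as np

rnd_frames = np.random.randn(100, 128, 128, 3)

# loop over the 3 channels; each iteration reduces one (100, 128, 128) slice in a single call
mean_loop = np.stack(
    [np.mean(rnd_frames[..., c], axis=(1, 2)) for c in range(rnd_frames.shape[-1])],
    axis=-1,
)
print(mean_loop.shape)  # (100, 3)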
We could also explore whether putting those 2 axes at the end (innermost) or at the beginning of the dimensions shows this difference or not.
edit
If I move the 2 axes to either the beginning or the end, the time difference is much smaller. There's something about having that small size-3 dimension at the end (innermost) that makes your example unusually slow.
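For completeness, here is one way to run that experiment in a plain script; the use of np.moveaxis and np.ascontiguousarray is my choice for the sketch, and the actual timings will depend on your machine and numpy version:

from timeit import default_timer as timer
import numpy as np

rnd_frames = np.random.randn(100, 128, 128, 3)
n_reps = 100

# put the size-3 axis first, so the two reduced axes become the innermost ones
frames_moved = np.ascontiguousarray(np.moveaxis(rnd_frames, -1, 0))  # shape (3, 100, 128, 128)

ts = timer()
for _ in range(n_reps):
    np.mean(rnd_frames, axis=(1, 2))       # size-3 axis last (the layout from the question)
print('size-3 axis last: ', (timer() - ts) / n_reps)

ts = timer()
for _ in range(n_reps):
    np.mean(frames_moved, axis=(2, 3))     # size-3 axis first
print('size-3 axis first:', (timer() - ts) / n_reps)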