I am porting some code from Matlab to Python, and I am sometimes quite surprised by the performance loss. Here is an example with array sorting that is driving me nuts.
Matlab :
a=rand(50000,1000);tic;b=sort(a,1);toc
Elapsed time is 0.624460 seconds.
Python :
import numpy as np
import time
a = np.random.rand(50000, 1000)
t0 = time.time(); b = np.sort(a, axis=0); print(time.time() - t0)
4.192200422286987
Can someone explain why there is a factor of 7 difference in performance for such a basic operation? As far as I can see, sort is not multi-threaded in NumPy, and this is probably the main reason for the gap on my 20-core machine.
For now I tried (following this link):
sudo apt update
sudo apt install intel-mkl-full
conda install -c intel numpy
But this did not change the behavior. In a terminal I also typed
export MKL_NUM_THREADS=20
export NUMEXPR_NUM_THREADS=20
export OMP_NUM_THREADS=20
In Python, the following command
np.show_config()
returns
blas_mkl_info:
libraries = ['mkl_rt', 'pthread']
library_dirs = ['/home/pierre/anaconda3/lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['/home/pierre/anaconda3/include']
blas_opt_info:
libraries = ['mkl_rt', 'pthread']
library_dirs = ['/home/pierre/anaconda3/lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['/home/pierre/anaconda3/include']
lapack_mkl_info:
libraries = ['mkl_rt', 'pthread']
library_dirs = ['/home/pierre/anaconda3/lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['/home/pierre/anaconda3/include']
lapack_opt_info:
libraries = ['mkl_rt', 'pthread']
library_dirs = ['/home/pierre/anaconda3/lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['/home/pierre/anaconda3/include']
This seems to indicate that I really am using MKL. Is there a way to make np.sort run in parallel on arrays?
Answer:
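After spending a few hours on this and checking with colleagues, the answer is now clear:
np.sort is not multi-threaded, and there is no built-in way to accelerate it. MKL backs the BLAS/LAPACK routines listed by np.show_config(), not the sorting code, which is consistent with the MKL install and the thread-count variables making no difference.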
It suffices to look at the sources to check this:
https://github.com/numpy/numpy/tree/main/numpy/core/src/npysort
For such an important function, this surprises me. Probably most of the code that sorts with NumPy could be accelerated. I guess I will implement my own sorting function in Cython.
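Before going all the way to Cython, one thing that may be worth trying is splitting the columns across Python threads and calling np.sort on each block. This is only a minimal sketch, not something I have benchmarked on the 20-core machine. It assumes that np.sort releases the GIL for plain numeric dtypes such as float64, so that the threads can actually run concurrently. The function name parallel_sort_columns and the n_workers parameter are made up for the illustration.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def parallel_sort_columns(a, n_workers=4):
    """Sort a 2-D array along axis 0, splitting the columns across threads.

    Hypothetical helper: assumes np.sort releases the GIL for numeric
    dtypes, otherwise the threads will simply run one after another.
    """
    out = np.empty_like(a)
    # Contiguous blocks of column indices, one block per worker.
    blocks = np.array_split(np.arange(a.shape[1]), n_workers)

    def sort_block(cols):
        if cols.size:  # array_split can return empty blocks
            lo, hi = cols[0], cols[-1] + 1
            out[:, lo:hi] = np.sort(a[:, lo:hi], axis=0)

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # list() forces evaluation so that worker exceptions are re-raised
        list(pool.map(sort_block, blocks))
    return out


if __name__ == "__main__":
    a = np.random.rand(2000, 100)
    b = parallel_sort_columns(a, n_workers=4)
    print(np.array_equal(b, np.sort(a, axis=0)))
```

Each thread writes to a disjoint slice of the output, so no locking is needed. Whether this gets anywhere near the Matlab timing depends on the NumPy version and on memory bandwidth, so it should be timed on the real 50000 x 1000 array before committing to it.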
Best,
Pierre