Numpy sort much slower than Matlab sort

Time:12-16

I am porting some code from Matlab to Python, and I am sometimes quite surprised by the performance loss. Here is an example with array sorting that is driving me nuts.

Matlab :

a=rand(50000,1000);tic;b=sort(a,1);toc

Elapsed time is 0.624460 seconds.

Python :

import numpy as np
import time

a = np.random.rand(50000, 1000)
t0 = time.time()
b = np.sort(a, axis=0)  # sort each column independently
print(time.time() - t0)

4.192200422286987

Can someone explain why there is a factor of 7 in performance for such a basic operation? As far as I can see, np.sort is not multi-threaded, and I suspect this is the main reason for the gap on my 20-core machine.

So far I have tried (following this link):

sudo apt update
sudo apt install intel-mkl-full
conda install -c intel numpy 

This did not change the behavior. In a terminal, I also ran:

export MKL_NUM_THREADS=20
export NUMEXPR_NUM_THREADS=20
export OMP_NUM_THREADS=20

In Python, the following command

np.show_config()

returns

blas_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/pierre/anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/pierre/anaconda3/include']
blas_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/pierre/anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/pierre/anaconda3/include']
lapack_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/pierre/anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/pierre/anaconda3/include']
lapack_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/pierre/anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/pierre/anaconda3/include']

This seems to indicate that I am really using MKL. Is there a way to make np.sort run in parallel on arrays?

CodePudding user response:

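After spending a few hours on this and checking with colleagues, the answer is now clear:

np.sort is not multi-threaded, and NumPy offers no built-in way to make it run in parallel. The MKL and OpenMP thread settings above only affect BLAS/LAPACK routines such as matrix products, not sorting.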

It suffices to look at the sources to check this:

https://github.com/numpy/numpy/tree/main/numpy/core/src/npysort

I find this surprising for such an important function: something like 99.9% of the code that sorts with NumPy could be accelerated. I guess I will implement my own sorting function in Cython.
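Before going to Cython, one workaround that may be worth trying is to split the columns into blocks and sort each block in its own Python thread. The sketch below is only an illustration. The function name parallel_sort_columns and the n_workers parameter are hypothetical, and it assumes that np.sort releases the GIL for plain numeric dtypes. If it does not in your NumPy build, the threads will simply run one after the other and there will be no gain.

```python
import os
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def parallel_sort_columns(a, n_workers=None):
    """Sort each column of a 2-D array, spreading column blocks over threads.

    Hypothetical helper: any speedup relies on np.sort releasing the GIL
    for numeric dtypes, which is an assumption here.
    """
    n_workers = n_workers or os.cpu_count() or 1
    out = np.empty_like(a)
    # One contiguous block of column indices per worker.
    blocks = np.array_split(np.arange(a.shape[1]), n_workers)

    def sort_block(cols):
        if cols.size:  # a block is empty when there are more workers than columns
            lo, hi = cols[0], cols[-1] + 1
            out[:, lo:hi] = np.sort(a[:, lo:hi], axis=0)

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Consume the iterator so that exceptions raised in workers surface here.
        list(pool.map(sort_block, blocks))
    return out
```

It would be called as b = parallel_sort_columns(a, n_workers=20) in place of b = np.sort(a, axis=0). The result should be identical to the single-threaded one, but whether it comes close to the Matlab timing depends on the machine and the NumPy version, so it needs to be timed.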

Best,

Pierre
