I have tested the timing of the built-in np.gradient function against a hard-coded function that computes the gradient using the same central difference scheme. Using the cProfile module, I found that np.gradient takes significantly less time. This suggests that my implementation of the gradient computation in Python is extremely naive. I would like to understand how np.gradient is implemented, so that I can learn how to properly implement math operations in Python. (I am aware that the hard-coded function does not compute exactly the same thing as np.gradient, since the boundary values are not included, but this should not affect the results.)
Here is the Python code I used for the test:
import numpy as np
import cProfile

# Create ndarray
N = 1000
T = np.empty((N, N))

# Function to test the built-in gradient function of numpy
def test_grad_np(T, n):
    for s in range(n):
        dTdx, dTdy = np.gradient(T)

# Hard-coded gradient function (interior points only)
def hc_grad(T):
    Ny, Nx = np.shape(T)
    dTdx = np.zeros((Ny, Nx))
    dTdy = np.zeros((Ny, Nx))
    for j in range(1, Ny-1):
        for i in range(1, Nx-1):
            dTdx[j,i] = (T[j,i+1] - T[j,i-1])/2.
            dTdy[j,i] = (T[j+1,i] - T[j-1,i])/2.
    return dTdx, dTdy

# Function to test the hard-coded gradient function
def test_hc_grad(T, n):
    for s in range(n):
        dTdx, dTdy = hc_grad(T)

cProfile.run('test_hc_grad(T, 20)')
cProfile.run('test_grad_np(T, 20)')
and this is the output on my machine:
144 function calls in 16.818 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
20 0.000 0.000 0.000 0.000 <__array_function__ internals>:2(shape)
1 0.001 0.001 16.818 16.818 <string>:1(<module>)
20 0.000 0.000 0.000 0.000 fromnumeric.py:1922(_shape_dispatcher)
20 0.000 0.000 0.000 0.000 fromnumeric.py:1926(shape)
20 16.803 0.840 16.804 0.840 test_speed.py:20(hc_grad)
1 0.013 0.013 16.817 16.817 test_speed.py:30(test_hc_grad)
1 0.000 0.000 16.818 16.818 {built-in method builtins.exec}
20 0.000 0.000 0.000 0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
40 0.001 0.000 0.001 0.000 {built-in method numpy.zeros}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
704 function calls (624 primitive calls) in 0.262 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
40 0.000 0.000 0.001 0.000 <__array_function__ internals>:2(empty_like)
20 0.000 0.000 0.248 0.012 <__array_function__ internals>:2(gradient)
40 0.000 0.000 0.001 0.000 <__array_function__ internals>:2(ndim)
1 0.001 0.001 0.262 0.262 <string>:1(<module>)
20 0.000 0.000 0.000 0.000 _asarray.py:110(asanyarray)
40 0.000 0.000 0.000 0.000 _asarray.py:23(asarray)
40 0.000 0.000 0.000 0.000 fromnumeric.py:3102(_ndim_dispatcher)
40 0.000 0.000 0.001 0.000 fromnumeric.py:3106(ndim)
40 0.000 0.000 0.000 0.000 function_base.py:798(_gradient_dispatcher)
20 0.245 0.012 0.247 0.012 function_base.py:803(gradient)
40 0.000 0.000 0.000 0.000 multiarray.py:75(empty_like)
40 0.000 0.000 0.000 0.000 numerictypes.py:285(issubclass_)
20 0.000 0.000 0.000 0.000 numerictypes.py:359(issubdtype)
1 0.014 0.014 0.261 0.261 test_speed.py:15(test_grad_np)
1 0.000 0.000 0.262 0.262 {built-in method builtins.exec}
60 0.000 0.000 0.000 0.000 {built-in method builtins.issubclass}
40 0.000 0.000 0.000 0.000 {built-in method builtins.len}
60 0.000 0.000 0.000 0.000 {built-in method numpy.array}
100/20 0.001 0.000 0.248 0.012 {built-in method numpy.core._multiarray_umath.implement_array_function}
40 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
CodePudding user response:
If you're really curious, why not have a look at the np.gradient source code?
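You don't even need to leave Python to do that. As a minimal sketch, the standard inspect module can print the implementation directly (inspect follows the __wrapped__ attribute that numpy's dispatch machinery sets, so this should show the real function body, which lives in numpy/lib/function_base.py, the same file that appears in your profile output):

import inspect
import numpy as np

# Print the source of the actual np.gradient implementation.
print(inspect.getsource(np.gradient))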
I'm not an expert on np.gradient, but just from playing around with your current implementation, a lot can be improved by using vectorized code. I believe the following should be equivalent:
def hc_grad(T):
    dTdx = np.zeros_like(T)
    dTdy = np.zeros_like(T)
    # One sliced operation per axis replaces the double Python loop:
    # T[1:-1, 2:] is T[j, i+1] and T[1:-1, :-2] is T[j, i-1].
    dTdx[1:-1, 1:-1] = (T[1:-1, 2:] - T[1:-1, :-2]) / 2
    dTdy[1:-1, 1:-1] = (T[2:, 1:-1] - T[:-2, 1:-1]) / 2
    return dTdx, dTdy
which already runs a lot faster on my computer.
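To quantify that on your own machine, here is a minimal sketch of a timing comparison using the standard timeit module (it assumes the vectorized hc_grad above is defined, and mirrors the 20 repetitions of your cProfile runs; random data is used instead of np.empty so the values are defined):

import timeit
import numpy as np

N = 1000
T = np.random.rand(N, N)

# Time 20 calls of each version, as in the cProfile test above.
t_vec = timeit.timeit(lambda: hc_grad(T), number=20)
t_np = timeit.timeit(lambda: np.gradient(T), number=20)
print(f"vectorized hc_grad: {t_vec:.3f} s, np.gradient: {t_np:.3f} s")

The remaining gap, if any, should be much smaller than the ~60x difference in your profile, because both versions now do the arithmetic in compiled NumPy code instead of a Python-level double loop.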