Comparing performance of python and ctypes equivalent code


I tried to compare the performance of the Python and ctypes versions of a sum function, and found that the Python version is faster than the ctypes one.

Sum.c file:

int our_function(int num_numbers, int *numbers) {
    int i;
    int sum = 0;
    for (i = 0; i < num_numbers; i++) {
        sum += numbers[i];
    }
    return sum;
}

int our_function2(int num1, int num2) {
    return num1 + num2;
}

I compiled it to a shared library:

gcc -shared -fPIC -o Sum.so Sum.c

Then I imported ctypes and loaded the shared library to call the C function. Sum.py is below:

import ctypes

_sum = ctypes.CDLL('./Sum.so')

_sum.our_function.argtypes = (ctypes.c_int, ctypes.POINTER(ctypes.c_int))

def our_function_c(numbers):
    global _sum
    num_numbers = len(numbers)
    array_type = ctypes.c_int * num_numbers
    result = _sum.our_function(ctypes.c_int(num_numbers), array_type(*numbers))
    return int(result)

def our_function_py(numbers):
    sum = 0
    for i in numbers:
        sum += i
    return sum


import time
start = time.time()
print(our_function_c([1, 2, 3]))
end = time.time()
print("time taken C", end-start)

start1 = time.time()
print(our_function_py([1, 2, 3]))
end1 = time.time()
print("time taken py", end1-start1)

Output:

6
time taken C 0.0010006427764892578
6
time taken py 0.0

For a larger list, like list(range(int(1e5))):

start = time.time()
print(our_function_c(list(range(int(1e5)))))
end = time.time()
print("time taken C", end-start)

start1 = time.time()

print(our_function_py(list(range(int(1e5)))))
end1 = time.time()
print("time taken py", end1-start1)

Output:

704982704
time taken C 0.011005163192749023
4999950000
time taken py 0.00500178337097168

Question: I tried with more numbers, but Python still beats ctypes in terms of performance. So my question is: is there a rule of thumb for when I should move from Python to ctypes (in terms of order of magnitude)? Also, what is the cost of converting Python data to ctypes?

CodePudding user response:

Why

Well, yes, in such a case it is not really worth it, because before calling the C function you spend a lot of time converting every number into a c_int, and that conversion is no less expensive than an addition.

Usually we use ctypes when either the data are generated on the C side, or when we generate them from Python but then use them for more than one simple operation.
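
To see where the time goes, here is a rough sketch (assuming the Sum.so and our_function from the question, and a Linux-style path) that times the c_int array conversion separately from the actual C call:

import ctypes
import time

_sum = ctypes.CDLL('./Sum.so')  # assumed: the library built from the question's Sum.c
_sum.our_function.argtypes = (ctypes.c_int, ctypes.POINTER(ctypes.c_int))

numbers = list(range(int(1e5)))

# time the Python-to-C conversion on its own
t0 = time.time()
arr = (ctypes.c_int * len(numbers))(*numbers)   # one c_int conversion per element
t1 = time.time()

# time only the call into the C function, with the data already converted
result = _sum.our_function(ctypes.c_int(len(numbers)), arr)
t2 = time.time()

print("conversion:", t1 - t0)
print("C call    :", t2 - t1)

On most machines the conversion line should dominate, which is the point being made here.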

Same with pandas

This is, for example, what happens with numpy or pandas: two well-known libraries written in C (or compiled, anyway) that allow huge time gains (on the order of 1000×), as long as the data don't go back and forth between C space and Python space.

Numpy is faster than a list for many operations, for example, as long as you don't count a data conversion for each atomic operation. Pandas often works with data read from CSV by pandas itself, so the data stay in pandas space.

import time
import pandas as pd

lst=list(range(1000000))
start1=time.time()
s1=0
for x in lst:
    s1+=x
end1=time.time()

start2=time.time()
df=pd.DataFrame({'x':lst})
middle2=time.time()
s2=df.x.sum()
end2=time.time()
print("python", s1, "t=", end1-start1)
print("pandas", s2, "t=", end2-start2, end2-middle2)

Output:

python 499999500000 t= 0.13175106048583984
pandas 499999500000 t= 0.35060644149780273 0.0020313262939453125

As you see, by this standard pandas is also way slower than pure Python, but it is way faster if you don't count the data creation.

Faster without data conversion

Try running your code this way, with the conversion done before the timer starts:

import time
lst=list(range(1000))*1000
# convert to C data once, before timing
c_lst = (ctypes.c_int * len(lst))(*lst)
c_num = ctypes.c_int(len(lst))
start = time.time()
print(int(_sum.our_function(c_num, c_lst)))
end = time.time()
print("time taken C", end-start)

start1 = time.time()
print(our_function_py(lst))
end1 = time.time()
print("time taken py", end1-start1)

And the C code is way faster.

So, as with pandas, it isn't worth it if all you really need is one summation, after which you forget the data.
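
The same point can be made by amortizing the conversion. Here is a rough sketch (reusing the _sum library and the pure-Python loop from above) that converts the list once, then calls the C function many times on the same buffer, compared with repeating the Python loop the same number of times:

import ctypes
import time

_sum = ctypes.CDLL('./Sum.so')  # assumed: the library built from the question's Sum.c
_sum.our_function.argtypes = (ctypes.c_int, ctypes.POINTER(ctypes.c_int))

lst = list(range(1000)) * 100
repeats = 100

start = time.time()
c_lst = (ctypes.c_int * len(lst))(*lst)   # pay the conversion cost once...
c_num = ctypes.c_int(len(lst))
for _ in range(repeats):                  # ...then reuse the converted buffer
    rc = _sum.our_function(c_num, c_lst)
end = time.time()
print("C, convert once + %d calls:" % repeats, end - start)

start = time.time()
for _ in range(repeats):
    s = 0
    for x in lst:
        s += x
end = time.time()
print("python, %d loops:          " % repeats, end - start)

The more calls you make on the already-converted data, the more the one-off conversion cost fades, which is exactly the "more than one simple operation" case mentioned above.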

No such problem with c-extension

Note that with a Python C extension, which allows C functions to handle Python types directly, you don't have this problem (though it is often less efficient, because Python lists are not just the int * that C loves; but at least you don't need a conversion to C done from Python). That is why you may sometimes see libraries for which, even counting the conversion, calling the external code is faster.

import numpy as np
np.array(lst).sum()

for example, is slightly faster. But barely so, when we are used to numpy being 1000× faster. That is because numpy.array has to build itself from the Python list data.

But that is no longer plain ctypes (by ctypes, I mean "using C functions from the C world, handling C data, not caring about Python at all"). Plus, I am not even sure that conversion is the only reason: numpy might be cheating, using several threads and vectorization, which neither Python nor your C code does.
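
For reference, a rough way to check that, assuming the same lst as above, is to time the numpy call with the list-to-array conversion included and compare it to the pure-Python loop:

import time
import numpy as np

lst = list(range(1000)) * 1000

start = time.time()
s_np = np.array(lst).sum()   # conversion from the Python list is counted here
end = time.time()
print("numpy (incl. conversion)", s_np, end - start)

start = time.time()
s_py = 0
for x in lst:                # pure-Python loop for comparison
    s_py += x
end = time.time()
print("python                  ", s_py, end - start)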

Example that needs no big data conversion

So, let's add another example. Add this to your C code:

int sumFirst(int n){
    int s=0;
    for(int i=0; i<n; i++){
        s+=i;
    }
    return s;
}

And try it with

import ctypes

_sum = ctypes.CDLL('./ctypeBench.so')

_sum.sumFirst.argtypes = (ctypes.c_int,)

def c_sumFirst(n):
    return _sum.sumFirst(ctypes.c_int(n))

import time
lst=list(range(10000))
start1=time.time()
s1=0
for x in lst:
    s1+=x
end1=time.time()

start2=time.time()
s2=c_sumFirst(10000)
end2=time.time()

print(f"python {s1=}, Δt={end1-start1}")
print(f"c {s2=}, Δt={end2-start2}")

Result is

python s1=49995000, Δt=0.0012884140014648438
c s2=49995000, Δt=4.267692565917969e-05

And note that I was fair to Python: I did not count data generation in its time (I explicitly built the list from the range beforehand, which doesn't change much anyway).

So, the conclusion is: you can't expect a ctypes function to gain time for a single operation per datum, such as +, when you need one conversion per datum just to use it.

Either you need to use a C extension and write ad hoc code that handles a Python list directly (and even there, you won't gain much if you have just one addition to do per value).

Or you need to keep the data on the C side, creating them from C and leaving them there (like you do with pandas or numpy: you work with DataFrames or ndarrays, as much as possible through pandas and numpy functions or operators, not pulling everything back into Python with full indexation or .iloc); see the sketch below.

Or you really need to have more than one addition per datum to do.
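
As a small illustration of the second option, here is a sketch using numpy (named in the point above): the data are created directly as an ndarray, never as a Python list, and every operation stays inside numpy:

import time
import numpy as np

start = time.time()
a = np.arange(10_000_000, dtype=np.int64)   # data created directly in C space, no Python list
s = a.sum()                                 # summed in C space as well
end = time.time()
print("numpy, data created and kept on the C side:", s, end - start)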

Addendum: c-extension

Just to add another argument in favor of "the problem is the conversion", but also to make explicit what to do if you really need to do one simple operation on a list and don't want to convert every element beforehand, you can try this.

modmy.c

#define PY_SSIZE_T_CLEAN
#include <Python.h>
#define PY3K

static PyObject *mysum(PyObject *self, PyObject *args){
    PyObject *list;
    PyArg_ParseTuple(args, "O", &list);
    PyObject *it = PyObject_GetIter(list);
    long int s=0;
    for(;;){
        PyObject *v = PyIter_Next(it);
        if(!v) break;
        long int iv=PyLong_AsLong(v);
        Py_DECREF(v);   /* drop the reference returned by PyIter_Next */
        s+=iv;
    }
    Py_DECREF(it);      /* drop the iterator reference */
    return PyLong_FromLong(s);
}

static PyMethodDef MyMethods[] = {
    {"mysum",  mysum, METH_VARARGS, "Sum list"},
    {NULL, NULL, 0, NULL}        /* Sentinel */
};

static struct PyModuleDef modmy = {
    PyModuleDef_HEAD_INIT,
    "modmy",   /* module name */
    NULL,      /* module docstring */
    -1,        /* per-interpreter state size (-1: state kept in globals) */
    MyMethods
};

PyMODINIT_FUNC
PyInit_modmy(void)
{
    return PyModule_Create(&modmy);
}

Compile with

gcc -fPIC `python3-config --cflags` -c modmy.c
gcc -shared -fPIC `python3-config --ldflags` -o modmy.so modmy.o
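
If you prefer, a minimal setup.py (a sketch assuming modmy.c sits in the current directory) can let setuptools drive the same compilation:

# setup.py -- minimal build script for the modmy extension
from setuptools import setup, Extension

setup(
    name="modmy",
    ext_modules=[Extension("modmy", sources=["modmy.c"])],
)

Running python3 setup.py build_ext --inplace then drops the compiled modmy shared object next to your script.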

Then

import time
import modmy

lst=list(range(10000000))

start1=time.time()
s1=0
for x in lst:
    s1+=x
end1=time.time()

start2=time.time()
s2=modmy.mysum(lst)
end2=time.time()

print("python res=%d  t=%5.2f"%(s1, end1-start1))
print("c      res=%d  t=%5.2f"%(s2, end2-start2))

This time there is no need for conversion (or, to be more accurate, yes, there is still a conversion, but it is done by the C code itself, since this is not just any C code but code written ad hoc to extend Python; and after all, the Python interpreter, under the hood, also needs to unpack the elements).

Note that my code checks nothing. It assumes that you are really calling mysum with a single argument that is a list of integers. God knows what happens if you don't. Well, not just God; just try:

>>> import modmy
>>> modmy.mysum(12)
Segmentation fault (core dumped)
$

Python crashes (not just your Python code: it is not a Python exception, the whole Python process dies).
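
If you keep the extension that minimal, one cheap mitigation is to validate the argument on the Python side before crossing into C; a sketch with a hypothetical wrapper name:

import modmy
from collections.abc import Iterable

def safe_mysum(values):
    # the C code dereferences the iterator without checking it,
    # so refuse non-iterable arguments before they reach it
    if not isinstance(values, Iterable):
        raise TypeError("safe_mysum expects an iterable of ints")
    return modmy.mysum(values)

With this, safe_mysum(12) raises a TypeError instead of killing the process (the proper fix is, of course, error checking inside the C code itself).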

But the result is worth it:

python res=49999995000000  t= 1.22
c      res=49999995000000  t= 0.11

So, you see, this time C wins, because the rules are really the same (they are doing the same thing; C just does it faster).

So, you need to know what you are doing. But this does what you expected: a very simple operation on a list of integers that runs faster in C than in Python.
