Why does my Python function run faster than the one in C++?

I have been writing a simple test to compare the speed improvements of C++ over Python. My results were unexpected, with C++ being almost thrice as slow as the program written in Python. I guess there is something in the loop inside the Python function using iterators or something.

C++ code

#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <ctime>
#include <iostream>
using namespace std;
using namespace chrono;

// Multiply every element of a in place by b.
void mult(int* a, int b)
{
    for (size_t i = 0; i < 100000000; i++)
    {
        a[i] *= b;
    }
}

int main()
{
    srand(time(0));
    int* a = new int[100000000];
    int l = 100000000;
    // Fill the array with random values before timing.
    for (int i = 0; i < l; ++i)
    {
        a[i] = rand();
    }
    // Time only the multiplication loop.
    auto start = high_resolution_clock::now();
    mult(a, 5);
    auto stop = high_resolution_clock::now();
    auto duration = duration_cast<milliseconds>(stop - start);
    cout << duration.count() << endl;
    delete[] a;
}

Python code

import time

import numpy as np

def mult(x, a):
    for i in [x]:
        i *= a

x = np.random.random(100000000)
start = time.time()
mult(x, 5)
elapsed = time.time()
elapsed = elapsed - start
print("Time spent in (mult) is: ", elapsed)

Results

C++ (debug): 200 milliseconds

C++ (release): 50 milliseconds

Python (debug): 65 milliseconds

Python (release): 50 milliseconds

CodePudding user response：

There are many reasons why this performance test does not give useful results.

Don't compare, or pay attention to, debug timing. The entire point of using a language like C or C++ is to enable (static) compiler optimizations. So really, the results are the same. On the other hand, it is important to make sure that aggressive compiler optimizations don't optimize out your entire test (due to the result of the computation going unused, or due to undefined behaviour anywhere in your program, or due to the compiler assuming that part of the code can't actually be reached because there would be undefined behaviour if it were reached).
for i in [x]: is a pointless loop: it creates a Python list of one element, and iterates once. That one iteration does i *= a, i.e., it multiplies i, which is the Numpy array. The code only works accidentally; it happens that Numpy arrays specially define * to do a loop and multiply each element. Which brings us to...
The entire point of using Numpy is that it optimizes number-crunching by using code written in C behind the scenes to implement algorithms and data structures. i simply contains a pointer to a memory allocation that looks essentially the same as the one the C program uses, and i *= a does a few O(1) checks and then sets up and executes a loop that looks essentially the same as the one in the C code.
This is not reliable timing methodology, in general. That is a whole other kettle of fish. The Python standard library includes a timeit module intended to make timing easier and help avoid some of the more basic traps. But doing this properly in general is a research topic beyond the scope of a Stack Overflow question.
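
As a rough sketch of what using timeit for this could look like (the function name time_inplace_multiply, the array size, and the repeat count are my own choices for illustration, not anything from the question):

```python
import timeit

import numpy as np


def time_inplace_multiply(n=1_000_000, repeats=5):
    """Return the best wall-clock time, in seconds, of one in-place multiply."""
    x = np.random.random(n)

    def run():
        # Multiply by 1.0 so that repeated runs do not change the data;
        # NumPy still walks over every element in its compiled loop.
        np.multiply(x, 1.0, out=x)

    # timeit.repeat returns one total time per repetition; the minimum is
    # usually the least noisy of them.
    times = timeit.repeat(run, repeat=repeats, number=1)
    return min(times)


best = time_inplace_multiply()
print(f"best of 5: {best * 1e3:.3f} ms")
```

Taking the minimum of several repetitions reduces noise from other processes, but it does not address the deeper methodology issues mentioned above.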

"But I want to see the slow performance of native Python, rather than Numpy's optimized stuff - "

If you just want to see the slow performance of Python iteration, then you need for the loop to actually iterate over the elements of the array (and write them back):

def mult(x, a):
    for i in range(len(x)):
        x[i] *= a

Except that experienced Pythonistas won't write the code that way, because range(len( is ugly. The Pythonic approach is to create a new list:

def mult(x, a):
    return [i*a for i in x]

That will also show you the inefficiency of native Python data structures (we need to create a new list, which contains pointers to int objects).
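
To make the "pointers to int objects" point concrete, here is a small sketch that compares the memory used by a list of ints with a packed array from the standard library. It assumes CPython (the exact byte counts vary by version and platform), and the values are arbitrary:

```python
import sys
from array import array

values = list(range(1000, 2000))  # list of pointers to separate int objects
packed = array("q", values)       # raw 64-bit integers stored contiguously

# The list's own pointer table plus every int object it points to.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
# The array object, including its raw data buffer.
array_bytes = sys.getsizeof(packed)

print("list and its ints:", list_bytes, "bytes")
print("packed array:     ", array_bytes, "bytes")
```

The packed layout is essentially what the C++ program and Numpy work with, which is part of why they can loop over the data so cheaply.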

On my machine, it is actually even slower to process the Numpy array this way than a native Python list. This is presumably because of the extra work that has to be done to interface the Numpy code with native Python, and "box" the raw integer data into int objects.
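
If you want to try that comparison yourself, something along these lines should do. It is only a sketch: the helper name scale and the array size are made up for illustration, and the timings will differ from machine to machine, so no expected numbers are shown.

```python
import timeit

import numpy as np

arr = np.arange(100_000, dtype=np.int64)
lst = arr.tolist()  # plain Python list of int objects


def scale(seq, a):
    # Pure-Python loop: every element is handled as a separate object.
    return [i * a for i in seq]


# Iterating over the array has to wrap each raw value in a NumPy scalar
# object, whereas the list already holds ready-made int objects.
print(type(arr[0]).__name__, type(lst[0]).__name__)

t_list = min(timeit.repeat(lambda: scale(lst, 5), repeat=3, number=1))
t_arr = min(timeit.repeat(lambda: scale(arr, 5), repeat=3, number=1))
t_vec = min(timeit.repeat(lambda: arr * 5, repeat=3, number=1))

print(f"list comprehension over list:  {t_list * 1e3:.3f} ms")
print(f"list comprehension over array: {t_arr * 1e3:.3f} ms")
print(f"vectorized arr * 5:            {t_vec * 1e3:.3f} ms")
```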

CodePudding user response：

Your Python code does not do the same thing. It does not iterate over all the elements of x; it iterates over a list of size 1 whose only element is x itself. The single iteration takes x by reference and applies *= a to the whole object. For a plain Python list that would repeat the list a times; for a Numpy array, as here, it multiplies every element in place in one vectorized operation.
If you modify your Python code like this to actually iterate over the elements of x:

for i in x:
  i *= a

then you end up with code that does not store the result of the multiplication. In this loop, i is multiplied by a and the result is thrown away, so x is left unchanged. An optimizing implementation could in principle discard the whole mult() function without changing what the program does, although CPython does not perform that kind of dead-code elimination and will still run the loop. Even when the loop does execute, it is no longer doing the same work as the C++ version, which writes every result back to memory: with no stores there are no memory writes, no cache misses caused by them, and no cache invalidation.
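
The difference between the loop variants is easy to check on a tiny array. This is just an illustrative sketch; the variable names and the three-element arrays are made up for the demonstration:

```python
import numpy as np

a = 5

# Case 1: the loop from the question. [x] is a one-element list, so the body
# runs once with i bound to the array itself, and *= updates it in place.
x = np.array([1.0, 2.0, 3.0])
iterations = 0
for i in [x]:
    i *= a
    iterations += 1
print(iterations, x)

# Case 2: iterating over the elements of a 1-D array. Each i is a separate
# scalar object; *= only rebinds the name i and never writes back into y.
y = np.array([1.0, 2.0, 3.0])
for i in y:
    i *= a
print(y)

# Case 3: a plain Python list. *= repeats the list instead of scaling it.
z = [1, 2, 3]
z *= 2
print(z)
```

Only the first case changes the array contents, and it does so in a single vectorized step rather than one Python-level iteration per element.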