Why is accessing elements using `tolist` faster than accessing them directly through the pandas Series?

I have a dataframe and I wanted to apply a certain function on a set of columns. Something like:

data[["A","B","C","D","E"]].apply(some_func, axis=1)

In `some_func`, the first step is to extract all the column values into separate variables.

def some_func(x):
    a,b,c,d,e = x # or x.tolist()
    #Some more processing

To reproduce the result, use:

import pandas as pd

x = pd.Series([1,2,3,4,5], index=["A","B","C","D","E"])

Now, my question is, why does

%%timeit 
a,b,c,d,e = x.tolist()

Output:

538 ns ± 2.82 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

perform better than

%%timeit 
a,b,c,d,e = x

Output:

1.61 µs ± 15.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Answer:

Let's define two functions and inspect them with dis:

from dis import dis
from pandas import Series

x = Series([1,2,3,4,5], index=["A","B","C","D","E"])

def a():
    a, b, c, d, e = x.tolist()

def b():
    a, b, c, d, e = x

dis(a)
dis(b)

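Executing the above will yield the following (this is Python 3.7 to 3.10 bytecode, which uses LOAD_METHOD/CALL_METHOD; newer versions print different opcodes):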

# dis(a)
  7           0 LOAD_GLOBAL              0 (x)
              2 LOAD_METHOD              1 (tolist)
              4 CALL_METHOD              0
              6 UNPACK_SEQUENCE          5
              8 STORE_FAST               0 (a)
             10 STORE_FAST               1 (b)
             12 STORE_FAST               2 (c)
             14 STORE_FAST               3 (d)
             16 STORE_FAST               4 (e)
             18 LOAD_CONST               0 (None)
             20 RETURN_VALUE

# dis(b)
 10           0 LOAD_GLOBAL              0 (x)
              2 UNPACK_SEQUENCE          5
              4 STORE_FAST               0 (a)
              6 STORE_FAST               1 (b)
              8 STORE_FAST               2 (c)
             10 STORE_FAST               3 (d)
             12 STORE_FAST               4 (e)
             14 LOAD_CONST               0 (None)
             16 RETURN_VALUE

From the above, it seems that, if anything, function (a) has more instructions. So why is it faster?

As explained in this answer, the implementation of UNPACK_SEQUENCE in CPython's interpreter loop has special cases: when the right-hand side object is exactly a tuple or a list and its length equals the number of left-hand side variables, the items are copied out directly without going through the iterator protocol.

Under the hood, `x.tolist()` uses NumPy's `ndarray.tolist()` to create a Python list from the array data, so the unpacking can take this special-case fast path (you can check the deterioration in performance by changing the number of variables on the left-hand side).
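The fast path checks for an exact list or tuple, so even a list subclass is routed through the generic iterator protocol. A minimal sketch, assuming CPython; `TracingList` and `calls` are hypothetical names used only for illustration. Overriding `__iter__` on a subclass makes the fallback visible, since a plain list never calls it during unpacking.

```python
calls = []


class TracingList(list):
    """Hypothetical list subclass that records when the iterator protocol is used."""

    def __iter__(self):
        calls.append("__iter__")
        return super().__iter__()


# An exact list takes the UNPACK_SEQUENCE fast path in CPython.
# A plain list cannot be hooked, so here we only check that the unpack works.
a, b, c, d, e = [1, 2, 3, 4, 5]
assert calls == []

# A subclass is not an exact list, so CPython falls back to generic
# iteration, and the overridden __iter__ runs.
a, b, c, d, e = TracingList([1, 2, 3, 4, 5])
print(calls)  # ['__iter__']
```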

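When the right-hand side object is not exactly a Python tuple or list, Python falls back to the generic iterator protocol: it calls `iter()` on the object and fetches the items one at a time with `next()`. For a Series, every element goes through pandas' own `Series.__iter__`, which appears to be less efficient.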
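To see what the generic fallback involves, here is a minimal sketch assuming CPython, with a hypothetical `CountingIterable` standing in for a non-list object such as a Series. It counts how often the iterator protocol is invoked when unpacking five values: one `__iter__` call, then one `__next__` call per item, plus a final call that raises `StopIteration` to confirm there are no extra values.

```python
class CountingIterable:
    """Hypothetical iterable that records how often the iterator protocol is used."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0
        self.iter_calls = 0
        self.next_calls = 0

    def __iter__(self):
        self.iter_calls += 1
        self._pos = 0
        return self  # the object acts as its own iterator

    def __next__(self):
        self.next_calls += 1
        if self._pos >= len(self._items):
            raise StopIteration  # unpacking needs this to rule out a 6th value
        item = self._items[self._pos]
        self._pos += 1
        return item


seq = CountingIterable([1, 2, 3, 4, 5])
a, b, c, d, e = seq
print(seq.iter_calls, seq.next_calls)  # 1 6
```

A plain list of matching length skips all of these Python-level calls, which is the overhead that `tolist()` avoids at the cost of building the list once.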

Answer:

Wow, I love questions like this! First, let's check that the difference is real (I don't trust results I haven't checked myself):

from timeit import timeit

setup = """
import pandas as pd
import numpy as np

def row_to_list(x):
    a, b, c, d, e = x
    return [a, b, c, d, e]

df1 = pd.DataFrame(np.random.rand(2000, 5))
"""
num = 10000

codes = ['lambda x: x.tolist()',
        'pd.Series.tolist',
        'row_to_list']

for code in codes:
    fnc_str = f'df1.apply({code}, axis=1)'
    t = timeit(fnc_str, setup=setup, number=num)
    print(f'{fnc_str}: {t}')

Output:

df1.apply(lambda x: x.tolist(), axis=1): 111.2637004610151
df1.apply(pd.Series.tolist, axis=1): 108.36258125200402
df1.apply(row_to_list, axis=1): 141.3846389260143

OK, indeed there is a clear difference! So let's profile the function calls:

import cProfile
import pandas as pd
import numpy as np

def row_to_list(x):
    a, b, c, d, e = x
    return [a, b, c, d, e]

df1 = pd.DataFrame(np.random.rand(2000, 5))

codes = ['lambda x: x.tolist()',
        'pd.Series.tolist',
        'row_to_list']

for code in codes:
    fnc_str = f'df1.apply({code}, axis=1)'
    cProfile.run(fnc_str)

The output is way too long to post here, but the first line of each profile already shows that, under the hood, Python performs 44678 (primitive) function calls in the first two cases (calling the method `tolist()`) and 52678 (+18%!) when unpacking the row into a list "manually".
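The full cProfile table is long, but the headline count can be printed on its own. A minimal stdlib-only sketch: `pstats.Stats(...).print_stats(0)` restricts the listing to zero rows, so only the "N function calls ... in X seconds" summary line is written. To avoid depending on pandas, `PyIterable` is a hypothetical stand-in for a non-list row such as a Series, and `unpack_row` and `count_calls` are illustrative names. The absolute numbers will not match the ones above; the point is that the generic unpacking path adds calls (here the Python-level `__iter__` and the builtin `iter` it calls) that a plain list does not need.

```python
import cProfile
import io
import pstats
import re


class PyIterable:
    """Hypothetical stand-in for an object that is iterable but not a list/tuple."""

    def __init__(self, values):
        self._values = values

    def __iter__(self):
        # A Python-level __iter__ is profiled, and so is the builtin iter() it calls.
        return iter(self._values)


def unpack_row(row):
    a, b, c, d, e = row
    return [a, b, c, d, e]


def count_calls(func, rows):
    """Profile func over rows; return (total call count, summary line)."""
    profiler = cProfile.Profile()
    profiler.enable()
    for row in rows:
        func(row)
    profiler.disable()

    stream = io.StringIO()
    # print_stats(0) keeps zero table rows, leaving only the summary header.
    pstats.Stats(profiler, stream=stream).print_stats(0)
    summary = stream.getvalue().strip()
    match = re.search(r"(\d+) function calls", summary)
    return int(match.group(1)), summary


list_rows = [[1, 2, 3, 4, 5] for _ in range(1000)]
generic_rows = [PyIterable([1, 2, 3, 4, 5]) for _ in range(1000)]

list_calls, list_summary = count_calls(unpack_row, list_rows)
generic_calls, generic_summary = count_calls(unpack_row, generic_rows)
print(list_summary)
print(generic_summary)
```

The same `count_calls` pattern could wrap the `df1.apply(...)` calls from the benchmark above to print only their summary lines.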

Well, this is the magic of NumPy, but I can't pin it down to a more specific reason...