The code below initializes a list of random integers, and iterates over it. Given a subset_size
, at every iteration i
, a sublist of i: i subset_size
is accessed. The time to access the sublist grows with subset_size
. For n = 100000
and subset_size = 50000
, it takes 15 seconds on my i5 mbp. I thought sublists are retrieved using 2 pointers and lazy evaluation but it looks like there's some c
loop behind the scenes that populates a new list and returns it as a result. Is this a proper description to what actually happens or is there another explanation?
import random
from datetime import timedelta
from time import perf_counter
def example(n, subset_size):
x = [random.randint(0, 10000) for _ in range(n)]
t = perf_counter()
for i in range(n - subset_size):
_ = x[i : i subset_size]
print(timedelta(seconds=perf_counter() - t))
if __name__ == '__main__':
example(100000, 50000)
0:00:15.131059
CodePudding user response:
I thought sublists are retrieved using 2 pointers and lazy evaluation but it looks like there's some c loop behind the scenes that populates a new list and returns it as a result.
Your assumption is correct. slicing a list always creates new list. Here is the relevant part of the source code. I have added some comments to understand what is being going on each steps.
static PyObject *
list_slice(PyListObject *a, Py_ssize_t ilow, Py_ssize_t ihigh)
{
PyListObject *np;
PyObject **src, **dest;
Py_ssize_t i, len;
len = ihigh - ilow;
if (len <= 0) {
return PyList_New(0);
}
# create new list which is long enough to hold the slice length elements.
np = (PyListObject *) list_new_prealloc(len);
if (np == NULL)
return NULL;
# Adjust the pointer offset, because list internally uses an array of pointers.
src = a->ob_item ilow;
dest = np->ob_item;
# Copy the elements back.
for (i = 0; i < len; i ) {
PyObject *v = src[i];
Py_INCREF(v);
dest[i] = v;
}
Py_SET_SIZE(np, len);
return (PyObject *)np;
}
As you can see when you slice a list it has to first call list_new_prealloc
which is where an empty list is created and allocates the memory upto the slice length.
static PyObject *
list_new_prealloc(Py_ssize_t size)
{
assert(size > 0);
# Create new list
PyListObject *op = (PyListObject *) PyList_New(0);
if (op == NULL) {
return NULL;
}
assert(op->ob_item == NULL);
# Allocating memory
op->ob_item = PyMem_New(PyObject *, size);
if (op->ob_item == NULL) {
Py_DECREF(op);
return PyErr_NoMemory();
}
op->allocated = size;
return (PyObject *) op;
}
CodePudding user response:
List slicing is not evaluated lazily, and indeed creates a new list object; see for example An Informal Introduction to Python, section 3.1.3. Lists:
All slice operations return a new list containing the requested elements.
CodePudding user response:
List slice would not be evaluated lazily, the list will be created on every iteration. Use itertools.islice
to create lazy slice:
islice(x, i, i subset_size)