Home > OS >  Why sublist access time increases with sublist size?
Why sublist access time increases with sublist size?

Time:11-10

The code below initializes a list of random integers, and iterates over it. Given a subset_size, at every iteration i, a sublist of i: i subset_size is accessed. The time to access the sublist grows with subset_size. For n = 100000 and subset_size = 50000, it takes 15 seconds on my i5 mbp. I thought sublists are retrieved using 2 pointers and lazy evaluation but it looks like there's some c loop behind the scenes that populates a new list and returns it as a result. Is this a proper description to what actually happens or is there another explanation?

import random
from datetime import timedelta
from time import perf_counter


def example(n, subset_size):
    x = [random.randint(0, 10000) for _ in range(n)]
    t = perf_counter()
    for i in range(n - subset_size):
        _ = x[i : i   subset_size]
    print(timedelta(seconds=perf_counter() - t))


if __name__ == '__main__':
    example(100000, 50000)

0:00:15.131059

CodePudding user response:

I thought sublists are retrieved using 2 pointers and lazy evaluation but it looks like there's some c loop behind the scenes that populates a new list and returns it as a result.

Your assumption is correct. slicing a list always creates new list. Here is the relevant part of the source code. I have added some comments to understand what is being going on each steps.

static PyObject *
list_slice(PyListObject *a, Py_ssize_t ilow, Py_ssize_t ihigh)
{
    PyListObject *np;
    PyObject **src, **dest;
    Py_ssize_t i, len;
    len = ihigh - ilow;
    if (len <= 0) {
        return PyList_New(0);
    }
    # create new list which is long enough to hold the slice length elements.
    np = (PyListObject *) list_new_prealloc(len);
    if (np == NULL)
        return NULL;
    # Adjust the pointer offset, because list internally uses an array of pointers.
    src = a->ob_item   ilow;
    dest = np->ob_item;
    # Copy the elements back.
    for (i = 0; i < len; i  ) {
        PyObject *v = src[i];
        Py_INCREF(v);
        dest[i] = v;
    }
    Py_SET_SIZE(np, len);
    return (PyObject *)np;
}

As you can see when you slice a list it has to first call list_new_prealloc which is where an empty list is created and allocates the memory upto the slice length.

static PyObject *
list_new_prealloc(Py_ssize_t size)
{
    assert(size > 0);
    # Create new list
    PyListObject *op = (PyListObject *) PyList_New(0);
    if (op == NULL) {
        return NULL;
    }
    assert(op->ob_item == NULL);
    # Allocating memory
    op->ob_item = PyMem_New(PyObject *, size);
    if (op->ob_item == NULL) {
        Py_DECREF(op);
        return PyErr_NoMemory();
    }
    op->allocated = size;
    return (PyObject *) op;
}

CodePudding user response:

List slicing is not evaluated lazily, and indeed creates a new list object; see for example An Informal Introduction to Python, section 3.1.3. Lists:

All slice operations return a new list containing the requested elements.

CodePudding user response:

List slice would not be evaluated lazily, the list will be created on every iteration. Use itertools.islice to create lazy slice: islice(x, i, i subset_size)

  • Related