Home > database >  Where is the actual "sorted" method being implemented in CPython and what is it doing here
Where is the actual "sorted" method being implemented in CPython and what is it doing here

Time:11-28

Viewing the source code of CPython on GitHub, I saw the method here:
https://github.com/python/cpython/blob/main/Python/bltinmodule.c

And more specifically:

static PyObject *
builtin_sorted(PyObject *self, PyObject *const *args, Py_ssize_t nargs, PyObject *kwnames)
{
    PyObject *newlist, *v, *seq, *callable;

    /* Keyword arguments are passed through list.sort() which will check
       them. */
    if (!_PyArg_UnpackStack(args, nargs, "sorted", 1, 1, &seq))
        return NULL;

    newlist = PySequence_List(seq);
    if (newlist == NULL)
        return NULL;

    callable = _PyObject_GetAttrId(newlist, &PyId_sort);
    if (callable == NULL) {
        Py_DECREF(newlist);
        return NULL;
    }

    assert(nargs >= 1);
    v = _PyObject_FastCallKeywords(callable, args   1, nargs - 1, kwnames);
    Py_DECREF(callable);
    if (v == NULL) {
        Py_DECREF(newlist);
        return NULL;
    }
    Py_DECREF(v);
    return newlist;
}

I am not a C master, but I don't see any implementation of any of the known sorting algorithms, let alone the special sort that Python uses (I think it's called Timsort? - correct me if I'm wrong)

I would highly appreciate if you could help me "digest" this code and understand it, because as of right now I've got:

PyObject *newlist, *v, *seq, *callable;

Which is creating a new list - even though list is mutable no? then why create a new one?
and creating some other pointers, not sure why...

then we unpack the rest of the arguments as the comment suggests, if it doesn't match the arguments there (being the function 'sorted' for example) then we break out..

I am pretty sure I am reading this all completely wrong, so I stopped here...

Thanks for the help in advanced, sorry for the multiple questions but this block of code is blowing my mind and learning to read this would help me a lot!

CodePudding user response:

The actual sorting is done by list.sort. sorted simply creates a new list from whatever iterable argument it is given, sorts that list in-place, then returns it. A pure Python implementation of sorted might look like

def sorted(itr, *, key=None):
    newlist = list(itr)
    newlist.sort(key=key)
    return newlist

Most of the C code is just boilerplate for working with the underlying C data structures, detecting and propagating errors, and doing memory management.

The actual sorting algorithm is spread throughout Objects/listobject.c; start here. If you are really interested in what the algorithm is, rather than how it is implemented in C, you may want to start with https://github.com/python/cpython/blob/main/Objects/listsort.txt instead.

CodePudding user response:

list sort implementation isn't there. This is a wrapper function fetching PyId_sort from there:

callable = _PyObject_GetAttrId(newlist, &PyId_sort);

object.h contains a macro using token pasting to define the PyId_xxx objects

#define _Py_IDENTIFIER(varname) _Py_static_string(PyId_##varname, #varname)

... and I stopped digging after that. There could be more macro magic involved in order to enforce a coherent naming through the whole python codebase.

The implementation is located here:

https://github.com/python/cpython/blob/main/Objects/listobject.c

More precisely around line 2240

static PyObject *
list_sort_impl(PyListObject *self, PyObject *keyfunc, int reverse)
/*[clinic end generated code: output=57b9f9c5e23fbe42 input=cb56cd179a713060]*/
{

Comments read:

/* An adaptive, stable, natural mergesort.  See listsort.txt.
 * Returns Py_None on success, NULL on error.  Even in case of error, the
 * list will be some permutation of its input state (nothing is lost or
 * duplicated).
 */

Now it takes some effort to understand the details of the algorithm but it's there.

  • Related