Home > Enterprise >  How lists are compared in python under the hood
How lists are compared in python under the hood

Time:06-04

Is each element of first list being compared to one with the same index of the second list or is there another more tricky way?

CodePudding user response:

From the docs, Common Sequence Operations:

tuples and lists are compared lexicographically by comparing corresponding elements. This means that to compare equal, every element must compare equal and the two sequences must be of the same type and have the same length. (For full details see Comparisons in the language reference.)

CodePudding user response:

The CPython3 implementation of list comparison is here on line 2752 as of the time I wrote this answer.

You can also take a look at Python/C API Reference Manual.

I implemented a simple class that prints a message when there are calls to some of the magic methods for comparisons (e.g. __eq__ is the method that's called when you are using the == operator). The class simply holds a value that you pass in the constructor. When you compare Value objects, their values are compared. Hopefully, this gives you a better understanding of what's going on.

def show_call(func):
    def wrapper(*args, **kwargs):
        if kwargs:
            raise NotImplemented()
        print(f"{func.__qualname__}({', '.join(map(str, args))})")
        return func(*args)
    return wrapper


class Value:
    
    def __init__(self, value):
        self.value = value
    
    def __repr__(self):
        return f"Value({self.value})"
    
    @show_call
    def __eq__(self, other):
        return self.value == other.value
    
    @show_call
    def __lt__(self, other): 
        return self.value < other.value
    
    @show_call
    def __gt__(self, other): 
        return self.value > other.value
    
    @show_call
    def __le__(self, other): 
        return self.value <= other.value
    
    @show_call
    def __ge__(self, other): 
        return self.value >= other.value

Let's first look at what happened when you compare the equality of two lists of different lengths. We run the following code:

ls1 = [Value(1), Value(2)]

ls2 = [Value(1), Value(2), Value(3)]
print(f"{ls1} == {ls2}")
ls1 == ls2;

This gives the following output:

[Value(1), Value(2)] == [Value(1), Value(2), Value(3)]

We can see there's no call to the __eq__ method. This makes sense because two lists with different lengths are considered not equal. There's no need to compare any items in the lists. In CPython, this is implemented as

if (Py_SIZE(vl) != Py_SIZE(wl) && (op == Py_EQ || op == Py_NE)) {
    /* Shortcut: if the lengths differ, the lists differ */
    if (op == Py_EQ)
        Py_RETURN_FALSE;
    else
        Py_RETURN_TRUE;
}

And then CPython source code says

Search for the first index where items are different

We run equality tests on lists of the same length:

ls3 = [Value(1), Value(2)]
print(f"{ls1} == {ls3}")
ls1 == ls3;
ls4 = [Value(2), Value(2)]
print(f"{ls1} == {ls2}")
ls1 == ls4;

The output is as follows:

[Value(1), Value(2)] == [Value(1), Value(2)]
Value.__eq__(Value(1), Value(1))
Value.__eq__(Value(2), Value(2))
[Value(1), Value(2)] == [Value(2), Value(2)]
Value.__eq__(Value(1), Value(2))

We can see indeed the items at index 0 are compared first, then the subsequent items. It stops when:

  1. it encounters the first index where the items are not equal in the two lists; or
  2. either list is exhausted.

If either list is exhausted, we can return the comparison between the length of the two lists.

At this stage, we can already determine the equality or inequality (!=) for the lists, because we already found the first element that differs.

The process is all the same for less than, greater than, less than or equal to, and greater than or equal to (lt, gt, le, and ge) tests. The index of the first different item is made out. Their lengths are compared if either list is exhausted. The only difference is that it makes one last comparison with the appropriate magic method.

The following code

ls5 = [Value(1), Value(3)]
print(f"{ls1} > {ls5}")
ls1 > ls5;

Outputs:

[Value(1), Value(2)] > [Value(1), Value(3)]
Value.__eq__(Value(1), Value(1))
Value.__eq__(Value(2), Value(3))
Value.__gt__(Value(2), Value(3))

For lt, gt, le, and ge, one last call to the appropriate comparison function is made.

Also, as mentioned in some of the comments, the identity test is used first for list item equality tests. As you can see in the following example,

ls6 = [ls1[0], Value(2)]
print(f"{ls1} == {ls6}")
ls1 == ls6;

outputs:

// ls1 ids [140401841947936, 140401277199120]
// ls6 ids [140401841947936, 140401005879200]
[Value(1), Value(2)] == [Value(1), Value(2)]
Value.__eq__(Value(2), Value(2))

The first item in both lists are identical (i.e. they reference the same object in C, and have the same id in Python). So, there's no need to call the __eq__ function to test for equality. From the source code comment,

Quick result when objects are the same. Guarantees that identity implies equality.

Some of the things are already covered here, I didn't realize until I finished writing my answer.

  • Related