Strange speeds of `x in range(...)` checks-CodePudding

With r = range(1500, 2500), I benchmarked x in r for x below/in/above the range:

1000 in r :   58 ns ± 0 ns
2000 in r :  101 ns ± 1 ns
3000 in r :   58 ns ± 0 ns

Checking 1000 is faster than checking 2000? Makes sense, as for 1000, Python knows the result after only checking the range's lower bound, whereas for 2000 it needs to check both bounds.

Checking 3000 is faster than checking 2000? Makes sense, as for 3000, Python knows the result after only checking the range's upper bound, whereas for 2000 it needs to check both bounds.

Hey... waaaiiit a minute...

How does it know which bound to check first? How can 1000 and 3000 both be checked faster than 2000?

Benchmark code (Try it online!):

from timeit import repeat
from statistics import mean, stdev

setup = 'r = range(1500, 2500)'

n = 10**4
for _ in range(3):
    for x in 1000, 2000, 3000:
        code = f'{x} in r'
        ts = repeat(code, setup, number=n, repeat=100)
        ts = [t/n * 1e9 for t in sorted(ts)[:10]]
        print(code, ':  = ns ± %d ns' % (mean(ts), stdev(ts)))
    print()

CodePudding user response：

In CPython, the implementation first checks the value against both endpoints:

if (cmp1 == 1) { /* positive steps: start <= ob < stop */
    cmp2 = PyObject_RichCompareBool(r->start, ob, Py_LE);
    cmp3 = PyObject_RichCompareBool(ob, r->stop, Py_LT);
}

If either is false, it can exit immediately:

if (cmp2 == 0 || cmp3 == 0) { /* ob outside of range */
    result = 0;
    goto end;
}

If both checks are true, though, it then has to check if the stride rejects the value. (For example, 2000 in range(1501, 2500) is true, but 2000 in range(1501, 2500, 2) is false.)

tmp1 = PyNumber_Subtract(ob, r->start);
if (tmp1 == NULL)
    goto end;
tmp2 = PyNumber_Remainder(tmp1, r->step);
if (tmp2 == NULL)
    goto end;
/* result = ((int(ob) - start) % step) == 0 */
result = PyObject_RichCompareBool(tmp2, zero, Py_EQ);

It's this last step that makes 2000 in range(1500, 2500) slower than either out-of-bound check.

(See https://github.com/python/cpython/blob/main/Objects/rangeobject.c#L381 for the full implementation.)

CodePudding user response：

If we look at the CPython implementation, we can see that the range_contains_long function does indeed contain a quick bounds check, specialized for both positive and negative steps.

This explains why the outer cases are both much faster, but of course, it's specific to a single implementation and could change at any time.