`random.seed` doesn't work with generators

I have a function that generates a sequence of random integers using random.randint. I also include a seed parameter for reproducibility.

I noticed odd behavior when I returned a generator object: different calls with the same seed did not produce the same values. However, when I returned a tuple instead of a generator, the results from different calls were the same.

Returning a generator:

In [2]: import random

In [3]: def rand_sequence(n, seed=None):
   ...:     if seed is not None:
   ...:         random.seed(seed)
   ...:     return (random.randint(0, n) for _ in range(n))
   ...:

In [4]: first = rand_sequence(10, seed=0)

In [5]: second = rand_sequence(10, seed=0)

In [6]: assert tuple(first) == tuple(second)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Input In [6], in <cell line: 1>()
----> 1 assert tuple(first) == tuple(second)

AssertionError:

Returning a tuple:

In [7]: def rand_sequence(n, seed=None):
   ...:     if seed is not None:
   ...:         random.seed(seed)
   ...:     return tuple(random.randint(0, n) for _ in range(n))
   ...:

In [8]: first = rand_sequence(10, seed=0)

In [9]: second = rand_sequence(10, seed=0)

In [10]: assert first == second

In [11]:

I guess it might have something to do with the control-flow mechanism of generators, but I can't figure out how it plays out.

Edit:

What's even more confusing is that when I used yield directly, the comparison passed.

In [11]: def rand_sequence(n, seed=None):
    ...:     if seed is not None:
    ...:         random.seed(seed)
    ...:
    ...:     for _ in range(n):
    ...:         yield random.randint(0, n)
    ...:
    ...:

In [12]: first = rand_sequence(10, seed=0)

In [13]: second = rand_sequence(10, seed=0)

In [14]: assert tuple(first) == tuple(second)

Answer:

The generator expression is evaluated later, lazily, and it has side effects on the PRNG.
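A minimal sketch of that timing. The two function names are hypothetical renames of the question's generator-expression and yield versions, so both can live in one script. In the generator-expression version, random.seed runs immediately when the function is called, but randint runs only when the result is iterated, so both seeds happen before any draw and the second tuple continues the same stream. The same reasoning covers the yield version from the question's edit: a generator function runs none of its body, not even random.seed, until the first value is requested, so each tuple() call re-seeds right before drawing.

```python
import random

def rand_sequence_genexp(n, seed=None):
    # random.seed runs now, at call time...
    if seed is not None:
        random.seed(seed)
    # ...but randint only runs when the generator expression is iterated.
    return (random.randint(0, n) for _ in range(n))

def rand_sequence_yield(n, seed=None):
    # In a generator function nothing runs until the first next() call,
    # so the seeding is deferred together with the draws.
    if seed is not None:
        random.seed(seed)
    for _ in range(n):
        yield random.randint(0, n)

# Reference stream: seed once, then take 20 draws from the global PRNG.
random.seed(0)
stream = tuple(random.randint(0, 10) for _ in range(20))

first = rand_sequence_genexp(10, seed=0)   # seeds now
second = rand_sequence_genexp(10, seed=0)  # re-seeds; still no draws
a, b = tuple(first), tuple(second)         # draws 1-10, then draws 11-20
print(a == stream[:10], b == stream[10:])  # True True

first = rand_sequence_yield(10, seed=0)    # nothing runs yet
second = rand_sequence_yield(10, seed=0)   # nothing runs yet
c, d = tuple(first), tuple(second)         # each tuple() re-seeds, then draws
print(c == stream[:10], d == stream[:10])  # True True
```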

You will find it much easier to reason about your algorithm if you first store PRNG outputs in a container, such as a tuple or list.

Specifically, if your code produced

first = [f0, f1, f2, ..., f9]
second = [s0, s1, s2, ..., s9]

you are grappling with whether comparing equality of

f0 == s0
f1 == s1
...
f9 == s9

is "the same as" equality of

[f0, f1, ..., f9] == [s0, s1, ..., s9]

Advancing the shared PRNG state in the middle of such equality tests will cause you grief, in the manner you have noted. Simplify: store the values in containers, and you'll be happier. The debugging situation will certainly improve.
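To make the element-by-element case concrete, here is a sketch that compares the question's generator-expression version pairwise with zip() (an illustrative choice; the question itself materializes each generator with tuple() first). Because zip() pulls f0, then s0, then f1, then s1, the two generators take turns drawing from the one global stream, so neither sees the sequence a fresh seed would give.

```python
import random

def rand_sequence(n, seed=None):
    # The question's generator-expression version.
    if seed is not None:
        random.seed(seed)
    return (random.randint(0, n) for _ in range(n))

# Reference stream: seed once, then take 20 draws.
random.seed(0)
stream = [random.randint(0, 10) for _ in range(20)]

first = rand_sequence(10, seed=0)
second = rand_sequence(10, seed=0)

# zip() alternates next(first), next(second), ... so first gets the
# even-numbered draws and second gets the odd-numbered ones.
pairs = list(zip(first, second))
print(pairs == list(zip(stream[0::2], stream[1::2])))  # True
```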

EDIT

I wouldn't go so far as @Carcigenicate's remark about

... and generators only evaluate as equal if they're literally the same generator.

We can certainly check whether each successive value generated by one matches what's generated by the other. The core difficulty the OP encountered is that the global PRNG state is shared across generators, so order of evaluation matters.

>>> r1 = range(7)
>>> r2 = range(7)
>>> id(r1)
4396054832
>>> id(r2)
4396054880
>>> r1 == r2
True
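One nuance: range objects are lazy sequences rather than generators, and == on two ranges compares them as sequences, which is why r1 == r2 is True above. Generator objects define no value equality, so == on two of them is an identity check; comparing their output means iterating them. The sketch below shows that, and also one common way to remove the shared-state problem altogether, which is not what this answer proposes: give each generator its own random.Random(seed) instance. The function is a hypothetical variant of the question's rand_sequence.

```python
import random

def rand_sequence(n, seed=None):
    # Hypothetical variant: a private PRNG per call, so iterating one
    # generator never advances another generator's state.
    # Random(None) seeds from OS randomness (or the time), like random.seed().
    rng = random.Random(seed)
    return (rng.randint(0, n) for _ in range(n))

g1 = rand_sequence(10, seed=0)
g2 = rand_sequence(10, seed=0)
print(g1 == g2)  # False: generator objects compare by identity

# Comparing the produced values works even though the draws interleave.
print(all(a == b for a, b in zip(g1, g2)))  # True
```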