Why does tf.random.uniform generate duplicates inside a function but not in the main program?


I'm new to TensorFlow, and I was trying to understand how seed generation works (mainly with functions). My confusion is this: if I run this bit of code in my main program:

a = tf.random.uniform([1], seed=1)
b = tf.random.uniform([1], seed=1)
print(a)
print(b)

I get this output

tf.Tensor([0.2390374], shape=(1,), dtype=float32)
tf.Tensor([0.22267115], shape=(1,), dtype=float32)

two different tensors (obviously), but if I run the same code inside a function:

@tf.function
def foo():
  a = tf.random.uniform([1], seed=1)
  b = tf.random.uniform([1], seed=1)
  print(a)
  print(b)
  return a, b

foo()

I get this:
Tensor("random_uniform/RandomUniform:0", shape=(1,), dtype=float32)
Tensor("random_uniform_1/RandomUniform:0", shape=(1,), dtype=float32)
(<tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.2390374], dtype=float32)>,
<tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.2390374], dtype=float32)>)

There are two things I don't get:

  1. Why do the two prints inside the function show different output from the printed return value of the function? In other words, why is no value (numpy=array...) shown in the first two lines compared to the last two?
  2. Why are a and b equal?

CodePudding user response:

Let's answer your first question

Why do the two prints inside the function show different output from the printed return value of the function? In other words, why is no value (numpy=array...) shown in the first two lines compared to the last two?

Eager Execution vs Graph Execution

At its core, TensorFlow works with tensor operations (ops). The expression tf.random.uniform([1], seed=1), for example, defines an op that, when executed, will generate a single random number with a seed of 1.

Now, you will need to actually execute this op to get a concrete random number. There are two ways to do that:

  1. Execute the op as soon as it is defined. This is called Eager Execution (EE), and it is arguably the most intuitive and flexible way to execute an op. It is also the default way to execute ops in TF 2.x.
  2. Compose this op (and possibly other ops) into a static, optimized computational graph and use that graph for every execution. This is called Graph Execution (GE). It was the default way to execute ops in TF 1.x; in TF 2.x it is optional and can be enabled using the @tf.function decorator (see the quick check after this list).
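
As a quick way to check which mode a piece of code runs in, tf.executing_eagerly() reports whether eager execution is currently active. A minimal sketch (assuming TF 2.x defaults; output shown is indicative):

import tensorflow as tf

print(tf.executing_eagerly())  # True: TF 2.x runs eagerly by default

@tf.function
def insideGraph():
    # inside a tf.function the Python body is traced into a graph,
    # so eager execution is reported as off here
    return tf.executing_eagerly()

print(insideGraph())
[Out]:
True
tf.Tensor(False, shape=(), dtype=bool)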

In EE context, the op tf.random.uniform([1], seed=1) is executed as soon as it is defined and immediately returns a concrete value - an EagerTensor object.

a = tf.random.uniform([1], seed=1)
print(type(a))
[Out]:
<class 'tensorflow.python.framework.ops.EagerTensor'>

While in GE, the op is simply a symbolic reference to a node in the optimized computational graph (a Tensor object to be exact). Whenever you invoke the function, the op is actually executed, at which time it produces a concrete value - an EagerTensor.

@tf.function
def genRandom():
    a = tf.random.uniform([1], seed=1)
    print(f'a type is {type(a)}')
    return a

print(f'output type is {type(genRandom())}')
[Out]:
a type is <class 'tensorflow.python.framework.ops.Tensor'>
output type is <class 'tensorflow.python.framework.ops.EagerTensor'>

GE enables various static-graph optimizations that improve execution performance. EE, on the other hand, is far more intuitive and easier to debug, since ops are executed on the fly instead of being converted into cryptic graph nodes. A good practice is to debug functions in EE and wrap them in @tf.function for production to leverage the speed advantages of GE. Be careful, though: some ops that are allowed in an EE context are not allowed, or not available, in a GE context.
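
If you want to debug a function that is already decorated with @tf.function without removing the decorator, TF 2.x can also be told to run such functions eagerly. A small sketch (the call is tf.config.run_functions_eagerly in recent releases; older TF 2.x versions exposed it as tf.config.experimental_run_functions_eagerly):

import tensorflow as tf

@tf.function
def foo():
    a = tf.random.uniform([1], seed=1)
    print(a)  # plain print: only shows a concrete value while running eagerly (or during tracing)
    return a

tf.config.run_functions_eagerly(True)   # debug mode: the body runs as plain Python, no graph
_ = foo()                               # print shows a concrete EagerTensor
tf.config.run_functions_eagerly(False)  # back to graph execution for speed
_ = foo()                               # print shows only the symbolic Tensor (during tracing)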

Graph Execution: Tracing

Going back to this example:

@tf.function
def genRandom():
    a = tf.random.uniform([1], seed=1)
    print(f'a is {a}')
    return a

_ = genRandom()
[Out]:
a is Tensor("random_uniform/RandomUniform:0", shape=(1,), dtype=float32)

If in GE the op is only executed when the function is called, at which point it produces a concrete value, then why does the Python print inside the function show a graph node reference instead of the concrete value that a holds?

This is because Python print is not a TF op, nor is it convertible to one. Therefore, it won't be included in the computational graph and won't be executed. However, you did get a printout when running the function so what's happening here?

It turns out that TF constructs the graph lazily. That is, only when a function decorated with @tf.function is called for the first time does TF start constructing its graph. This is done by executing the function once as normal Python code to "record" the ops inside it - a process known as tracing - from which an optimized graph is constructed. After that, it is the constructed graph (and not the Python code) that is called to produce the output.

In other words, Python's print - which is not an op - works on the first call thanks to the tracing procedure. However, once tracing is done and the graph is constructed, the print statement is not part of the execution graph and won't run on subsequent calls:

@tf.function
def genRandom():
    a = tf.random.uniform([1], seed=1)
    print(f'a is {a}')  # not an op, will not be included in graph and only print during tracing
    return a

_ = genRandom()
[Out]:
a is Tensor("random_uniform/RandomUniform:0", shape=(1,), dtype=float32)

_ = genRandom()
[Out]:
 

To get the actual value when the graph is executed, you'll need to use tf.print(), which is a valid TF op and will therefore be included in the graph.

@tf.function
def genRandom():
    a = tf.random.uniform([1], seed=1)
    tf.print("a is:", a)  # an op, will be included in graph and always print
    return a

_ = genRandom()
[Out]:
a is: [0.239037395]

_ = genRandom()
[Out]:
a is: [0.222671151]

As a side note, if you call an already-traced function with arguments whose types differ from those the function saw when it was last traced, retracing may happen (because the existing graph may not be usable for the new types), in which case the Python code inside the function is executed again. Retracing is very expensive and should be avoided.
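
You can observe retracing with a plain Python print, which only fires while the function is being (re)traced. A minimal sketch, assuming TF 2.x defaults:

import tensorflow as tf

@tf.function
def square(x):
    print('Tracing with', x)  # plain print: runs only when the function is (re)traced
    return x * x

square(tf.constant(2))    # first call with an int32 tensor -> traces, prints
square(tf.constant(3))    # same dtype and shape -> reuses the graph, no print
square(tf.constant(2.0))  # new dtype (float32) -> retraces, prints again
square(4)                 # Python scalars trigger a retrace for every new value
square(5)                 # another new value -> retraces, prints again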


For the second question

Why are a and b equal in GE but not EE?

tf.random seeds

Now that you know what GE and EE are, go through the documentation for tf.random.set_seed. It explains everything you need to know about what the seed argument does and how an op's internal representation differs between GE and EE mode. In short, the random number a tf.random.uniform() op creates depends entirely on three factors:

  1. The global seed (which is set via tf.random.set_seed)
  2. The op's own seed (which is passed to its constructor as the seed argument)
  3. The op's internal counter, say _count, which counts the number of times the op has been executed. _count is reset to 0 every time tf.random.set_seed is called

In EE mode, identical ops share the same internal representation, so your a and b share the same _count. You get two different numbers because the value of _count involved in generating them differs by 1.

In GE mode, each op has its own internal representation, so a and b each have their own _count, which explains why you get the same number: everything involved in generating the two numbers is identical.
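
You can see the counter reset in action with tf.random.set_seed: resetting the global seed also resets the per-op counters, so the whole sequence of random numbers repeats. A minimal sketch in EE mode:

import tensorflow as tf

tf.random.set_seed(42)
a = tf.random.uniform([1], seed=1)
b = tf.random.uniform([1], seed=1)

tf.random.set_seed(42)  # resets the global seed and the op counters
c = tf.random.uniform([1], seed=1)
d = tf.random.uniform([1], seed=1)

print(a.numpy(), b.numpy())  # a != b: the counter advanced between the two calls
print(c.numpy(), d.numpy())  # c == a and d == b: the sequence repeats from the start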
