Difference between tf.matmul and torch.matmul at a tolerance of 1e-5


import numpy as np
import tensorflow as tf
import torch

# Same random float32 operands fed to both frameworks
gat_key_t   = np.random.normal(size=(8, 16, 64, 20)).astype(np.float32)
gat_query_t = np.random.normal(size=(8, 16, 30, 64)).astype(np.float32)

tf_key   = tf.convert_to_tensor(gat_key_t)
tf_query = tf.convert_to_tensor(gat_query_t)
pt_key   = torch.from_numpy(gat_key_t)
pt_query = torch.from_numpy(gat_query_t)

# Batched matmul: (8, 16, 30, 64) @ (8, 16, 64, 20) -> (8, 16, 30, 20)
tf_output = tf.matmul(tf_query, tf_key)
pt_output = torch.matmul(pt_query, pt_key)

# False
np.allclose(tf_output.numpy(), pt_output.numpy(), rtol=1e-5, atol=1e-5, equal_nan=False)

# True
np.allclose(tf_output.numpy(), pt_output.numpy(), rtol=1e-4, atol=1e-4, equal_nan=False)

When I multiply two tensors, the outputs of torch and TensorFlow differ once the tolerance is tightened to 1e-5.

As shown above, the two results agree at a tolerance of 1e-4, but they diverge as the tolerance becomes smaller.

How can I make the two outputs match within a tolerance of 1e-5?

CodePudding user response:

I ran into this issue recently while porting a transformer model from PyTorch to TF. Only the CPU version of TF's matmul seems to come close to both PyTorch's matmul and NumPy's matmul. Casting the operands to tf.float64 also improves the precision. The GPU implementation of matmul (which goes through cuBLAS) seems to suffer from precision issues.
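
If float64 is acceptable for your use case, here is a minimal sketch of that workaround, reusing tf_query, tf_key and pt_output from the question; how tight a tolerance ends up passing still depends on your hardware and TF build:

# Sketch: accumulate the matmul in float64, then compare against the
# PyTorch float32 result. Reuses the tensors defined in the question.
tf_output_f64 = tf.matmul(tf.cast(tf_query, tf.float64), tf.cast(tf_key, tf.float64))
print(np.allclose(tf_output_f64.numpy().astype(np.float32), pt_output.numpy(), rtol=1e-5, atol=1e-5))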

The closest I got to improving the precision was to implement matmul natively:

# Equivalent of tf.matmul(tf_query, tf_key): broadcasted elementwise multiply,
# then a reduce_sum over the shared inner dimension (axis 2 after the
# transpose/expand_dims), giving shape (8, 16, 30, 20).
tf_output = tf.reduce_sum(
    tf.expand_dims(tf.transpose(tf_query, (0, 1, 3, 2)), -1) * tf.expand_dims(tf_key, 3),
    2,
)
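
As a quick sanity check (again assuming the tensors from the question are in scope), you can compare this element-wise version directly against the PyTorch result; whether it passes at 1e-5 may still vary across machines:

# Compare the hand-rolled matmul against torch.matmul at the tighter tolerance.
print(np.allclose(tf_output.numpy(), pt_output.numpy(), rtol=1e-5, atol=1e-5))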