I have a CSVDataset with around 6 million rows. For the purposes of this question, I am creating a TensorSliceDataset as follows:
import tensorflow as tf
import numpy as np
datasetz = tf.data.Dataset.from_tensor_slices((np.random.randn(10, 5), np.random.randn(10,1)))
datasetz = datasetz.map(lambda x, y: (x, x))
datasetz
# <MapDataset element_spec=(TensorSpec(shape=(5,), dtype=tf.float64, name=None), TensorSpec(shape=(5,), dtype=tf.float64, name=None))>
I am trying to build a denoising autoencoder, so I need to add some noise to my dataset. If the dataset were a numpy.ndarray, I could have added the noise as follows:
corruption_level = 0.3
datasetz = datasetz + (np.random.randn(10, 5) * corruption_level)
But I don't know how to do this with a CSVDataset object.
CodePudding user response:
This adds random noise to each row:
datasetz = tf.data.Dataset.from_tensor_slices((np.random.randn(10, 5), np.random.randn(10,1)))
datasetz = datasetz.map(lambda x, y: (x + corruption_level * tf.random.uniform(shape=(5,), dtype=tf.float64), y))
datasetz
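For a denoising autoencoder, the usual setup is a noisy input paired with the clean row as the target, so a variant of the map above may be closer to what the question needs. The following is only a sketch under some assumptions: the helper name add_noise is made up for illustration, the noise is Gaussian (tf.random.normal) to mirror the np.random.randn example in the question rather than the uniform noise used above, and the shape is read from the row instead of being hard-coded to 5.

```python
import numpy as np
import tensorflow as tf

corruption_level = 0.3  # noise scale, same value as in the question

# Toy stand-in for the real data, as in the question.
features = np.random.randn(10, 5)
labels = np.random.randn(10, 1)
dataset = tf.data.Dataset.from_tensor_slices((features, labels))


def add_noise(x, y):
    """Hypothetical helper: return (noisy row, clean row); the label y is unused."""
    # Draw Gaussian noise with the same shape and dtype as the row.
    noise = tf.random.normal(shape=tf.shape(x), dtype=x.dtype)
    # Noisy row is the model input, the clean row is the reconstruction target.
    return x + corruption_level * noise, x


noisy_dataset = dataset.map(add_noise)

# Inspect one element to check the shapes of input and target.
for noisy, clean in noisy_dataset.take(1):
    print(noisy.shape, clean.shape)
```

Because the map function runs each time the dataset is iterated, the noise is redrawn on every pass; putting .cache() after the map would freeze it. With a real CsvDataset, each element is typically a tuple of per-column scalars, so the columns would probably need to be combined (for example with tf.stack) before a function like this is applied.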