Set random labels for images in tf.data.Dataset

Time:07-19

I have a tf.data.Dataset of images with the signature shown below:

<_UnbatchDataset element_spec=(TensorSpec(shape=(None, 128, 128, 3), dtype=tf.float32, name=None), TensorSpec(shape=(None,), dtype=tf.int32, name=None))>

All the labels in this dataset are 0. What I would like to do is change each of these labels to a random number from 0 to 3.
My code is:

def change_label(image, label):
   return image, np.random.randint(0, 4)

dataset = dataset.map(change_label)

This, however, just assigns 1 as the label for every image. The strange thing is that no matter how many times I run it, it still assigns 1 to these images.
Any ideas?

CodePudding user response:

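The problem is that dataset.map traces your function once and then runs it in graph mode. np.random.randint is ordinary Python code that TensorFlow does not track, so it executes only while the function is being traced and its result is baked into the graph as a constant. Every element then receives that same value. TensorFlow random ops, on the other hand, are part of the graph and are re-evaluated for each element. So try something like this: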

import tensorflow as tf

images = tf.random.normal((50, 128, 128, 3))
dataset = tf.data.Dataset.from_tensor_slices((images))

dataset = dataset.map(lambda x: (x, tf.random.uniform((), maxval=4, dtype=tf.int32))).batch(2)

for x, y in dataset.take(1):
  print(x.shape, y)
# (2, 128, 128, 3) tf.Tensor([2 2], shape=(2,), dtype=int32)
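
To make the tracing behaviour concrete, here is a minimal sketch that reproduces the problem from the question on a small stand-in dataset. The tiny image shape and the python_calls list are assumptions added only for illustration. The list records how often the Python body of the mapped function actually runs, which is typically once, at trace time, rather than once per element.

```python
import numpy as np
import tensorflow as tf

python_calls = []  # hypothetical helper: records each run of the Python function body

def change_label(image, label):
    python_calls.append(1)  # Python side effect, executed only while tracing
    # NumPy draws one number during tracing; it is stored in the graph as a constant.
    return image, np.random.randint(0, 4)

# Small stand-in for the image dataset from the question (all labels are 0).
images = tf.zeros((8, 4, 4, 3))
labels = tf.zeros((8,), dtype=tf.int32)
dataset = tf.data.Dataset.from_tensor_slices((images, labels)).map(change_label)

frozen = [int(lbl) for _, lbl in dataset]
print("distinct labels:", set(frozen))
print("Python body executed", len(python_calls), "time(s) for", len(frozen), "elements")
```

All eight elements end up with the same label, whichever value NumPy happened to draw during tracing.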

CodePudding user response:

You can use tf.experimental.numpy.random.randint, which generates the value with TensorFlow ops and therefore works inside dataset.map.

import tensorflow as tf
def change_label(image, label):
    return image, tf.experimental.numpy.random.randint(0,4)

dataset = dataset.map(change_label)

for img,lbl in dataset.take(10):
    print(lbl)
# tf.Tensor(1, shape=(), dtype=int64)
# tf.Tensor(0, shape=(), dtype=int64)
# tf.Tensor(2, shape=(), dtype=int64)
# tf.Tensor(2, shape=(), dtype=int64)
# tf.Tensor(1, shape=(), dtype=int64)
# tf.Tensor(3, shape=(), dtype=int64)
# tf.Tensor(0, shape=(), dtype=int64)
# tf.Tensor(3, shape=(), dtype=int64)
# tf.Tensor(2, shape=(), dtype=int64)
# tf.Tensor(3, shape=(), dtype=int64)

The dataset used above was generated as follows (all labels are initially zero, as in your question):

import numpy as np
x = np.random.rand(100, 128, 128, 3)
y = np.random.randint(0,1, size=100)

dataset = tf.data.Dataset.from_tensor_slices((x,y))

for img,lbl in dataset.take(10):
    print(lbl)
# tf.Tensor(0, shape=(), dtype=int64)
# tf.Tensor(0, shape=(), dtype=int64)
# tf.Tensor(0, shape=(), dtype=int64)
# tf.Tensor(0, shape=(), dtype=int64)
# tf.Tensor(0, shape=(), dtype=int64)
# tf.Tensor(0, shape=(), dtype=int64)
# tf.Tensor(0, shape=(), dtype=int64)
# tf.Tensor(0, shape=(), dtype=int64)
# tf.Tensor(0, shape=(), dtype=int64)
# tf.Tensor(0, shape=(), dtype=int64)
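
One detail worth checking: the element_spec in the question shows labels of shape (None,), which suggests each element holds a whole batch of labels rather than a single scalar. If that is the case, a scalar random label would change the label's shape. The sketch below is one way to keep the shape, under the assumption that labels are int32 as in the question. The function name randomize_labels and the small image sizes are made up for the example.

```python
import tensorflow as tf

def randomize_labels(image, label):
    # Draw one value in [0, 4) per existing label entry, keeping the label's shape.
    new_label = tf.random.uniform(tf.shape(label), minval=0, maxval=4, dtype=tf.int32)
    return image, new_label

# Stand-in data: 6 elements, each a batch of 5 small images with 5 zero labels.
images = tf.zeros((6, 5, 8, 8, 3))
labels = tf.zeros((6, 5), dtype=tf.int32)
dataset = tf.data.Dataset.from_tensor_slices((images, labels)).map(randomize_labels)

all_labels = tf.concat([lbl for _, lbl in dataset], axis=0).numpy()
print(all_labels.shape, all_labels.min(), all_labels.max())
```

Because the shape is taken from the incoming label with tf.shape, the same function should also work when the batch dimension is not known in advance.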

CodePudding user response:

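If your data is held in an indexable, mutable container (for example a list of [image, label] pairs), you can just iterate over it in a for loop. Note that a tf.data.Dataset does not support indexing or item assignment, so this will not work on it directly:

import random

def change_labels(dataset):
    # Assumes dataset is a mutable sequence of [image, label] lists, not a tf.data.Dataset
    for i in range(len(dataset)):
        dataset[i][1] = random.choice([0, 1, 2, 3])  # image at index 0, label at index 1
    return dataset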

dataset = change_labels(dataset)
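
A tf.data.Dataset cannot be modified in place, and a random op inside map will generally draw fresh values on every pass over the data. If the labels should be drawn once and then stay fixed, one possible approach is to generate them up front with NumPy and zip them onto the images. This is only a sketch: the dataset size of 10, the seed, and the names new_labels and relabeled are assumptions for illustration.

```python
import numpy as np
import tensorflow as tf

# Stand-in for the original dataset: 10 small images, all labelled 0.
images = np.zeros((10, 8, 8, 3), dtype=np.float32)
old = tf.data.Dataset.from_tensor_slices((images, np.zeros(10, dtype=np.int32)))

# Draw the labels once, outside the tf.data pipeline, so they never change.
rng = np.random.default_rng(0)
new_labels = rng.integers(0, 4, size=10, dtype=np.int32)

# Drop the old labels and pair each image with its new fixed label.
images_only = old.map(lambda img, lbl: img)
label_ds = tf.data.Dataset.from_tensor_slices(new_labels)
relabeled = tf.data.Dataset.zip((images_only, label_ds))

first_pass = [int(lbl) for _, lbl in relabeled]
print(first_pass)
```

This needs the number of elements to be known before building the label dataset, so it suits datasets of a known, finite size.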
