I have a NumPy array of shape (500, 36, 24, 72), and I want to build a data pipeline for it with tf.data. Each iteration needs only a subset of the array. For example, the model is first trained on [500, x:y, 24, 72], so only a slice of the second dimension is used.
ds1 = tf.data.Dataset.zip((tf.data.Dataset.from_tensor_slices(data)))
Applying a filter to the dataset above doesn't seem to work:
ds2 = ds1.filter(lambda x: x[1:3][:][:])
Answer:
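Use tf.data.Dataset.map instead. filter only keeps or drops whole elements based on a predicate that must return a scalar boolean, so it cannot cut out part of an element. map applies a function to every element and can return a slice of it: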
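import numpy as np
import tensorflow as tf
data = np.random.random((500, 36, 24, 72))
# from_tensor_slices splits along axis 0: each element has shape (36, 24, 72)
ds1 = tf.data.Dataset.from_tensor_slices(data)
# Axis 0 of each element is the array's original second dimension,
# so this keeps data[:, 1:3, ...] as elements of shape (2, 24, 72)
ds2 = ds1.map(lambda x: x[1:3, ...])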
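To train on a different x:y window at each stage, the same map call can be wrapped in a small helper. The sketch below assumes a hypothetical helper name (make_window_dataset), a hypothetical list of windows, float32 data and a batch size of 32; none of these come from the question. It only shows that each window yields elements equal to data[i, x:y, :, :].

```python
import numpy as np
import tensorflow as tf


def make_window_dataset(data, x, y):
    """Hypothetical helper: dataset whose i-th element is data[i, x:y, :, :].

    from_tensor_slices splits along axis 0, so inside map() axis 0 of each
    element corresponds to the original second dimension of `data`.
    """
    ds = tf.data.Dataset.from_tensor_slices(data)
    return ds.map(lambda elem: elem[x:y, ...])


# Same shape as in the question; float32 (an assumption) halves memory use.
data = np.random.random((500, 36, 24, 72)).astype(np.float32)

# Hypothetical schedule of (x, y) windows over the second dimension.
windows = [(0, 4), (4, 12), (12, 36)]

for x, y in windows:
    # Batching adds a leading batch axis of unknown (None) static size.
    ds = make_window_dataset(data, x, y).batch(32)
    print((x, y), ds.element_spec.shape)
```

Slicing the NumPy array up front, tf.data.Dataset.from_tensor_slices(data[:, x:y]), should produce the same elements. The map version keeps a single source dataset and applies the window lazily to each element, which may be more convenient when the window changes between training stages.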