How do I prepare data for input into a TensorFlow model (say, a Keras Sequential one)?
I know how to prepare x_train, y_train, x_test and y_test using numpy and scipy (possibly pandas, sklearn style), where train/test are the training and test data for a neural model, and x/y stand for a 2D sparse matrix and a 1D numpy array of integer labels whose length equals the number of rows in the x data.
I'm struggling with the Dataset documentation and have not gained much insight so far.
So far, I could only convert the scipy.sparse matrix into a tensorflow.SparseTensor using something like:
import numpy as np
import tensorflow as tf
from scipy import sparse as sp

x = sp.csr_matrix( ... )
x = tf.SparseTensor(indices=np.vstack([*x.nonzero()]).T,
                    values=x.data,
                    dense_shape=x.shape)
and I can convert the numpy array into a tensorflow.Tensor using something like:
import numpy as np
import tensorflow as tf
y = np.array( ... ) # 1D array of len == x.shape[0]
y = tf.constant(y)
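For reference, the sparse conversion above can also be sketched through the COO representation, which exposes row and column indices directly. This is only a sketch: the helper name scipy_to_sparse_tensor and the small example matrix are hypothetical, not part of any library.

```python
import numpy as np
import tensorflow as tf
from scipy import sparse as sp


def scipy_to_sparse_tensor(matrix):
    """Hypothetical helper: convert a SciPy sparse matrix to a tf.SparseTensor."""
    coo = matrix.tocoo()
    # One (row, col) pair per stored value, as int64 as TensorFlow expects.
    indices = np.column_stack((coo.row, coo.col)).astype(np.int64)
    tensor = tf.SparseTensor(indices=indices, values=coo.data, dense_shape=coo.shape)
    # Many sparse ops assume row-major index order; reorder enforces it.
    return tf.sparse.reorder(tensor)


# Illustrative 3x4 matrix with two non-zero entries (made-up data).
x_sp = sp.csr_matrix(
    np.array([[1, 0, 0, 0], [0, 0, 2, 0], [0, 0, 0, 0]], dtype=np.float32)
)
x_tf = scipy_to_sparse_tensor(x_sp)
dense = tf.sparse.to_dense(x_tf).numpy()
```

Converting back with tf.sparse.to_dense is a cheap way to check on a small matrix that the indices and values line up.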
- How do I align x and y into a single Dataset in order to build batches, buffers, ... and benefit from the Dataset utilities?
- Should I use zip, from_tensor_slices, or some other method of the tensorflow.data.Dataset module?
Examples of x and y are:
x = tf.SparseTensor(indices=[[0, 0], [1, 2]], values=[1, 2], dense_shape=[3, 4])
y = tf.constant(np.array(range(3)))
CodePudding user response:
You should be able to use tf.data.Dataset.from_tensor_slices, since you mention that "y is a 1D numpy array representing integer labels of the same size as the number of rows in the x data":
import numpy as np
import tensorflow as tf

x = tf.SparseTensor(indices=[[0, 0], [1, 2]], values=[1, 2], dense_shape=[3, 4])
y = tf.constant(np.array(range(3)))

# Slice x and y together along the first dimension: one (row, label) pair per element.
dataset = tf.data.Dataset.from_tensor_slices((x, y))
for x, y in dataset:
    print(x, y)
SparseTensor(indices=tf.Tensor([[0]], shape=(1, 1), dtype=int64), values=tf.Tensor([1], shape=(1,), dtype=int32), dense_shape=tf.Tensor([4], shape=(1,), dtype=int64)) tf.Tensor(0, shape=(), dtype=int64)
SparseTensor(indices=tf.Tensor([[2]], shape=(1, 1), dtype=int64), values=tf.Tensor([2], shape=(1,), dtype=int32), dense_shape=tf.Tensor([4], shape=(1,), dtype=int64)) tf.Tensor(1, shape=(), dtype=int64)
SparseTensor(indices=tf.Tensor([], shape=(0, 1), dtype=int64), values=tf.Tensor([], shape=(0,), dtype=int32), dense_shape=tf.Tensor([4], shape=(1,), dtype=int64)) tf.Tensor(2, shape=(), dtype=int64)
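To get the batching and buffering utilities mentioned in the question, the same dataset can be chained with shuffle and batch. The following is a minimal sketch, assuming it is acceptable to densify each batch before it reaches the model; the buffer size, batch size, and float values are illustrative choices, not requirements.

```python
import numpy as np
import tensorflow as tf

x = tf.SparseTensor(indices=[[0, 0], [1, 2]], values=[1.0, 2.0], dense_shape=[3, 4])
y = tf.constant(np.arange(3))

dataset = (
    tf.data.Dataset.from_tensor_slices((x, y))
    .shuffle(buffer_size=3)  # buffer covers all 3 samples here
    .batch(2)                # sparse rows are batched into a 2D SparseTensor
    # Assumption: the model expects dense input, so densify per batch.
    .map(lambda features, labels: (tf.sparse.to_dense(features), labels))
)

# Materialise the batches to inspect them; rows and labels stay aligned.
batches = [(features.numpy(), labels.numpy()) for features, labels in dataset]
```

Because x and y were sliced together, shuffling keeps each row paired with its label. A dataset built this way can be passed directly to model.fit in place of separate x and y arrays.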