Prepare data input for tensorflow from numpy and scipy.sparse

Time:04-22

How do I prepare data for input into a TensorFlow model (say, a Keras Sequential one)?

I know how to prepare x_train, y_train, x_test and y_test using numpy and scipy (possibly with pandas or sklearn), where train/test denote the training and test sets for a neural model, x is a 2D sparse matrix, and y is a 1D numpy array of integer labels whose length equals the number of rows in x.

I'm struggling with the Dataset documentation and haven't gained much insight so far.

So far, I have only managed to convert the scipy.sparse matrix into a tensorflow.SparseTensor using something like

import numpy as np
import tensorflow as tf
from scipy import sparse as sp

x = sp.csr_matrix( ... )
x = tf.SparseTensor(indices=np.vstack([*x.nonzero()]).T, 
                    values=x.data, 
                    dense_shape=x.shape)

and I can convert the numpy array into a tensorflow.Tensor using something like

import numpy as np
import tensorflow as tf

y = np.array( ... ) # 1D array of len == x.shape[0]
y = tf.constant(y)
  • How can I combine x and y into a single Dataset so that I can set up batching, buffering, etc. and benefit from the Dataset utilities?
  • Should I use zip, from_tensor_slices, or some other method of tensorflow.data.Dataset?

Examples of x and y are

x = tf.SparseTensor(indices=[[0, 0], [1, 2]], values=[1, 2], dense_shape=[3, 4])
y = tf.constant(np.array(range(3)))
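A side note on the scipy conversion shown earlier in the question: x.nonzero() skips explicitly stored zeros while x.data keeps them, so the index and value arrays can differ in length if the matrix stores any zeros. A minimal sketch that avoids this by taking rows, columns and values from the same COO object is given here. The helper name scipy_to_sparse_tensor is hypothetical, and the toy matrix is made-up data that mirrors the example above.

```python
import numpy as np
import tensorflow as tf
from scipy import sparse as sp


def scipy_to_sparse_tensor(matrix):
    """Convert a scipy.sparse matrix to a tf.SparseTensor (hypothetical helper)."""
    coo = matrix.tocoo()
    # Rows, columns and values all come from the same COO object,
    # so the three arrays always have matching length and order.
    indices = np.column_stack((coo.row, coo.col)).astype(np.int64)
    tensor = tf.SparseTensor(indices=indices, values=coo.data, dense_shape=coo.shape)
    # Many TensorFlow sparse ops expect row-major index order.
    return tf.sparse.reorder(tensor)


# Toy data mirroring the example: 3 rows, 4 columns, two stored values.
x_scipy = sp.csr_matrix(([1, 2], ([0, 1], [0, 2])), shape=(3, 4))
x = scipy_to_sparse_tensor(x_scipy)
y = tf.constant(np.arange(3))  # one integer label per row of x
```

The resulting x and y have the same structure as the hand-written example, so either pair can be used in what follows.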

CodePudding user response:

You should be able to use tf.data.Dataset.from_tensor_slices, since you mention that "y is a 1D numpy array representing integer labels of the same size as the number of rows in the x data":

import numpy as np
import tensorflow as tf

x = tf.SparseTensor(indices=[[0, 0], [1, 2]], values=[1, 2], dense_shape=[3, 4])
y = tf.constant(np.array(range(3)))

dataset = tf.data.Dataset.from_tensor_slices((x, y))

for x, y in dataset:
  print(x, y)

Output:
SparseTensor(indices=tf.Tensor([[0]], shape=(1, 1), dtype=int64), values=tf.Tensor([1], shape=(1,), dtype=int32), dense_shape=tf.Tensor([4], shape=(1,), dtype=int64)) tf.Tensor(0, shape=(), dtype=int64)
SparseTensor(indices=tf.Tensor([[2]], shape=(1, 1), dtype=int64), values=tf.Tensor([2], shape=(1,), dtype=int32), dense_shape=tf.Tensor([4], shape=(1,), dtype=int64)) tf.Tensor(1, shape=(), dtype=int64)
SparseTensor(indices=tf.Tensor([], shape=(0, 1), dtype=int64), values=tf.Tensor([], shape=(0,), dtype=int32), dense_shape=tf.Tensor([4], shape=(1,), dtype=int64)) tf.Tensor(2, shape=(), dtype=int64)
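To get at the batching and buffering part of the question, the sliced dataset can be chained with the usual Dataset utilities. What follows is only a sketch under assumptions: the buffer size, batch size and the densify function are illustrative choices, not something the original answer prescribes. Each sparse row is converted to a dense float vector before batching, which is a simple option for small feature widths but gives up the memory savings of the sparse format.

```python
import numpy as np
import tensorflow as tf

# Same toy inputs as in the answer above.
x = tf.SparseTensor(indices=[[0, 0], [1, 2]], values=[1, 2], dense_shape=[3, 4])
y = tf.constant(np.arange(3))


def densify(features, label):
    # Each element is one sparse row of shape [4]; make it a dense float vector.
    return tf.cast(tf.sparse.to_dense(features), tf.float32), label


dataset = (
    tf.data.Dataset.from_tensor_slices((x, y))
    .shuffle(buffer_size=3, seed=0)  # buffer covers all 3 rows of the toy data
    .map(densify)
    .batch(2)                        # batch size chosen only for illustration
    .prefetch(tf.data.AUTOTUNE)
)

for features, labels in dataset:
    print(features.shape, labels.numpy())
```

A dataset that yields (features, labels) pairs like this can be passed directly to a Keras model's fit method in place of separate x and y arrays.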