Creating a tf.data.Dataset from a NumPy array with shape (890,2048,3)

I am working on a PointNet implementation for point cloud registration. For that, I created 890 source and 890 target point clouds, each stored in a NumPy array with shape (2048, 3). I then stacked them into two large arrays, source and targ, each with shape (890, 2048, 3). Now I want to build an input pipeline for a TensorFlow model. How do I create a TensorFlow dataset from these two NumPy arrays, and how do I check whether it worked? I tried:

data1 = tf.data.Dataset.from_tensor_slices((source, targ))
data1

But I only get:

<TensorSliceDataset element_spec=(TensorSpec(shape=(2048, 3), dtype=tf.float64, name=None), TensorSpec(shape=(2048, 3), dtype=tf.float64, name=None))>

as output.

I would really appreciate any help or pointers on where to look. :)

CodePudding user response：

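It looks like it worked: the output you see is just the dataset's description, not an error. To verify, iterate over your data1 object and compare each element with the entry at the same index in the original arrays. They should hold the same values.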
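A minimal sketch of that check, assuming source and targ are the two arrays from the question. Random stand-in data with only 16 clouds (instead of 890) is used here so it runs quickly; the names num_elements and all_match are just illustrative.

```python
import numpy as np
import tensorflow as tf

# Stand-in data (assumption): 16 point clouds instead of 890 to keep this fast.
source = np.random.normal(size=(16, 2048, 3))
targ = np.random.normal(size=(16, 2048, 3))

data1 = tf.data.Dataset.from_tensor_slices((source, targ))

# The dataset should contain one element per (source, target) pair.
num_elements = int(data1.cardinality().numpy())
print(num_elements)

# Element i of the dataset should be identical to source[i] and targ[i].
all_match = True
for i, (x, y) in enumerate(data1.as_numpy_iterator()):
    if not (np.array_equal(x, source[i]) and np.array_equal(y, targ[i])):
        all_match = False
        break
print(all_match)
```

If the element count equals the length of the first axis and every element matches, the dataset holds exactly the data from the arrays.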

CodePudding user response：

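This is because you have not batched your data yet. from_tensor_slices slices the arrays along the first axis, so the dataset holds 890 elements, each a pair of tensors with shape (2048, 3). The element_spec shows that per-element shape, without a batch dimension. To get batches, call batch() on the dataset.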

Contrast

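import numpy as np
import tensorflow as tf

source = np.random.normal(size=(890, 2048, 3))
targ = np.random.normal(size=(890, 2048, 3))

# Each element is one (source, target) pair, with no batch dimension
data1 = tf.data.Dataset.from_tensor_slices((source, targ))

for x, y in data1.take(1):
    print(x.shape)
    print(y.shape)

# Output:
# (2048, 3)
# (2048, 3)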

with

source = np.random.normal(size=(890, 2048, 3))
targ = np.random.normal(size=(890, 2048, 3))

data1 = tf.data.Dataset.from_tensor_slices((source, targ))
data1 = data1.batch(8)  # or any other convenient batch size

for x, y in data1.take(1):
    print(x.shape)
    print(y.shape)

# Output:
# (8, 2048, 3)
# (8, 2048, 3)
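Since the question asks for an input pipeline, here is one possible sketch that builds on the batching above. It is an assumption about what the model needs, not a requirement: the cast to float32 is there because the arrays are float64 and Keras layers default to float32, and the batch size, shuffle buffer, and helper name to_float32 are illustrative choices. Stand-in data with 16 clouds is used again.

```python
import numpy as np
import tensorflow as tf

# Stand-in data (assumption): 16 point clouds instead of 890.
source = np.random.normal(size=(16, 2048, 3))
targ = np.random.normal(size=(16, 2048, 3))

def to_float32(x, y):
    # Hypothetical helper: cast both clouds from float64 to float32.
    return tf.cast(x, tf.float32), tf.cast(y, tf.float32)

pipeline = (
    tf.data.Dataset.from_tensor_slices((source, targ))
    .map(to_float32)
    .shuffle(buffer_size=len(source))  # shuffle across the whole dataset
    .batch(8)                          # adds the leading batch dimension
    .prefetch(tf.data.AUTOTUNE)        # overlap data preparation and training
)

# Inspect the structure and pull one batch to confirm shapes and dtypes.
print(pipeline.element_spec)
x, y = next(iter(pipeline))
print(x.shape, x.dtype)
print(y.shape, y.dtype)
```

A dataset built this way yields (source, target) tuples, which Keras would treat as (inputs, labels) if passed directly to model.fit. Whether that fits depends on how the registration model takes its inputs.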