How to convert a list of integers into tensorflow dataset

Time:06-10

I have a dataframe in which one of the columns looks like this:

values
-------------
| [0, 2]    |
| [0]       |
| [5, 1, 9] |
|    .      |
|    .      |
|    .      |
------------

The data type of this column is currently object. How can I convert this column into a TensorFlow dataset?

CodePudding user response:

You can use tf.data.Dataset.from_tensor_slices to create a dataset from an array, but if the rows have different lengths you get an error like the one below:

>>> import tensorflow as tf
>>> tf.data.Dataset.from_tensor_slices([[1, 2], [3]])
...
... ValueError: Can't convert non-rectangular Python sequence to Tensor.

For this reason, I first pad the shorter rows with np.nan so that every row has the same length, and then create the dataset as shown below. (I pad with np.nan, but you can use any fill value you like. Note that np.nan makes the array float64.)

import numpy as np
import pandas as pd
import tensorflow as tf

df = pd.DataFrame({'values':[[0,2],[0], [5,1,9], [2,3], [1,2,3]]})
# Width of the padded array = length of the longest list
num_col = df['values'].apply(len).max()
num_row = len(df['values'])
# Start from an array full of NaN, then copy each list into the start of its row
arr = np.full([num_row, num_col], np.nan)
for idx, row in enumerate(df['values']):
    arr[idx, 0:len(row)] = row

dataset = tf.data.Dataset.from_tensor_slices(arr)
for item in dataset.take(5):
    print(item)

Output:

tf.Tensor([ 0.  2. nan], shape=(3,), dtype=float64)
tf.Tensor([ 0. nan nan], shape=(3,), dtype=float64)
tf.Tensor([5. 1. 9.], shape=(3,), dtype=float64)
tf.Tensor([ 2.  3. nan], shape=(3,), dtype=float64)
tf.Tensor([1. 2. 3.], shape=(3,), dtype=float64)
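
If you would rather keep the values as integers, one option is to pad with an integer sentinel instead of np.nan. Below is a minimal sketch in plain Python; pad_rows is a hypothetical helper (not part of TensorFlow or pandas), and -1 is an assumed fill value that only works if -1 never occurs in your real data.

```python
def pad_rows(rows, pad_value=-1):
    """Right-pad every row with pad_value up to the length of the longest row."""
    width = max((len(r) for r in rows), default=0)
    return [list(r) + [pad_value] * (width - len(r)) for r in rows]

# Same lists as the 'values' column above
values = [[0, 2], [0], [5, 1, 9], [2, 3], [1, 2, 3]]
padded = pad_rows(values)
print(padded)
```

The result is a rectangular list of lists, so it can be passed to tf.data.Dataset.from_tensor_slices(padded) without the non-rectangular error.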

CodePudding user response:

Just try using a ragged structure:

import tensorflow as tf
import pandas as pd

df = pd.DataFrame(data={'values':[[0, 2], [0], [5, 1, 9]]})

ds = tf.data.Dataset.from_tensor_slices((tf.ragged.constant(df['values'])))

for d in ds:
  print(d)

Output:

tf.Tensor([0 2], shape=(2,), dtype=int32)
tf.Tensor([0], shape=(1,), dtype=int32)
tf.Tensor([5 1 9], shape=(3,), dtype=int32)

And if you want each tensor to be the same length:

ds = tf.data.Dataset.from_tensor_slices((tf.ragged.constant(df['values']).to_tensor()))
for d in ds:
  print(d)

Output:

tf.Tensor([0 2 0], shape=(3,), dtype=int32)
tf.Tensor([0 0 0], shape=(3,), dtype=int32)
tf.Tensor([5 1 9], shape=(3,), dtype=int32)
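
One thing to watch with to_tensor(): by default it pads with 0, so the row [0] becomes [0 0 0] and you can no longer tell padding from real zeros. A minimal sketch of one way around that is to pass default_value to to_tensor(). Here -1 is an assumed sentinel, and the plain Python list stands in for df['values'] so the snippet runs without pandas.

```python
import tensorflow as tf

# Same lists as the 'values' column above
values = [[0, 2], [0], [5, 1, 9]]

# -1 is an assumed sentinel; choose any value that cannot occur in the real data
PAD = -1
padded = tf.ragged.constant(values).to_tensor(default_value=PAD)
ds = tf.data.Dataset.from_tensor_slices(padded)

for d in ds:
    print(d.numpy())
```

Each element still has shape (3,) and an integer dtype, but the padded positions now hold -1 instead of 0, so they can be masked out later.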