I have a dataframe in which one of the columns looks like this:
values
-------------
| [0, 2] |
| [0] |
| [5, 1, 9] |
| . |
| . |
| . |
------------
The data type of this column is object.
How can I convert this column into a TensorFlow dataset?
CodePudding user response:
You can use tf.data.Dataset.from_tensor_slices
to create a dataset from an array,
but if the rows have different lengths you get an error like the one below:
>>> import tensorflow as tf
>>> tf.data.Dataset.from_tensor_slices([[1, 2], [3]])
...
... ValueError: Can't convert non-rectangular Python sequence to Tensor.
For this reason, I first pad the shorter rows with np.nan so that every row has the same length, and then create the dataset as shown below. (I pad with np.nan here, but you can use any fill value you like.)
import numpy as np
import pandas as pd
import tensorflow as tf
df = pd.DataFrame({'values':[[0,2],[0], [5,1,9], [2,3], [1,2,3]]})
# Width of the padded array is the length of the longest list
num_col = df['values'].apply(len).max()
num_row = len(df['values'])
# Start from an array full of NaN, then copy each list into its row
arr = np.full([num_row, num_col], np.nan)
for idx, row in enumerate(df['values']):
    arr[idx, 0:len(row)] = row
dataset = tf.data.Dataset.from_tensor_slices(arr)
for item in dataset.take(5):
    print(item)
Output:
tf.Tensor([ 0. 2. nan], shape=(3,), dtype=float64)
tf.Tensor([ 0. nan nan], shape=(3,), dtype=float64)
tf.Tensor([5. 1. 9.], shape=(3,), dtype=float64)
tf.Tensor([ 2. 3. nan], shape=(3,), dtype=float64)
tf.Tensor([1. 2. 3.], shape=(3,), dtype=float64)
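The padding step above can also be pulled out into a small helper. This is only a sketch using plain Python lists; pad_rows is a hypothetical name that does not appear in the answer, and -1 is an arbitrary fill value chosen for illustration.

```python
import math

def pad_rows(rows, fill=math.nan):
    """Right-pad every row with `fill` up to the length of the longest row.

    Hypothetical helper for illustration; not part of pandas or TensorFlow.
    """
    width = max((len(row) for row in rows), default=0)
    return [list(row) + [fill] * (width - len(row)) for row in rows]

# Pad with -1 instead of NaN so the values can stay integers
padded = pad_rows([[0, 2], [0], [5, 1, 9]], fill=-1)
print(padded)  # [[0, 2, -1], [0, -1, -1], [5, 1, 9]]
```

Because the result is rectangular, it can be passed to tf.data.Dataset.from_tensor_slices in the same way as arr above.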
CodePudding user response:
Just try using a ragged structure:
import tensorflow as tf
import pandas as pd
df = pd.DataFrame(data={'values':[[0, 2], [0], [5, 1, 9]]})
ds = tf.data.Dataset.from_tensor_slices((tf.ragged.constant(df['values'])))
for d in ds:
    print(d)
tf.Tensor([0 2], shape=(2,), dtype=int32)
tf.Tensor([0], shape=(1,), dtype=int32)
tf.Tensor([5 1 9], shape=(3,), dtype=int32)
And if you want each tensor to be the same length, convert the ragged tensor with to_tensor(), which pads with zeros by default:
ds = tf.data.Dataset.from_tensor_slices((tf.ragged.constant(df['values']).to_tensor()))
for d in ds:
    print(d)
tf.Tensor([0 2 0], shape=(3,), dtype=int32)
tf.Tensor([0 0 0], shape=(3,), dtype=int32)
tf.Tensor([5 1 9], shape=(3,), dtype=int32)
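Zero padding can be hard to tell apart from real zeros in the data (the second row above becomes [0 0 0]). A minimal sketch of one way around that, assuming -1 never occurs as a real value, is to pass default_value to to_tensor. The ragged tensor is built from a plain list here rather than from the dataframe column.

```python
import tensorflow as tf

# Same data as above, as a plain Python list of lists
rt = tf.ragged.constant([[0, 2], [0], [5, 1, 9]])

# Pad with -1 instead of the default 0 (assumes -1 is not a real value)
dense = rt.to_tensor(default_value=-1)

ds = tf.data.Dataset.from_tensor_slices(dense)
rows = [d.numpy().tolist() for d in ds]
print(rows)  # [[0, 2, -1], [0, -1, -1], [5, 1, 9]]
```

The padded positions can then be masked out later by comparing against -1.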