How to apply tf.data transformations to a DataFrame

I want to apply tf.data transformations to a pandas DataFrame. According to the TensorFlow docs, I can pass a DataFrame to tf.data directly, but the dtype of the DataFrame must be uniform.

When I apply tf.data to my dataframe like below

tf.data.Dataset.from_tensor_slices(df['reports'])

it generates this error

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type float).

When I print df['reports'].dtype I get dtype('O'), which suggests the column is not uniform. If that is the cause, how can I convert this column to a uniform dtype?
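The "float" in the error message hints that the object column contains Python float objects, and one common source of those is NaN marking a missing value in an otherwise text column. That is only a guess about this data. The sketch below uses made-up values to show how to count the element types actually stored in an object column, and one possible cleanup if the column turns out to be text:

```python
import numpy as np
import pandas as pd

# Hypothetical data (not from the question): a text column with one missing value.
df = pd.DataFrame({"reports": ["first report", "second report", np.nan]}, dtype=object)

print(df["reports"].dtype)

# Count the Python types stored in the column, keyed by type name.
type_counts = df["reports"].map(lambda v: type(v).__name__).value_counts()
print(type_counts.to_dict())

# One possible fix for text data: replace missing values so every element is a str.
uniform = df["reports"].fillna("").astype(str)
print(uniform.tolist())
```

If the counts show a mix such as str and float, the column has to be cleaned or cast before tf.data can convert it to a tensor.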

CodePudding user response:

You can try forcing df['reports'] to a specific type. Assuming you want to convert this column to numbers, you can do it like this:

df['reports'] = pd.to_numeric(df['reports'])

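Note that pd.to_numeric raises an error by default if a value cannot be parsed as a number. In any case, I suggest investigating why the column has the non-uniform dtype('O') in the first place. There may be bad values in your data.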
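As a rough sketch of that approach, assuming the column is meant to be numeric: errors="coerce" turns unparseable values into NaN so they can be counted and dropped, and the result is cast to one dtype before building the dataset. The sample values are invented for illustration.

```python
import pandas as pd
import tensorflow as tf

# Hypothetical data: numeric strings, a float, a missing value, and a bad entry.
df = pd.DataFrame({"reports": ["1.5", 2.0, None, "oops"]})

# Values that cannot be parsed become NaN instead of raising.
numeric = pd.to_numeric(df["reports"], errors="coerce")
print(numeric.isna().sum())  # how many values failed to convert

# Drop the bad rows and cast to a single dtype that TensorFlow can convert.
clean = numeric.dropna().astype("float32")

dataset = tf.data.Dataset.from_tensor_slices(clean.to_numpy())
values = [float(x) for x in dataset.as_numpy_iterator()]
print(values)
```

Whether to drop or fill the NaN rows depends on the data. Dropping is used here only to keep the sketch short.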

CodePudding user response:

If each row holds a list of values and the lists have different lengths, try using a ragged structure:

import tensorflow as tf
import pandas as pd

df = pd.DataFrame(data={'reports': [[2.0, 3.0, 4.0], [2.0, 3.0], [2.0]]})

dataset = tf.data.Dataset.from_tensor_slices(tf.ragged.constant(df['reports']))

for x in dataset:
  print(x)

tf.Tensor([2. 3. 4.], shape=(3,), dtype=float32)
tf.Tensor([2. 3.], shape=(2,), dtype=float32)
tf.Tensor([2.], shape=(1,), dtype=float32)
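Once the dataset exists, the usual tf.data transformations can be chained onto it. The sketch below is an illustration rather than part of the answer above: it reuses the same sample rows, converts the column with .tolist() before building the ragged tensor, and applies two simple map steps.

```python
import pandas as pd
import tensorflow as tf

df = pd.DataFrame(data={'reports': [[2.0, 3.0, 4.0], [2.0, 3.0], [2.0]]})

# Build the ragged tensor from a plain list of lists.
ragged = tf.ragged.constant(df['reports'].tolist())
dataset = tf.data.Dataset.from_tensor_slices(ragged)

# Element-wise transformation: scale every value in each row.
doubled_ds = dataset.map(lambda x: x * 2.0)
doubled = [row.numpy().tolist() for row in doubled_ds]

# Reduction: collapse each variable-length row to a single number.
totals_ds = dataset.map(lambda x: tf.reduce_sum(x))
totals = [float(t) for t in totals_ds]

print(doubled)
print(totals)
```

Because each row keeps its own length, steps that need fixed shapes, such as plain batching, may require padding first.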