Suppose that when reading a specific df with pd.read_csv(file_path),
the columns come back with object dtype instead of string/int32 dtypes.
This is a problem when trying to convert the pandas df to a TensorFlow dataset:
import pandas as pd
import tensorflow as tf
import numpy as np
# convert dummy data to object to reproduce the problem
d = {'A': ['a', 'b', 'c', 'd'], 'B': ['e', 'f', 'g', 'h'], 'number': [1, 2, 3, 4]}
df = pd.DataFrame(d).astype(object)
# convert df to a tf.data.Dataset
ds = tf.data.Dataset.from_tensor_slices(dict(df))
This raises the following error:
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type int).
How can I properly convert object dtype columns to string/numeric columns?
The goal is to get the following output:
ds
# console output
<TensorSliceDataset shapes: {A: (), B: (), number: ()}, types: {A: tf.string, B: tf.string, number: tf.int64}>
Answer:
Try:
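# cast every column to category, then restore the numeric column
df = pd.DataFrame(d).astype('category')
# plain int is platform-dependent (int32 in the output here); use 'int64' to force tf.int64
df['number'] = df['number'].astype(int)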
ds = tf.data.Dataset.from_tensor_slices(dict(df))
<TensorSliceDataset shapes: {A: (), B: (), number: ()}, types: {A: tf.string, B: tf.string, number: tf.int32}>
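An alternative that avoids the category round trip is to fix the dtypes in pandas before building the dataset. The sketch below is not part of the answer above. It uses the same dummy data as the question and shows two pandas calls: infer_objects(), which lets pandas pick a better dtype for each object column, and astype() with a per-column mapping. The variable names inferred and explicit are only illustrative.

```python
import pandas as pd

# Same dummy data as in the question, forced to object dtype
d = {'A': ['a', 'b', 'c', 'd'], 'B': ['e', 'f', 'g', 'h'], 'number': [1, 2, 3, 4]}
df = pd.DataFrame(d).astype(object)

# Option 1: let pandas infer a better dtype for each object column
inferred = df.infer_objects()

# Option 2: state the target dtype of each column explicitly
explicit = df.astype({'A': str, 'B': str, 'number': 'int64'})

print(inferred.dtypes)
print(explicit.dtypes)
```

In both results the number column is int64 rather than object, which is the column the ValueError complains about. Either frame can then be passed to tf.data.Dataset.from_tensor_slices(dict(...)) as in the question. That last step is not run here, and how the string columns are represented depends on the pandas version, so treat it as an assumption to check against your setup.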