TensorFlow: From Object to String/Int

Suppose that when a specific DataFrame is read with pd.read_csv(file_path), its columns come back with object dtype instead of string/int32 dtype.

This causes a problem when converting the pandas DataFrame to a TensorFlow dataset (tf.data.Dataset):

import pandas as pd
import tensorflow as tf
import numpy as np

# convert dummy data to object to reproduce the problem
d = {'A': ['a', 'b', 'c', 'd'], 'B': ['e', 'f', 'g', 'h'], 'number': [1, 2, 3, 4]}
df = pd.DataFrame(d).astype(object)

# converting df to tf.dataset
ds = tf.data.Dataset.from_tensor_slices(dict(df))

The following error is raised:

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type int).

How can object dtype columns be properly converted to string/numeric columns?

The goal is to get the following output:

ds
# console output
<TensorSliceDataset shapes: {A: (), B: (), number: ()}, types: {A: tf.string, B: tf.string, number: tf.int64}>
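One way to see which column triggers the error is to list the Python types stored in each object column. The sketch below uses a hypothetical helper, object_column_types, written only for this illustration. On the dummy data it reports that A and B hold str values while number holds int values, which matches the "Unsupported object type int" part of the message (TensorFlow can generally convert object arrays of Python strings to tf.string, so the int column is the likely culprit).

```python
import pandas as pd


def object_column_types(df: pd.DataFrame) -> dict:
    """Hypothetical helper: map each object-dtype column to the sorted
    names of the Python types found in its values."""
    report = {}
    for col in df.columns:
        if df[col].dtype == object:
            # Collect the distinct Python types actually stored in the column
            report[col] = sorted({type(v).__name__ for v in df[col]})
    return report


# Same dummy data as in the question
d = {'A': ['a', 'b', 'c', 'd'], 'B': ['e', 'f', 'g', 'h'], 'number': [1, 2, 3, 4]}
df = pd.DataFrame(d).astype(object)
print(object_column_types(df))
```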

Answer:

Try casting the columns to explicit dtypes:

df = pd.DataFrame(d).astype('category')
df['number'] = df['number'].astype(int)

ds = tf.data.Dataset.from_tensor_slices(dict(df))

ds
# console output
<TensorSliceDataset shapes: {A: (), B: (), number: ()}, types: {A: tf.string, B: tf.string, number: tf.int32}>
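The answer's astype(int) uses NumPy's platform default integer, which is 32-bit on Windows with NumPy versions before 2.0; that is likely why its output shows tf.int32 rather than the tf.int64 asked for. Below is a more general sketch, not the answerer's method: a hypothetical helper, coerce_object_columns, that converts every object column either to a numeric dtype (pinning integers to int64) or to str, assuming that "every value parses as a number" is the right rule for your data.

```python
import pandas as pd


def coerce_object_columns(df: pd.DataFrame, int_dtype: str = "int64") -> pd.DataFrame:
    """Hypothetical helper: give every object column a concrete dtype.

    Columns whose values all parse as numbers become numeric (integer
    results are cast to `int_dtype`); other object columns become str.
    """
    out = df.copy()
    for col in out.columns:
        if out[col].dtype != object:
            continue  # not an object column; leave it as it is
        try:
            converted = pd.to_numeric(out[col])
        except (ValueError, TypeError):
            # Not numeric: make every value a Python str (maps to tf.string)
            out[col] = out[col].astype(str)
            continue
        if pd.api.types.is_integer_dtype(converted):
            # Pin the width so the result does not depend on the platform default int
            converted = converted.astype(int_dtype)
        out[col] = converted
    return out


# Same dummy data as in the question
d = {'A': ['a', 'b', 'c', 'd'], 'B': ['e', 'f', 'g', 'h'], 'number': [1, 2, 3, 4]}
df = pd.DataFrame(d).astype(object)
fixed = coerce_object_columns(df)
print(fixed.dtypes)
```

With TensorFlow installed, the result can then be passed on as before, e.g. ds = tf.data.Dataset.from_tensor_slices(dict(fixed)), which should give number: tf.int64. Note that this helper would also turn a column of numeric-looking strings such as '1', '2' into numbers, which may not be wanted. For this particular frame, pandas' built-in df.infer_objects() would also restore number to int64 while leaving A and B as object columns of str.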