How do I create a tf.Tensor from a pandas DataFrame containing arrays?

Time:02-14

I have a pandas DataFrame like the one below.

import pandas as pd
import numpy as np
import tensorflow as tf  # Version 2.8.0
df = pd.DataFrame({"id": 
                   ["i123", "i456"],  
                   "col": [np.array(["igh", "ghdd", "yu"]),
                           np.array(["uh", "lkk", "nj"])]})
print(df)

Output:

    id      col
0   i123    [igh, ghdd, yu]
1   i456    [uh, lkk, nj]

I would like to create a Tensor from the values of the col column, in order to use them in a specific use case. I have tried converting the values like this:

values = df["col"].to_numpy()
values

Which looks like:

array([array(['igh', 'ghdd', 'yu'], dtype='<U4'),
       array(['uh', 'lkk', 'nj'], dtype='<U3')], dtype=object)

When I try to convert this to a Tensor with

tf.constant(values)

I get an exception:

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray).

From the TF docs, the example tensor has shape (2, 3), while the values variable I create has .shape (2,), which might be the problem. I can't seem to get the dtype and/or shape to match exactly, and I'm unsure how to get it to work. Any ideas?

CodePudding user response:

Try:

import pandas as pd
import numpy as np
import tensorflow as tf  # Version 2.8.0
df = pd.DataFrame({"id": 
                   ["i123", "i456"],  
                   "col": [np.array(["igh", "ghdd", "yu"]),
                           np.array(["uh", "lkk", "nj"])]})

values = df["col"].to_list()
print(tf.constant(values))

Output:

tf.Tensor(
[[b'igh' b'ghdd' b'yu']
 [b'uh' b'lkk' b'nj']], shape=(2, 3), dtype=string)

Or stack the per-row arrays into a single 2-D array first:

values = np.stack(df["col"].to_numpy())
print(tf.constant(values))
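
As a rough illustration of why the stacking approach helps, the NumPy-only sketch below rebuilds by hand the kind of object array that df["col"].to_numpy() returns (an assumption made so the example runs without pandas or TensorFlow) and compares shapes before and after np.stack:

```python
import numpy as np

# Hand-built stand-in for df["col"].to_numpy(): a 1-D object array
# whose two elements are themselves NumPy string arrays.
values = np.empty(2, dtype=object)
values[0] = np.array(["igh", "ghdd", "yu"])
values[1] = np.array(["uh", "lkk", "nj"])

# The outer array only knows it holds 2 Python objects.
print(values.shape, values.dtype)

# np.stack joins the inner arrays along a new first axis,
# producing one regular 2-D array of strings.
stacked = np.stack(values)
print(stacked.shape, stacked.dtype.kind)
```

The stacked array has a regular shape and a string dtype rather than dtype=object, which is the form tf.constant was able to convert in the answer above. Note that np.stack requires every row to have the same length; it raises a ValueError otherwise.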