Saving and opening a tensorflow dataset

Time:11-17

I have created and saved a dataset which looks like this:

# line 1
foo   $   faa   $   fee
#    $    is the separator

I saved it as a .txt file, then loaded it with TextLineDataset and saved it as a TensorFlow dataset:

from tensorflow.data import TextLineDataset
from tensorflow.data.experimental import save, load
tfsaved = TextLineDataset('path_to_file.txt')
save(tfsaved, 'path_tf_dataset')

But when I load the dataset, it looks like this:

# Line 1
foofaafee

Is there any way to tell TensorFlow that $ is my separator? If not, how can I solve this?

Answer:

Here is a simple example of how you can read your data using pandas and pass it to tf.data.Dataset.from_tensor_slices:

data.csv

feature1   $   feature2   $   feature3
foo   $   faa   $   fee
foo   $   faa   $   fee
foo   $   faa   $   fee
foo   $   faa   $   fee
foo   $   faa   $   fee
foo   $   faa   $   fee
foo   $   faa   $   fee
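
import pandas as pd
import tensorflow as tf

# With engine='python', a multi-character sep is treated as a regular expression.
# A raw string avoids invalid-escape warnings: \s* absorbs the spaces around
# each separator and \$ matches a literal dollar sign.
df = pd.read_csv('data.csv', sep=r'\s*\$\s*', engine='python')

# Each column becomes a key in the element dict; each row becomes one element.
ds = tf.data.Dataset.from_tensor_slices(dict(df))

for d in ds.take(3):
    tf.print(d)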
{'feature1': "foo", 'feature2': "faa", 'feature3': "fee"}
{'feature1': "foo", 'feature2': "faa", 'feature3': "fee"}
{'feature1': "foo", 'feature2': "faa", 'feature3': "fee"}

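Note that I had to escape $, since it is a special regex character (it anchors the end of the line). With engine='python', pandas interprets any separator longer than one character as a regular expression.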
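The same regex rules apply in Python's own re module, so the effect of the escape can be checked without pandas or TensorFlow. This is a small sketch; the pattern r"\s*\$\s*", which also tolerates any amount of spacing around $, is an illustrative choice rather than anything pandas requires:

```python
import re

line = "foo   $   faa   $   fee"

# Unescaped: "$" is an end-of-string anchor, so this pattern only matches the
# empty string at the very end and the line is never split at the dollar signs.
unescaped = re.split(r"\s*$\s*", line)

# Escaped: "\$" matches a literal dollar sign; "\s*" absorbs the padding spaces.
escaped = re.split(r"\s*\$\s*", line)

print(unescaped[0])  # the whole, unsplit line
print(escaped)       # ['foo', 'faa', 'fee']
```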

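An alternative that stays entirely within tf.data is to keep TextLineDataset and split each line yourself: TextLineDataset yields every line as a single scalar string and does not interpret any separator. The sketch below is one possible approach, not the only one. It assumes TensorFlow 2.10 or later for Dataset.save and Dataset.load (older 2.x releases provide tf.data.experimental.save and load instead), and it writes a hypothetical two-line sample file to a temporary directory:

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical sample file in the question's format ("$" surrounded by spaces).
tmp_dir = tempfile.mkdtemp()
txt_path = os.path.join(tmp_dir, "data.txt")
with open(txt_path, "w") as f:
    f.write("foo   $   faa   $   fee\n")
    f.write("bar   $   baz   $   qux\n")


def split_line(line):
    # tf.strings.split treats sep as a literal string, not a regex, so "$"
    # needs no escaping; tf.strings.strip then removes the padding spaces.
    fields = tf.strings.split(line, sep="$")
    return tf.strings.strip(fields)


# Each element is now a 1-D tf.string tensor of fields instead of one whole line.
ds = tf.data.TextLineDataset(txt_path).map(split_line)

# Save and reload the parsed dataset (Dataset.save / Dataset.load need TF >= 2.10).
save_path = os.path.join(tmp_dir, "tf_dataset")
ds.save(save_path)
loaded = tf.data.Dataset.load(save_path)

# Decode the bytes back to Python strings to inspect the reloaded rows.
rows = [[field.decode("utf-8") for field in row.numpy()] for row in loaded]
print(rows)
```

Because the split happens before saving, the saved dataset already contains separate fields, so nothing has to be re-parsed after loading. Element order after a save/load round trip depends on how the shards are written and read back, so compare rows without assuming order.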