I have created and saved a dataset which looks like this:
# line 1
foo $ faa $ fee
# $ is the separator
I saved it as a .txt file and then saved it as a TensorFlow dataset with:
from tensorflow.data import TextLineDataset
from tensorflow.data.experimental import save, load
tfsaved = TextLineDataset('path_to_file.txt')
save(tfsaved, 'path_tf_dataset')
But when I load the dataset, it looks like this:
# Line 1
foofaafee
Is there any way to tell TensorFlow that $ is my separator? If not, how can I solve this?
Answer:
Here is a simple example of how you can read your data with pandas and pass it to tf.data.Dataset.from_tensor_slices:
data.csv:
feature1 $ feature2 $ feature3
foo $ faa $ fee
foo $ faa $ fee
foo $ faa $ fee
foo $ faa $ fee
foo $ faa $ fee
foo $ faa $ fee
foo $ faa $ fee
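import pandas as pd
import tensorflow as tf

# sep is a regular expression: optional whitespace, a literal $, optional whitespace
df = pd.read_csv('data.csv', sep=r'\s*\$\s*', engine='python')
# Each column becomes one feature; each row becomes one dataset element
ds = tf.data.Dataset.from_tensor_slices(dict(df))
for d in ds.take(3):
    tf.print(d)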
{'feature1': "foo", 'feature2': "faa", 'feature3': "fee"}
{'feature1': "foo", 'feature2': "faa", 'feature3': "fee"}
{'feature1': "foo", 'feature2': "faa", 'feature3': "fee"}
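Note that pandas interprets a separator longer than one character as a regular expression (which is why engine='python' is used), so the $ has to be escaped: unescaped, it is a regex anchor that matches the end of the line rather than a literal dollar sign.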
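If you would rather stay in TensorFlow without pandas, one option is to keep TextLineDataset (which has no separator option and always yields whole lines) and split each line yourself with tf.strings.split before saving. The following is a minimal sketch, assuming TF 2.x where tf.data.experimental.save/load are available (newer releases also offer Dataset.save and tf.data.Dataset.load). The temporary paths and the three-line sample file are hypothetical stand-ins for the question's data.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical sample file mirroring the question's data:
# one record per line, fields separated by " $ ".
tmp_dir = tempfile.mkdtemp()
txt_path = os.path.join(tmp_dir, "path_to_file.txt")
with open(txt_path, "w") as f:
    f.write("foo $ faa $ fee\n" * 3)


def split_line(line):
    # Split the raw line on the literal "$" separator (not a regex here),
    # then strip the surrounding spaces from each field.
    fields = tf.strings.split(line, sep="$")
    return tf.strings.strip(fields)


# Each element is now a 1-D string tensor of fields instead of one whole line.
ds = tf.data.TextLineDataset(txt_path).map(split_line)

# Save the already-split dataset and load it back; passing element_spec
# keeps load() working on older TF 2.x versions that require it.
save_path = os.path.join(tmp_dir, "path_tf_dataset")
tf.data.experimental.save(ds, save_path)
loaded = tf.data.experimental.load(save_path, element_spec=ds.element_spec)

for fields in loaded:
    tf.print(fields)
```

Because the split happens before saving, the loaded elements already contain the separate fields, so nothing needs to know about the $ separator after loading.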