Using make_csv_dataset
we can read a CSV file into a TensorFlow dataset object:
import tensorflow as tf

csv_data = tf.data.experimental.make_csv_dataset(
    "./train.csv",
    batch_size=8190,
    num_epochs=1,
    ignore_errors=True,
)
Now csv_data is of type tensorflow.python.data.ops.dataset_ops.MapDataset. How can I find the size or shape of csv_data?
print(csv_data)
gives the column information below:
<MapDataset element_spec={'title': TensorSpec(shape=(None,), dtype=tf.string, name=None), 'user_id': TensorSpec(shape=(None,), dtype=tf.string, name=None)}>
Of course, getting the count from train_recom.csv using pandas.read_csv is an option; I was just curious whether TensorFlow has anything easier.
CodePudding user response:
If you want to get the size of your batched dataset without any preprocessing steps, try:
import pandas as pd
import tensorflow as tf

# Create two identical sample CSV files, 5 rows each.
df = pd.DataFrame(data={'A': [50.1, 1.23, 4.5, 4.3, 3.2],
                        'B': [50.1, 1.23, 4.5, 4.3, 3.2],
                        'C': [5.2, 3.1, 2.2, 1., 3.]})
df.to_csv('data1.csv', index=False)
df.to_csv('data2.csv', index=False)

# Read both files into a single batched dataset (10 samples, batch size 2).
dataset = tf.data.experimental.make_csv_dataset(
    "/content/*.csv",
    batch_size=2,
    field_delim=",",
    num_epochs=1,
    select_columns=['A', 'B', 'C'],
    label_name='C')
# Count the batches by iterating once over the dataset.
# (The identity map is optional; len(list(dataset)) gives the same result.)
dataset_len = len(list(dataset.map(lambda x, y: (x, y))))
print(dataset_len)
# 5
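Iterating is necessary here because the number of batches in a CSV-backed dataset is not known statically. As a quick check (a minimal sketch, assuming TF 2.3+ where Dataset.cardinality is available), the reported cardinality is unknown:

# make_csv_dataset cannot know its length up front, so
# cardinality() reports the sentinel UNKNOWN_CARDINALITY.
print(dataset.cardinality() == tf.data.UNKNOWN_CARDINALITY)
# tf.Tensor(True, shape=(), dtype=bool)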
If you want to know how many samples you have altogether, try unbatch:
# Unbatch first so each element is a single sample, then count.
dataset_len = len(list(dataset.unbatch().map(lambda x, y: (x, y))))
print(dataset_len)
# 10
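For a large dataset, materializing every element in a Python list just to count it is wasteful. As an alternative sketch (using the standard Dataset.reduce API; not part of the original answer), you can fold a counter over the dataset instead:

# Count elements without building an in-memory list:
# increment the counter once per batch.
num_batches = dataset.reduce(tf.constant(0, tf.int64),
                             lambda count, _: count + 1)
print(num_batches.numpy())
# 5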