If I have a set of tfrecords and use .from_tensor_slices() on them, will the created dataset preserve the order of the data? For example, suppose I have 3 tfrecords called 1.tfrecord, 2.tfrecord, and 3.tfrecord (containing 40, 30, and 70 examples respectively), and I construct dataset = tf.data.Dataset.from_tensor_slices(['1.tfrecord', '2.tfrecord', '3.tfrecord']). During loading, will the order of these examples be preserved?
CodePudding user response:
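If I understood your question correctly, yes, the order of examples is preserved when using tf.data.Dataset.from_tensor_slices with tfrecord files: the file names are yielded in the order you list them, and each file is read from its first record to its last. Here is a simple example: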
import tensorflow as tf

# Write three small TFRecord files with 2, 4, and 6 records.
with tf.io.TFRecordWriter("sample1.tfrecord") as w:
    w.write(b"Record A")
    w.write(b"Record B")

with tf.io.TFRecordWriter("sample2.tfrecord") as w:
    w.write(b"Record C")
    w.write(b"Record D")
    w.write(b"Record E")
    w.write(b"Record F")

with tf.io.TFRecordWriter("sample3.tfrecord") as w:
    w.write(b"Record G")
    w.write(b"Record H")
    w.write(b"Record I")
    w.write(b"Record J")
    w.write(b"Record K")
    w.write(b"Record L")

# Each element of this dataset is a file name, in the order listed.
dataset = tf.data.Dataset.from_tensor_slices(["sample1.tfrecord",
                                              "sample2.tfrecord",
                                              "sample3.tfrecord"])

# Here `record` is the file name; `item` is one serialized record from that file.
for record in dataset:
    for item in tf.data.TFRecordDataset(record):
        tf.print('Record:', record, 'Item -->', item)
Record: "sample1.tfrecord" Item --> "Record A"
Record: "sample1.tfrecord" Item --> "Record B"
Record: "sample2.tfrecord" Item --> "Record C"
Record: "sample2.tfrecord" Item --> "Record D"
Record: "sample2.tfrecord" Item --> "Record E"
Record: "sample2.tfrecord" Item --> "Record F"
Record: "sample3.tfrecord" Item --> "Record G"
Record: "sample3.tfrecord" Item --> "Record H"
Record: "sample3.tfrecord" Item --> "Record I"
Record: "sample3.tfrecord" Item --> "Record J"
Record: "sample3.tfrecord" Item --> "Record K"
Record: "sample3.tfrecord" Item --> "Record L"
Or pass the dataset of file names directly to tf.data.TFRecordDataset:
dataset = tf.data.Dataset.from_tensor_slices(["sample1.tfrecord",
                                              "sample2.tfrecord",
                                              "sample3.tfrecord"])

for item in tf.data.TFRecordDataset(dataset):
    tf.print('Item -->', item)
Item --> "Record A"
Item --> "Record B"
Item --> "Record C"
Item --> "Record D"
Item --> "Record E"
Item --> "Record F"
Item --> "Record G"
Item --> "Record H"
Item --> "Record I"
Item --> "Record J"
Item --> "Record K"
Item --> "Record L"
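If you would rather check this programmatically than read the printed output, here is a minimal sketch that mirrors the sizes from the question (40, 30, and 70 examples). The temporary directory, the payload format "name:index", and the variable names are my own assumptions for illustration, not something from your setup. It writes the files, reads them back two ways, and compares the result with the order in which the records were written.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical file names and record counts, mirroring the question.
counts = {"1.tfrecord": 40, "2.tfrecord": 30, "3.tfrecord": 70}
tmp_dir = tempfile.mkdtemp()

expected = []  # payloads in the exact order they were written
paths = []
for name, n in counts.items():
    path = os.path.join(tmp_dir, name)
    paths.append(path)
    with tf.io.TFRecordWriter(path) as w:
        for i in range(n):
            payload = f"{name}:{i}".encode()  # made-up payload for illustration
            w.write(payload)
            expected.append(payload)

filenames = tf.data.Dataset.from_tensor_slices(paths)

# Default read: files are consumed one after another, in the listed order.
sequential = [x.numpy() for x in tf.data.TFRecordDataset(filenames)]

# flat_map is another way to express the same sequential read.
flat = [x.numpy() for x in filenames.flat_map(tf.data.TFRecordDataset)]

order_preserved = sequential == expected and flat == expected
print("order preserved:", order_preserved)
```

Keep in mind this only covers the plain sequential read. As far as I know, the order can differ once you add .shuffle(), set num_parallel_reads on TFRecordDataset (records from several files are then interleaved), or use .interleave() with parallel calls and deterministic=False.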