train, test, validation splits in tfds.load

Time:11-29

I have been asked to fill in the split parameter so that the data is divided into 80% train, 10% validation, and 10% test, and I do not understand how to do it here. Please help. Thanks.

import matplotlib.pyplot as plt
import tensorflow_datasets as tfds

def plot_example(x_raw, y_raw):
  # Show the first nine images in a 3x3 grid, each titled with its label.
  fig, axes = plt.subplots(3, 3)
  for i in range(3):
    for j in range(3):
      imgplot = axes[i,j].imshow(x_raw[i*3 + j], cmap = 'bone')
      axes[i,j].set_title(y_raw[i*3 + j])
      axes[i,j].get_yaxis().set_visible(False)
      axes[i,j].get_xaxis().set_visible(False)
  fig.set_size_inches(18.5, 10.5, forward=True)

## TODO: Implement the split function parameter: 80% train, 10% validation, and 10% test.
(ds_train, ds_val, ds_test), ds_info = tfds.load("colorectal_histology", 
                                           split=[],
                                           as_supervised=True, with_info=True)
df = tfds.as_dataframe(ds_train.shuffle(1000).take(1000), ds_info)

plot_example(df['image'], df['label'])
print(ds_info)

Please explain

CodePudding user response:

tfds.load takes a split argument, which you can use to load the dataset in whatever partitions you want. For 80% train, 10% validation, and 10% test, you can simply do:

(ds_train, ds_val, ds_test), ds_info = tfds.load(
    "colorectal_histology",
    split=["train[20%:]", "train[0%:10%]", "train[10%:20%]"],
    as_supervised=True,
    with_info=True)

Here the first entry in split, train[20%:], returns the last 80% of the dataset as the training set; train[0%:10%] returns the first 10% as the validation set; and train[10%:20%] returns the next 10% as the test set. The three slices do not overlap. If a dataset provides its own test split you can use that in full instead, but if you want an 80/10/10 split carved out of train, this is how to do it.
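If you want to try other ratios, the slice strings can be generated instead of typed by hand. The following is only a sketch: percent_splits and load_colorectal are hypothetical helper names (not part of TFDS), and the layout simply mirrors the one above (validation first, then test, then the remainder as train).

```python
def percent_splits(train_pct, val_pct, test_pct, base="train"):
    """Build three non-overlapping TFDS slice strings (train, val, test).

    Hypothetical helper, not a TFDS function. Validation takes the first
    val_pct percent, test the next test_pct percent, train the remainder.
    """
    if train_pct + val_pct + test_pct != 100:
        raise ValueError("percentages must sum to 100")
    holdout_end = val_pct + test_pct
    return [
        f"{base}[{holdout_end}%:]",           # train: everything after the holdout
        f"{base}[0%:{val_pct}%]",             # validation: first val_pct percent
        f"{base}[{val_pct}%:{holdout_end}%]", # test: next test_pct percent
    ]


def load_colorectal(train_pct=80, val_pct=10, test_pct=10):
    """Load colorectal_histology with the generated splits (hypothetical wrapper)."""
    # Imported lazily so percent_splits can be used without TFDS installed.
    import tensorflow_datasets as tfds
    return tfds.load(
        "colorectal_histology",
        split=percent_splits(train_pct, val_pct, test_pct),
        as_supervised=True,
        with_info=True,
    )


specs = percent_splits(80, 10, 10)
print(specs)  # ['train[20%:]', 'train[0%:10%]', 'train[10%:20%]']
```

Calling (ds_train, ds_val, ds_test), ds_info = load_colorectal() then gives the same three datasets as the call above, in the same order as the list passed to split.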

Read more in the TFDS "Splits and slicing" guide.
