I have been asked to implement the split parameter so that the data is divided into 80% train, 10% validation, and 10% test, and I do not understand how to do it here. Please help. Thanks.
import matplotlib.pyplot as plt
import tensorflow_datasets as tfds

def plot_example(x_raw, y_raw):
    # Show the first nine images in a 3x3 grid, titled with their labels.
    fig, axes = plt.subplots(3, 3)
    i = 0
    for i in range(3):
        for j in range(3):
            imgplot = axes[i,j].imshow(x_raw[i*3 + j], cmap = 'bone')
            axes[i,j].set_title(y_raw[i*3 + j])
            axes[i,j].get_yaxis().set_visible(False)
            axes[i,j].get_xaxis().set_visible(False)
    fig.set_size_inches(18.5, 10.5, forward=True)

## TODO: Implement the split function parameter: 80% train, 10% validation, and 10% test.
(ds_train, ds_val, ds_test), ds_info = tfds.load("colorectal_histology",
                                                 split=[],
                                                 as_supervised=True, with_info=True)
df = tfds.as_dataframe(ds_train.shuffle(1000).take(1000), ds_info)
plot_example(df['image'], df['label'])
print(ds_info)
Please explain
CodePudding user response:
tfds.load has a split argument that controls which portions of the dataset are loaded. If you want 80% train, 10% validation, and 10% test, you can simply do:
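(ds_train, ds_val, ds_test), ds_info = tfds.load(
    "colorectal_histology",
    # Order matches the unpacking above: train, validation, test.
    split=["train[20%:]", "train[0%:10%]", "train[10%:20%]"],
    as_supervised=True,
    with_info=True)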
Here the first entry in split, train[20%:], returns the last 80% of the dataset as the training set; train[0%:10%] returns the first 10% as the validation set; and train[10%:20%] returns the next 10% as the test set. If a dataset provides its own test split you could use that instead, but if you want an 80/10/10 split carved out of train, this is what you can do.
Read more here
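If you would rather not write the slice strings by hand, here is a minimal sketch of a helper that builds them from a list of percentages. The name percent_splits and its parameters are my own, not part of TFDS; it only produces strings in the same train[a%:b%] form used above. Note that it assigns the slices in order, so the training set becomes the first 80% instead of the last 80%.

```python
def percent_splits(percentages, base="train"):
    """Build percent slice strings such as 'train[0%:80%]'.

    Hypothetical helper (not a TFDS function). `percentages` is a
    sequence of whole-number percentages that must sum to 100.
    """
    if sum(percentages) != 100:
        raise ValueError("percentages must sum to 100")
    specs = []
    start = 0
    for pct in percentages:
        end = start + pct
        # Consecutive, non-overlapping slices of the base split.
        specs.append(f"{base}[{start}%:{end}%]")
        start = end
    return specs


splits = percent_splits([80, 10, 10])
print(splits)
# ['train[0%:80%]', 'train[80%:90%]', 'train[90%:100%]']
```

The resulting list can be passed as split=splits to tfds.load, and the three returned datasets unpack in the same order: train, validation, test.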