How should the data folder be structured to load input as (x_train, y_train), (x_test, y_test) in a CNN model?

Time:02-14

I am new to machine learning and deep learning. I have built a multi-class classification model using a CNN. I first tried it on the CIFAR-10 dataset that ships with Keras, where the data is loaded as follows:

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

This worked for me. However, I am now trying to use my own dataset instead of a built-in one. I don't know how the dataset folder should be structured, or how to load it.

For now, my dataset folder is arranged as follows:

Dataset=> Training_set => 10 different classes folders with corresponding images within
          Test_set => 10 different classes folders with corresponding images within

I have no idea how to use this folder in place of the load_data call. If I load it the usual way, e.g. flow_from_directory('../Dataset/Training_set'), and unpack the result as above, I get the following error: "too many values to unpack (expected 2)". Any help would be appreciated, as I am still learning.

CodePudding user response:

Try using tf.keras.preprocessing.image_dataset_from_directory('../Dataset/Training_set') instead of flow_from_directory('../Dataset/Training_set'). It returns a tf.data.Dataset that yields batches of (images, labels).
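Applied to the layout in the question, this could look roughly like the sketch below. The helper name load_train_test and its default arguments are made up for illustration. It assumes the root folder contains Training_set and Test_set, each with one sub-folder per class, and a TensorFlow version that exposes tf.keras.utils.image_dataset_from_directory.

```python
import os
import tensorflow as tf


def load_train_test(root, image_size=(100, 100), batch_size=32):
    """Load <root>/Training_set and <root>/Test_set as batched datasets.

    Hypothetical helper: the folder names follow the question's layout.
    """
    train_ds = tf.keras.utils.image_dataset_from_directory(
        os.path.join(root, "Training_set"),
        image_size=image_size,   # every image is resized to this size
        batch_size=batch_size,
        seed=123)                # fixes the shuffle order for reproducibility
    test_ds = tf.keras.utils.image_dataset_from_directory(
        os.path.join(root, "Test_set"),
        image_size=image_size,
        batch_size=batch_size,
        shuffle=False)           # keep the test set in a stable order
    return train_ds, test_ds
```

With the question's paths, the call would be train_ds, test_ds = load_train_test('../Dataset'). Both datasets can then be passed straight to model.fit and model.evaluate, so there is no need to unpack them into four arrays.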

CodePudding user response:

You can try using tf.keras.utils.image_dataset_from_directory.

Create dummy data:

import os
import numpy
from PIL import Image

# Create one sub-folder per class; exist_ok lets the script be re-run safely.
for i in range(10):
  os.makedirs('Training_set/class{}'.format(i), exist_ok=True)

# Save two random 100x100 RGB images into each class folder.
for i in range(10):
  for j in range(2):
    imarray = numpy.random.rand(100, 100, 3) * 255
    im = Image.fromarray(imarray.astype('uint8')).convert('RGB')
    im.save('Training_set/class{}/result_image{}.png'.format(i, j))

Folder structure:

- Training_set/
    - class9/
        - result_image1.png
        - result_image0.png
    - class8/
        - result_image1.png
        - result_image0.png
    - class7/
        - result_image1.png
        - result_image0.png
    - class0/
        - result_image1.png
        - result_image0.png
    - class2/
        - result_image1.png
        - result_image0.png
    - class5/
        - result_image1.png
        - result_image0.png
    - class4/
        - result_image1.png
        - result_image0.png
    - class3/
        - result_image1.png
        - result_image0.png
    - class1/
        - result_image1.png
        - result_image0.png
    - class6/
        - result_image1.png
        - result_image0.png
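Before loading, it can help to check that a folder really matches this layout. The sketch below uses only the standard library; the helper name count_images_per_class is made up for illustration. It walks the class folders in sorted order, which is also how image_dataset_from_directory assigns integer labels when class_names is not given (class0 becomes label 0, class1 becomes label 1, and so on).

```python
from pathlib import Path

# File extensions that image_dataset_from_directory accepts.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".gif"}


def count_images_per_class(root):
    """Return {class_folder_name: image_count}, ordered by folder name."""
    counts = {}
    class_dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    for class_dir in class_dirs:
        counts[class_dir.name] = sum(
            1 for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES)
    return counts
```

For the dummy data above, count_images_per_class('Training_set') should report two images for each of the ten class folders.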

Load data:

import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
  'Training_set',
  validation_split=0.2,
  subset="training",
  seed=123,
  image_size=(100, 100),
  batch_size=2)

for x, y in train_ds.take(1):
  print(x.shape, y.shape)

Output:

Found 20 files belonging to 10 classes.
Using 16 files for training.
(2, 100, 100, 3) (2,)
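If you really want plain arrays in the style of load_data, one option is to collect all batches into NumPy arrays. This is only a sketch (the helper name dataset_to_numpy is made up), and it assumes the whole dataset fits in memory.

```python
import numpy as np
import tensorflow as tf


def dataset_to_numpy(ds):
    """Collect every (images, labels) batch of ds into two NumPy arrays."""
    images, labels = [], []
    # Iterate only once so each image stays paired with its own label,
    # even when the dataset reshuffles on every pass.
    for x_batch, y_batch in ds:
        images.append(x_batch.numpy())
        labels.append(y_batch.numpy())
    return np.concatenate(images), np.concatenate(labels)
```

Usage would be x_train, y_train = dataset_to_numpy(train_ds), and the same for a test dataset loaded from Test_set. Note that the arrays will not be identical in form to the CIFAR-10 ones: here the images are float32 and the labels have shape (n,), while cifar10.load_data() returns uint8 images and labels of shape (n, 1).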

You can also choose whether the labels are sparse integers (label_mode='int', the default) or one-hot, i.e. categorical, vectors (label_mode='categorical'). See the docs for more information.
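To make the difference between the two label formats concrete, here is a small NumPy illustration (the helper to_one_hot is made up for this example and is not part of Keras). Sparse labels are plain class indices and are normally paired with the sparse_categorical_crossentropy loss; categorical labels are one-hot rows and go with categorical_crossentropy.

```python
import numpy as np


def to_one_hot(labels, num_classes):
    """Convert integer class indices into one-hot rows."""
    labels = np.asarray(labels, dtype=np.int64)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype=np.float32)[labels]


sparse_labels = np.array([0, 2, 1])            # shape (3,)
categorical_labels = to_one_hot(sparse_labels, num_classes=3)
print(categorical_labels.shape)                # (3, 3)
```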
