Is there an alternative to tf.keras.utils.image_dataset_from_directory if images are not organized i

Time:08-24

I want to train an image classification network.

I have all images in one folder, plus a .json file with the labels and a lot of metadata. I wrote a couple of functions that extract the images belonging to the classes I want to train on, shuffle them and randomly split them into a train list and a val list. So currently I have something like this:

list_imagepath_train = [r'C:\Users\someuser\Pictures\randomimagename1.jpg', r'C:\Users\someuser\Pictures\randomimagename2.jpg', r'C:\Users\someuser\Pictures\randomimagename5.jpg', r'C:\Users\someuser\Pictures\randomimagename8.jpg', r'C:\Users\someuser\Pictures\randomimagename9.jpg', r'C:\Users\someuser\Pictures\randomimagename10.jpg', r'C:\Users\someuser\Pictures\randomimagename12.jpg']

list_corresponding_classlabels_train = ['5', '5', '2', '3', '2', '2', '5']

list_imagepath_val = [r'C:\Users\someuser\Pictures\randomimagename3.jpg', r'C:\Users\someuser\Pictures\randomimagename4.jpg', r'C:\Users\someuser\Pictures\randomimagename6.jpg', r'C:\Users\someuser\Pictures\randomimagename7.jpg']

list_corresponding_classlabels_val = ['2', '3', '5', '2']

Now I want to convert those lists into a train and a val dataset for TensorFlow. The problem is that I can't use tf.keras.utils.image_dataset_from_directory, because all images, regardless of their label, are in the same folder, and it seems pointless to move them around every time I start a new training. tf.keras.preprocessing.image.ImageDataGenerator is deprecated (https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator), so I am not sure which function to use to convert the lists into the dataset I need. This is all very new to me, so any input or hint is very welcome!

Answer:

You could use the ImageDataGenerator class and its flow_from_dataframe() method. First create a pandas DataFrame with two columns, one holding the file paths (passed as x_col) and one holding the labels (passed as y_col). Then pass it to flow_from_dataframe() with the remaining parameters, and in this case set directory to None, because the paths in x_col are already complete.
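A minimal sketch of this approach, assuming pandas is installed and the paths are absolute, as in the question. The column names "filename" and "class" and the helper names make_dataframe and make_generator are illustrative choices, not something Keras requires. Since ImageDataGenerator is deprecated and may be missing from newer Keras releases, its import is deferred into the function that uses it.

```python
import pandas as pd


def make_dataframe(paths, labels):
    """Build the two-column DataFrame that flow_from_dataframe reads."""
    # With class_mode="sparse" or "categorical", flow_from_dataframe expects string labels
    return pd.DataFrame({"filename": list(paths), "class": [str(label) for label in labels]})


def make_generator(df, img_size=(224, 224), batch_size=32, shuffle=True):
    """Wrap df in a Keras DataFrameIterator that loads and resizes images batch by batch."""
    # Deprecated API, imported here so make_dataframe works without it
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(rescale=1.0 / 255)  # scale pixel values to [0, 1]
    return datagen.flow_from_dataframe(
        df,
        directory=None,       # x_col already contains complete paths
        x_col="filename",
        y_col="class",
        target_size=img_size,
        class_mode="sparse",  # integer labels; "categorical" gives one-hot vectors
        batch_size=batch_size,
        shuffle=shuffle,
    )
```

For example, train_gen = make_generator(make_dataframe(list_imagepath_train, list_corresponding_classlabels_train)) can be passed to model.fit, and train_gen.class_indices shows which index each label string was assigned to. For the validation generator, shuffle=False is the usual choice.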

Alternatively, if you want to use only image_dataset_from_directory(), you could use the os library to create a directory with one subdirectory per label and then move each image into the subdirectory of its label.
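A sketch of that reorganization using only os and shutil from the standard library. The helper name organize_by_label and its copy parameter are hypothetical. It copies by default, so the original folder stays intact and the layout can be rebuilt for each new training run; pass copy=False to move the files instead, as described above.

```python
import os
import shutil


def organize_by_label(paths, labels, out_dir, copy=True):
    """Place each image in out_dir/<label>/ so a directory-based loader can infer its class."""
    for path, label in zip(paths, labels):
        class_dir = os.path.join(out_dir, str(label))
        os.makedirs(class_dir, exist_ok=True)  # one subfolder per class, created on demand
        target = os.path.join(class_dir, os.path.basename(path))
        if copy:
            shutil.copy2(path, target)  # keeps the original file where it is
        else:
            shutil.move(path, target)   # relocates the original file
```

Afterwards, a call such as tf.keras.utils.image_dataset_from_directory(r'C:\data\train', image_size=(224, 224), batch_size=32), where C:\data\train is a hypothetical output folder, can read the result. It infers one class per subfolder and, by default, assigns class indices in alphanumeric order of the folder names.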

You could also use tf.data.Dataset.from_tensor_slices((path_list, label_list)) and map it with a function that loads each file with tf.io.read_file(), decodes the image, resizes (and, if needed, reshapes) it, and returns both the image and its label. After mapping, shuffle and batch the dataset.
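A minimal sketch of this tf.data pipeline. IMG_SIZE and the helper names load_image and make_tf_dataset are assumptions for illustration, and the labels are assumed to already be integer class indices rather than strings. One deliberate difference: this sketch shuffles the paths before the map, which keeps the shuffle buffer cheap because it holds short strings instead of decoded images; shuffling after the map also works.

```python
import tensorflow as tf

IMG_SIZE = (224, 224)  # (height, width); example value, adjust to your model


def load_image(path, label):
    """Read one image file, decode it to RGB and resize it to IMG_SIZE."""
    data = tf.io.read_file(path)
    # decode_image handles JPEG, PNG, BMP and GIF; expand_animations=False keeps the result 3-D
    img = tf.io.decode_image(data, channels=3, expand_animations=False)
    img = tf.image.resize(img, IMG_SIZE)  # float32, values still in the 0..255 range
    return img, label


def make_tf_dataset(paths, labels, batch_size=32, shuffle=True):
    """Turn parallel lists of file paths and integer labels into a batched tf.data.Dataset."""
    ds = tf.data.Dataset.from_tensor_slices((paths, labels))
    if shuffle:
        # Shuffling the short path strings is cheaper than shuffling decoded images
        ds = ds.shuffle(buffer_size=len(paths), reshuffle_each_iteration=True)
    ds = ds.map(load_image, num_parallel_calls=tf.data.AUTOTUNE)
    ds = ds.batch(batch_size)
    return ds.prefetch(tf.data.AUTOTUNE)  # overlap loading with training
```

Usage would be along the lines of train_ds = make_tf_dataset(list_imagepath_train, train_label_ids) and val_ds = make_tf_dataset(list_imagepath_val, val_label_ids, shuffle=False), where train_label_ids and val_label_ids are hypothetical integer versions of the label lists, followed by model.fit(train_ds, validation_data=val_ds) with whatever number of epochs you need. Only one batch of images is decoded at a time, so this scales to datasets that do not fit in memory.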

Answer:

If you don't want to use ImageDataGenerator, you can load the images into NumPy arrays yourself:

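import numpy as np
from tensorflow.keras.preprocessing.image import load_img, img_to_array


def make_dataset(x, y):
    """Load all images listed in x into memory and pair them with the labels in y."""
    imgs = []
    labels = []
    img_size = (224, 224)  # (height, width); example value, set to whatever you need

    for path, label in zip(x, y):
        img = load_img(path, target_size=img_size)  # open the file and resize it
        img = img_to_array(img)                     # PIL image -> float32 array of shape (H, W, 3)
        imgs.append(img)
        labels.append(int(label))                   # Keras needs numeric labels, not strings like '5'
    imgs, labels = np.array(imgs), np.array(labels)
    return imgs, labels


x_train, y_train = make_dataset(list_imagepath_train, list_corresponding_classlabels_train)
x_val, y_val = make_dataset(list_imagepath_val, list_corresponding_classlabels_val)
model.fit(x_train, y_train, ...)  # fill in epochs, batch_size, validation_data=(x_val, y_val), etc.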
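Keras expects numeric class indices, and with sparse_categorical_crossentropy they must lie in the range 0 to num_classes - 1. Converting the labels '2', '3' and '5' directly with int() would therefore require six output units. A small hypothetical helper (standard library only) that maps the labels to contiguous indices, assuming every label is a numeric string:

```python
def encode_labels(train_labels, val_labels):
    """Map string class labels such as '2', '3', '5' to contiguous indices 0..n-1.

    Returns (train_ids, val_ids, class_names); class_names[i] is the original label of index i.
    Assumes every label is a numeric string, so int() gives a sensible sort order.
    """
    class_names = sorted(set(train_labels) | set(val_labels), key=int)
    index = {name: i for i, name in enumerate(class_names)}
    train_ids = [index[label] for label in train_labels]
    val_ids = [index[label] for label in val_labels]
    return train_ids, val_ids, class_names
```

With the lists from the question, train_ids, val_ids, class_names = encode_labels(list_corresponding_classlabels_train, list_corresponding_classlabels_val) gives three classes. The index lists can replace the string label lists, and class_names maps a predicted index back to the original label.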