What is 3 in numpy.resize(image,(IMG_HEIGHT,IMG_WIDTH,3))?

Time:09-18

While building a letter classifier, I came across this code, which uses PIL to create the image data and the labels from the images in a folder:

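import os

import numpy as np
from PIL import Image

# The images in the dataset are 32 x 32 pixels.
IMG_HEIGHT = 32
IMG_WIDTH = 32

def create_dataset_PIL(img_folder):
    """Load the images in img_folder/<class>/ and return (images, labels)."""
    img_data_array = []
    class_name = []
    for dir1 in os.listdir(img_folder):
        print(dir1)
        for file in os.listdir(os.path.join(img_folder, dir1)):
            image_path = os.path.join(img_folder, dir1, file)
            image = np.array(Image.open(image_path))
            image = np.resize(image, (IMG_HEIGHT, IMG_WIDTH, 3))
            image = image.astype('float32')
            image /= 255  # scale pixel values to the range [0, 1]
            img_data_array.append(image)
            class_name.append(dir1)  # the sub-folder name is the label
    return img_data_array, class_name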

Each image in the dataset is already 32 x 32 pixels, and I am resizing it to an array of shape 32 x 32 x 3. But I don't understand what this third dimension is when all I need is 32 x 32 pixels.

I stumbled upon "Numpy Resize/Rescale Image", where I learned this may be an interpolation parameter. I also learned from YouTube that interpolation is required when resizing images. But I don't know what to do with this extra data. Should the input layer of my neural network now be 32 x 32 x 3 instead of just 32 x 32?

CodePudding user response:

The 3 represents the RGB (red, green, blue) values. Each pixel of the image is represented by three values instead of one. In a black-and-white image, each pixel is represented by [pixel]; in an RGB image, each pixel is represented by [pixel(R), pixel(G), pixel(B)].

In fact, each pixel of the image has three RGB values. For 8-bit images these range between 0 and 255 and represent the intensity of red, green, and blue. A lower value stands for lower intensity and a higher value for higher intensity. For instance, one pixel can be represented as a list of these three values: [78, 136, 60]. Black would be represented as [0, 0, 0].
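As a small illustration of the point above, the following sketch builds a tiny RGB array in NumPy and reads one pixel back. The array contents are made up for the example (only [78, 136, 60] comes from the answer), and 8-bit channel values are assumed.

```python
import numpy as np

# A hypothetical 2 x 2 RGB image with 8-bit channels (values made up for illustration).
image = np.array(
    [
        [[78, 136, 60], [0, 0, 0]],      # a greenish pixel, then black
        [[255, 255, 255], [255, 0, 0]],  # white, then pure red
    ],
    dtype=np.uint8,
)

# The last axis holds the three color channels.
height, width, channels = image.shape

# Indexing by (row, column) returns the three channel values of one pixel.
pixel = image[0, 0]
red, green, blue = (int(v) for v in pixel)

# Dividing by 255, as the code in the question does, maps 0 to 0.0
# (no intensity) and 255 to 1.0 (full intensity).
normalized = image.astype("float32") / 255
```

Here image.shape is (2, 2, 3): two rows, two columns, and three channel values per pixel, which is the same layout as the 32 x 32 x 3 arrays in the question.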

And yes: your input layer should match this 32 x 32 x 3 shape.
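One caveat that may be worth checking, offered as a hedged aside: np.resize does not interpolate and does not add color channels. According to the NumPy documentation, it fills the requested shape with repeated copies of the flattened input. The sketch below uses a made-up 2 x 2 grayscale array to show what that means; np.stack is shown only as one possible way to get three identical channels, not necessarily what this pipeline needs.

```python
import numpy as np

# A hypothetical 2 x 2 grayscale image (values made up for illustration).
gray = np.array([[1, 2], [3, 4]], dtype=np.uint8)

# np.resize repeats the flattened data (1, 2, 3, 4, 1, 2, ...) to fill the new shape,
# so neighbouring pixel values end up spread across the channel axis.
resized = np.resize(gray, (2, 2, 3))

# Stacking the same array three times along a new last axis gives each pixel
# three identical channel values instead.
stacked = np.stack([gray, gray, gray], axis=-1)

top_left_resized = resized[0, 0].tolist()
top_left_stacked = stacked[0, 0].tolist()
```

Both results have shape (2, 2, 3), but only the stacked version keeps each pixel's value in all three channels. If the image files are already RGB with shape (32, 32, 3), the np.resize call in the question leaves the data unchanged.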

CodePudding user response:

The third dimension of a digital image array holds the color information for the pixel at coordinate (x, y) in the image; each entry along this dimension is called a color channel.

Most common channel types:

  • RGB mode: if the value is 3, for example image_shape: [32,32,3]
  • Grayscale mode: if the value is 1, for example image_shape: [32,32,1]

If your ML model doesn't need color as a feature, you can use scikit-image to convert the images to grayscale with rgb2gray.
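For readers without scikit-image installed, here is a minimal NumPy-only sketch of such a conversion. The function name to_grayscale is hypothetical, and the luminance weights are an assumption: they are believed to match the ones documented for rgb2gray, so check the scikit-image documentation before relying on them.

```python
import numpy as np

# Assumed luminance weights for (R, G, B); believed to match skimage.color.rgb2gray.
WEIGHTS = np.array([0.2125, 0.7154, 0.0721], dtype=np.float32)


def to_grayscale(rgb):
    """Collapse an (H, W, 3) RGB array into an (H, W) grayscale array."""
    rgb = np.asarray(rgb, dtype=np.float32)
    # Weighted sum over the last (channel) axis.
    return rgb @ WEIGHTS


# A random stand-in for one normalized 32 x 32 RGB image from the dataset.
rgb_image = np.random.default_rng(0).random((32, 32, 3), dtype=np.float32)

gray_image = to_grayscale(rgb_image)             # shape (32, 32)
gray_with_channel = gray_image[..., np.newaxis]  # shape (32, 32, 1)
```

With grayscale input, the network's input layer would take 32 x 32 x 1 arrays (or plain 32 x 32 arrays, if the framework accepts 2-D inputs) rather than 32 x 32 x 3.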

You can learn more about image usage in NumPy here.
