I was trying to create a neural network to distinguish forest from other land in satellite images.
I started analysing the images but I'm not sure how to normalize the pixel values.
I thought of dividing each pixel value by 255, but in an example made by bnsreenu I found this part:
import os
import cv2
import numpy as np
from PIL import Image
from patchify import patchify
from sklearn.preprocessing import MinMaxScaler, StandardScaler

scaler = MinMaxScaler()

root_directory = 'Semantic segmentation dataset/'
patch_size = 256

#Read images from respective 'images' subdirectory
#As all images are of different size we have 2 options, either resize or crop
#But, some images are too large and some small. Resizing will change the size of real objects.
#Therefore, we will crop them to a nearest size divisible by 256 and then
#divide all images into patches of 256x256x3.
image_dataset = []
for path, subdirs, files in os.walk(root_directory):
    #print(path)
    dirname = path.split(os.path.sep)[-1]
    if dirname == 'images':   #Find all 'images' directories
        images = os.listdir(path)  #List of all image names in this subdirectory
        for i, image_name in enumerate(images):
            if image_name.endswith(".jpg"):   #Only read jpg images...
                image = cv2.imread(path + "/" + image_name, 1)  #Read each image as BGR
                SIZE_X = (image.shape[1]//patch_size)*patch_size  #Nearest size divisible by our patch size
                SIZE_Y = (image.shape[0]//patch_size)*patch_size  #Nearest size divisible by our patch size
                image = Image.fromarray(image)
                image = image.crop((0, 0, SIZE_X, SIZE_Y))  #Crop from top left corner
                #image = image.resize((SIZE_X, SIZE_Y))  #Try not to resize for semantic segmentation
                image = np.array(image)

                #Extract patches from each image
                print("Now patchifying image:", path + "/" + image_name)
                patches_img = patchify(image, (patch_size, patch_size, 3), step=patch_size)  #Step=256 for 256 patches means no overlap

                for i in range(patches_img.shape[0]):
                    for j in range(patches_img.shape[1]):
                        single_patch_img = patches_img[i, j, :, :]

                        #Use MinMaxScaler instead of just dividing by 255.
                        single_patch_img = scaler.fit_transform(single_patch_img.reshape(-1, single_patch_img.shape[-1])).reshape(single_patch_img.shape)
                        #single_patch_img = (single_patch_img.astype('float32')) / 255.

                        single_patch_img = single_patch_img[0]  #Drop the extra unnecessary dimension that patchify adds.
                        image_dataset.append(single_patch_img)
In this example he uses a MinMaxScaler, which gives different values compared to dividing by 255. Which method is better or more suited to this situation? I'll leave the link below:
CodePudding user response:
MinMaxScaler may indeed produce different values than simple division by 255 (in case there are no pixels with intensities 0 or 255 in the data being scaled). As the official scikit-learn documentation says, it performs the following transformation:
X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
X_scaled = X_std * (max - min) + min
where max and min are the bounds of the desired feature range (1 and 0 by default).
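To see the difference concretely, here is a minimal sketch with a made-up 2x2 patch (not from the dataset) whose intensities do not span the full 0-255 range. Because fit_transform is called on each patch separately, MinMaxScaler stretches every patch to the full [0, 1] range, while division by 255 preserves the original relative brightness across patches:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical 2x2 RGB patch with intensities well inside the 0-255 range
patch = np.array([[[ 50, 100, 150],
                   [ 60, 110, 160]],
                  [[ 70, 120, 170],
                   [ 80, 130, 180]]], dtype=np.float32)

# Per-patch min-max scaling: each channel is stretched to exactly [0, 1]
scaler = MinMaxScaler()
scaled = scaler.fit_transform(patch.reshape(-1, 3)).reshape(patch.shape)
print(scaled[..., 0])   # channel 0 now runs from 0.0 to 1.0

# Global division by 255: values keep their position within the full range
divided = patch / 255.0
print(divided[..., 0])  # channel 0 stays between ~0.20 and ~0.31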
Therefore, normalizing data is a rather data- (and probably model-) specific operation. Division by 255 is the most common way to do it, and in many cases it is enough. Since you are using a neural network, you can check the answers to this question to learn more about why you should normalize/center your data.
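If you decide to go with the simpler division by 255, the scaler call in the quoted loop can be replaced by the commented-out line, or the whole dataset can be scaled once at the end. A minimal sketch, assuming image_dataset was filled with uint8 patches of shape (256, 256, 3):

import numpy as np

# Convert once to float32 and scale the whole array to [0, 1]
image_dataset = np.array(image_dataset, dtype=np.float32) / 255.0

# Equivalently, inside the patch loop:
# single_patch_img = single_patch_img.astype('float32') / 255.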