I am toying around with using Python to apply various image kernels to images. I am using sklearn.feature_extraction to create the patches; however, when I do so, it appears some of the data is missing, which will cause problems when I go back to reconstruct the image. Am I doing something wrong, or do I have to add a buffer around the image to handle the border cases when grabbing patches?
from PIL import Image
from sklearn.feature_extraction import image
import numpy as np
img = Image.open('a.png')
arr = np.array(img)
patches = image.PatchExtractor(patch_size=(3, 3)).fit(arr).transform(arr)
>>> arr.shape
(1080, 1080, 3)
>>> patches.shape
(1164240, 3, 3)
>>> 1164240 / 1080
1078.0
Answer:
There are two things to understand here:
First, image.PatchExtractor extracts all possible patches with a stride of 1 in each dimension. For example, with patches of shape (3, 3) you will get arr[0:3, 0:3], then arr[1:4, 1:4], and so on. Hence, in general, for a patch size of (x, y) and an image of size (w, h) you will get (w-x+1)*(h-y+1) patches. The -x+1 and -y+1 are due to the patch hitting the image boundaries (there is no padding).
Second, PatchExtractor.transform() expects the first dimension to be n_samples, so in your case the input shape should be (1, 1080, 1080, 3). Without that leading axis, your (1080, 1080, 3) array is interpreted as 1080 separate images of shape (1080, 3), each of which yields (1080-3+1)*(3-3+1) = 1078 patches; that is exactly the 1080*1078 = 1164240 patches you observed.
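A quick sanity check of those numbers (this only redoes the arithmetic from above, no new API involved):
# 1080 samples, each treated as an "image" of shape (1080, 3):
# patches per sample = (1080-3+1) * (3-3+1) = 1078
print(1080 * (1080 - 3 + 1) * (3 - 3 + 1))
# 1164240  -> matches patches.shape[0] from the question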
Putting all this together, here is an example with a fake smaller image with one channel:
from sklearn.feature_extraction import image
import numpy as np
# Adding the n_samples dimension with reshape.
arr = np.arange(0, 6*6*1).reshape((1, 6, 6))
print(arr)
array([[[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35]]])
# Get all possible patches.
patches = image.PatchExtractor(patch_size=(3, 3)).fit(arr).transform(arr)
print(np.shape(patches))
print(patches[0, :])
print(patches[1, :])
shape:
# (6-3+1) * (6-3+1) = 16
(16, 3, 3)
patches[0, :]:
array([[ 0., 1., 2.],
[ 6., 7., 8.],
[12., 13., 14.]])
patches[1, :]:
array([[ 1., 2., 3.],
[ 7., 8., 9.],
[13., 14., 15.]])
As you can see, the result matches the explanation above. Patch 2 is displaced by one pixel to the right with respect to patch 1.
Hence, in your case with an image of shape (1080, 1080, 3):
# You also need this reshape to add the n_samples dimension.
arr = np.arange(0, 1080*1080*3).reshape((1, 1080, 1080, 3))
patches = image.PatchExtractor(patch_size=(3, 3)).fit(arr).transform(arr)
print(np.shape(patches))
# (1080-3+1) * (1080-3+1) = 1162084
(1162084, 3, 3, 3)
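As a side note on the reconstruction concern from the question: for a single image, scikit-learn also provides image.extract_patches_2d and image.reconstruct_from_patches_2d, where the latter averages the overlapping patches back into the original image. A minimal sketch with the 6x6 example from above:
from sklearn.feature_extraction import image
import numpy as np

arr = np.arange(0, 6 * 6, dtype=float).reshape((6, 6))
# extract_patches_2d works on a single image (no n_samples axis needed).
patches = image.extract_patches_2d(arr, patch_size=(3, 3))
# Average the overlapping patches back into a (6, 6) image.
reconstructed = image.reconstruct_from_patches_2d(patches, (6, 6))
print(np.allclose(arr, reconstructed))
# True
Because every pixel is restored as the average of all patches covering it, the reconstruction is exact as long as the patches themselves are unmodified.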
EDIT - patches with padding:
If you want one patch per pixel (so the patch count matches the pixel count), you can pad the image using np.pad(). Note that by default it pads all axes, so we have to specify the pad amounts per axis to leave the n_samples and channel axes untouched:
# Pad amounts per axis, given as (pad_before, pad_after) for each dimension.
# The total padding per spatial axis should be patch_size - 1 (here 1 + 1 = 2).
paddings = ((0, 0), (1, 1), (1, 1), (0, 0))
wrapped_arr = np.pad(arr, pad_width=paddings, mode='wrap')
wrapped_patches = image.PatchExtractor(patch_size=(3, 3)).fit(wrapped_arr).transform(wrapped_arr)
print(np.shape(wrapped_patches))
# 1080*1080 = 1166400
(1166400, 3, 3, 3)
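Finally, since the original goal was applying image kernels: with the wrapped patches there is exactly one patch centered on each original pixel, so applying a 3x3 kernel reduces every patch to one output pixel and the result has the same size as the input. One possible single-channel sketch (the Laplacian-style kernel is just an example, and note this computes a cross-correlation; flip the kernel for a true convolution):
from sklearn.feature_extraction import image
import numpy as np

arr = np.random.rand(1, 6, 6)  # one single-channel 6x6 image
# Wrap-pad only the two spatial axes.
padded = np.pad(arr, pad_width=((0, 0), (1, 1), (1, 1)), mode='wrap')
patches = image.PatchExtractor(patch_size=(3, 3)).fit(padded).transform(padded)
# patches.shape == (36, 3, 3): one patch per original pixel.

kernel = np.array([[ 0., -1.,  0.],
                   [-1.,  4., -1.],
                   [ 0., -1.,  0.]])
# Weighted sum of each patch with the kernel, reshaped back to the image grid.
out = np.einsum('nij,ij->n', patches, kernel).reshape(6, 6)
print(out.shape)
# (6, 6)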