I have an image that I read in with tifffile.imread
and it is turned into a 3D array, with the first dimension representing the Y coordinate, the second the X coordinate and the third the channel of the image (these images are not RGB, so there can be an arbitrary number of channels).
Each of these images has a label mask which is a 2D array that indicates the position of objects in the image. In the label mask, pixels that have a value of 0 do not belong to any object, pixels that have a value of 1 belong to the first object, pixels that have a value of 2 belong to the second object and so on.
For each object and for each channel of the image, I would like to know the mean, median, std, min and max of that channel. So, for example, I would like to know the mean, median, std, min and max values of the first channel for pixels in object 10.
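For concreteness, here is a tiny synthetic example of the shapes involved (made-up values, not my real data):

import numpy as np

# 4x5 image with 3 channels: axis 0 = Y, axis 1 = X, axis 2 = channel
sample = np.random.rand(4, 5, 3)

# Matching label mask: 0 = background, 1 = first object, 2 = second object
label_mask = np.zeros((4, 5), dtype=int)
label_mask[0:2, 0:2] = 1
label_mask[2:4, 3:5] = 2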
I have written code to do this (shown below) but it is very slow, and I wondered if people had a better way or knew of a package that might be helpful in making this faster / doing this more efficiently. (Here the word 'stain' means the same as channel.)
from tifffile import imread
import numpy as np
import pandas as pd

sample = imread(input_img)
label_mask = np.load(input_mask)
n_stains = sample.shape[2]
n_labels = np.max(label_mask)

# Create empty dataframe to store intensity measurements
intensity_measurements = pd.DataFrame(columns=['sample', 'label', 'stain', 'mean', 'median', 'std', 'min', 'max'])

for label in range(1, n_labels + 1):
    for stain in range(n_stains):
        # Extract the pixels of this stain belonging to this label
        stain_label = sample[:, :, stain][label_mask == label]
        # Calculate intensity measurements
        mean = np.mean(stain_label)
        median = np.median(stain_label)
        std = np.std(stain_label)
        min = np.min(stain_label)
        max = np.max(stain_label)
        # Add intensity measurements to dataframe
        intensity_measurements = intensity_measurements.append({'sample': args.input_img, 'label': label, 'stain': stain, 'mean': mean, 'median': median, 'std': std, 'min': min, 'max': max}, ignore_index=True)
CodePudding user response:
The proposed method below produces 2-D numpy arrays with relevant intensity measurements with shape=(n_stains, n_labels).
If it's crucial to further pack the data into a DataFrame as in the original code, then please let me know :) (one possible way to do that is sketched after the code below).
sample = imread(input_img)
label_mask = np.load(input_mask)
label_mask_unraveled = np.equal.outer(label_mask, np.arange(1, np.max(label_mask) + 1))
sample_label_masks_applied = np.einsum("ijk,ijl->klij", sample, label_mask_unraveled)
intensity_mean = np.mean(sample_label_masks_applied, axis=(2, 3))
intensity_median = np.median(sample_label_masks_applied, axis=(2, 3))
intensity_std = np.std(sample_label_masks_applied, axis=(2, 3))
intensity_min = np.min(sample_label_masks_applied, axis=(2, 3))
intensity_max = np.max(sample_label_masks_applied, axis=(2, 3))
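For completeness, here is a minimal sketch of that packing step, assuming the five arrays computed above and a variable input_img holding the file path (both assumptions, taken from the question's code):

import numpy as np
import pandas as pd

n_stains, n_labels = intensity_mean.shape
# One row per (stain, label) pair, matching the column layout of the original DataFrame
stain_idx, label_idx = np.meshgrid(np.arange(n_stains), np.arange(1, n_labels + 1), indexing="ij")
intensity_measurements = pd.DataFrame({
    'sample': input_img,
    'label': label_idx.ravel(),
    'stain': stain_idx.ravel(),
    'mean': intensity_mean.ravel(),
    'median': intensity_median.ravel(),
    'std': intensity_std.ravel(),
    'min': intensity_min.ravel(),
    'max': intensity_max.ravel(),
})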
CodePudding user response:
Your code is slow because you iterate over the whole image for each of the labels. This is an O(n·k) operation, for n pixels and k labels. You could instead iterate over the image, and for each pixel examine the label, then update the measurements for that label with the pixel values. This is an O(n) operation. You'd keep an accumulator for each label and each measurement (standard deviation requires accumulating the sum of squares as well as the sum, but you're already accumulating the sum for the mean). The only measure that you cannot compute this way is the median, as it requires a partial sort of the full list of values.
This would obviously be a much cheaper operation, except for the fact that Python is a slow, interpreted language, and looping over each pixel in Python leads to a very slow program. In a compiled language you would implement it this way though.
See this answer for a way to implement this efficiently using NumPy functionality.
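As one illustration of that idea (not the linked answer), here is a rough sketch that does the per-label accumulation with NumPy's grouped reductions instead of an explicit Python loop over pixels; it assumes sample has shape (Y, X, C) and label_mask contains labels 1..K:

import numpy as np

def per_label_stats(sample, label_mask):
    # Assumes sample has shape (Y, X, C) and label_mask has shape (Y, X) with labels 1..K.
    # Returns mean, std, min, max, each with shape (K, C). The median is left out,
    # since it cannot be computed from running sums.
    n_labels = int(label_mask.max())
    n_chan = sample.shape[2]
    labels = label_mask.ravel()
    keep = labels > 0                                   # drop background pixels
    labels = labels[keep]
    pixels = sample.reshape(-1, n_chan)[keep].astype(np.float64)

    counts = np.bincount(labels, minlength=n_labels + 1)[1:]
    sums = np.stack([np.bincount(labels, weights=pixels[:, c], minlength=n_labels + 1)[1:]
                     for c in range(n_chan)], axis=1)
    sq_sums = np.stack([np.bincount(labels, weights=pixels[:, c] ** 2, minlength=n_labels + 1)[1:]
                        for c in range(n_chan)], axis=1)

    mean = sums / counts[:, None]
    std = np.sqrt(np.maximum(sq_sums / counts[:, None] - mean ** 2, 0))

    mins = np.full((n_labels + 1, n_chan), np.inf)
    maxs = np.full((n_labels + 1, n_chan), -np.inf)
    np.minimum.at(mins, labels, pixels)                 # per-label running minimum
    np.maximum.at(maxs, labels, pixels)                 # per-label running maximum
    return mean, std, mins[1:], maxs[1:]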
Using the DIPlib library (disclosure: I'm an author) you can apply the operation as follows (the median is not implemented). Other image processing libraries have similar functionality, though they might not be as flexible with the number of channels.
import diplib as dip
import imageio
import numpy as np
# sample = imread(input_img)
# label_mask = np.load(input_mask)
# Alternative random data so that I can run the code for testing:
sample = imageio.imread("../images/trui_c.tif")
label_mask = np.random.randint(0, 20, sample.shape[:2], dtype=np.uint32)
sample = dip.Image(sample, tensor_axis=2)
msr = dip.MeasurementTool.Measure(label_mask, sample, features=["Mean", "StandardDeviation", "MinVal", "MaxVal"])
print(msr)
This prints out:
   |           Mean            |    StandardDeviation      |          MinVal           |          MaxVal           |
-- | ------------------------- | ------------------------- | ------------------------- | ------------------------- |
   | chan0   | chan1   | chan2 | chan0   | chan1   | chan2 | chan0   | chan1   | chan2 | chan0   | chan1   | chan2 |
-- | ------- | ------- | ----- | ------- | ------- | ----- | ------- | ------- | ----- | ------- | ------- | ----- |
1 | 82.26 | 41.30 | 24.77 | 57.77 | 52.16 | 48.22 | 5.000 | 3.000 | 1.000 | 255.0 | 255.0 | 255.0 |
2 | 82.02 | 41.18 | 24.85 | 52.16 | 48.22 | 48.33 | 3.000 | 1.000 | 1.000 | 255.0 | 255.0 | 255.0 |
3 | 82.39 | 41.17 | 24.93 | 48.22 | 48.33 | 48.48 | 1.000 | 1.000 | 1.000 | 255.0 | 255.0 | 255.0 |
4 | 82.14 | 41.62 | 25.03 | 48.33 | 48.48 | 48.47 | 1.000 | 1.000 | 0.000 | 255.0 | 255.0 | 255.0 |
5 | 82.89 | 41.45 | 24.94 | 48.48 | 48.47 | 48.54 | 1.000 | 0.000 | 1.000 | 255.0 | 255.0 | 255.0 |
6 | 82.83 | 41.60 | 25.26 | 48.47 | 48.54 | 48.65 | 0.000 | 1.000 | 1.000 | 255.0 | 255.0 | 255.0 |
7 | 81.95 | 41.77 | 25.51 | 48.54 | 48.65 | 48.22 | 1.000 | 1.000 | 2.000 | 255.0 | 255.0 | 255.0 |
8 | 82.93 | 41.36 | 25.19 | 48.65 | 48.22 | 48.11 | 1.000 | 2.000 | 1.000 | 255.0 | 255.0 | 255.0 |
9 | 81.88 | 41.70 | 25.07 | 48.22 | 48.11 | 47.69 | 2.000 | 1.000 | 1.000 | 255.0 | 255.0 | 255.0 |
10 | 81.46 | 41.40 | 24.82 | 48.11 | 47.69 | 48.32 | 1.000 | 1.000 | 2.000 | 255.0 | 255.0 | 255.0 |
11 | 81.33 | 40.98 | 24.76 | 47.69 | 48.32 | 48.85 | 1.000 | 2.000 | 1.000 | 255.0 | 255.0 | 255.0 |
12 | 82.30 | 41.55 | 25.12 | 48.32 | 48.85 | 48.75 | 2.000 | 1.000 | 1.000 | 255.0 | 255.0 | 255.0 |
13 | 82.43 | 41.50 | 25.15 | 48.85 | 48.75 | 48.89 | 1.000 | 1.000 | 1.000 | 255.0 | 255.0 | 255.0 |
14 | 83.29 | 42.11 | 25.65 | 48.75 | 48.89 | 48.32 | 1.000 | 1.000 | 1.000 | 255.0 | 255.0 | 255.0 |
15 | 83.20 | 41.64 | 25.28 | 48.89 | 48.32 | 48.13 | 1.000 | 1.000 | 1.000 | 255.0 | 255.0 | 255.0 |
16 | 81.51 | 40.92 | 24.76 | 48.32 | 48.13 | 48.73 | 1.000 | 1.000 | 1.000 | 255.0 | 255.0 | 255.0 |
17 | 81.81 | 41.31 | 24.71 | 48.13 | 48.73 | 48.49 | 1.000 | 1.000 | 0.000 | 255.0 | 255.0 | 255.0 |
18 | 83.58 | 41.85 | 25.25 | 48.73 | 48.49 | 32.20 | 1.000 | 0.000 | 1.000 | 255.0 | 255.0 | 212.0 |
19 | 82.12 | 41.24 | 25.06 | 48.49 | 32.20 | 24.44 | 0.000 | 1.000 | 1.000 | 255.0 | 212.0 | 145.0 |
I don't have an efficient solution for the median. You'd have to split the image into a separate array for each label, then run the median over that. This would be about as efficient as the above, but would use much more memory.
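If it helps, here is a rough sketch of that grouping approach with plain NumPy, assuming the sample and label_mask arrays from the question; it gathers all in-object pixels into label-sorted order, so each label's values form one contiguous block:

import numpy as np

def per_label_median(sample, label_mask):
    # Returns an array of shape (n_labels, n_channels) with the per-label median of each channel.
    labels = label_mask.ravel()
    keep = labels > 0
    labels = labels[keep]
    pixels = sample.reshape(-1, sample.shape[2])[keep]

    order = np.argsort(labels, kind="stable")           # sort pixel values by label
    sorted_labels = labels[order]
    sorted_pixels = pixels[order]
    # starts[k-1]:starts[k] is the block of pixels belonging to label k
    starts = np.searchsorted(sorted_labels, np.arange(1, labels.max() + 2))

    return np.array([np.median(sorted_pixels[s:e], axis=0)
                     for s, e in zip(starts[:-1], starts[1:])])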