I am working with hundreds of large, high-resolution .tif images that are very memory-intensive to read in Python. Fortunately, I can often work with low-resolution versions of these images, obtained by downsampling them after loading. I am wondering if there is a way to read only part of the image into memory instead of the whole image, to improve read speed.
The code below shows an example of what I would like; however, it still reads the whole image into memory before returning the downsampled array. Is it possible to read only every nth pixel value into memory to improve read speed?
from tifffile import imread

def standardOpen(f):
    im = imread(f)
    return im

def scaledOpen(f):
    im = imread(f)[:, ::4, ::4]
    return im
f_path = '/file_name.tif'
im = standardOpen(f_path)
print(im.shape)
>>(88, 2048, 2048)
im_scaled = scaledOpen(f_path)
print(im_scaled.shape)
>>(88, 512, 512)
EDIT: I have uploaded a sample image to dropbox: https://www.dropbox.com/s/xkm0bzudcv2sw5d/S000_t000002_V000_R0000_X000_Y000_C02_I1_D0_P00101.tif?dl=0
This image is a stack of 101 slices of 2048x2048 pixels. When I read it using tifffile.imread(image_path) I get a numpy array of shape (101, 2048, 2048).
CodePudding user response:
The sample file, S000_t000002_V000_R0000_X000_Y000_C02_I1_D0_P00101.tif, is a multi-page TIFF. The image data in each page is stored uncompressed in one strip. To speed up reading sliced data from this specific kind of TIFF file, memory-map the frame data and copy the sliced data to a pre-allocated array while iterating over the pages in the file. Unless one wants to preserve the noise characteristics, it is usually better to downsample using higher-order filtering, e.g. interpolation with OpenCV:
import numpy
import tifffile
import cv2  # OpenCV for fast interpolation

filename = 'S000_t000002_V000_R0000_X000_Y000_C02_I1_D0_P00101.tif'

# baseline: read the whole stack, then slice
with tifffile.Timer():
    stack = tifffile.imread(filename)[:, ::4, ::4].copy()

# memory-map each page and copy only the sliced data
with tifffile.Timer():
    with tifffile.TiffFile(filename) as tif:
        page = tif.pages[0]
        shape = len(tif.pages), page.imagelength // 4, page.imagewidth // 4
        stack = numpy.empty(shape, page.dtype)
        for i, page in enumerate(tif.pages):
            stack[i] = page.asarray(out='memmap')[::4, ::4]
            # # better use interpolation instead:
            # stack[i] = cv2.resize(
            #     page.asarray(),
            #     dsize=(shape[2], shape[1]),
            #     interpolation=cv2.INTER_LINEAR,
            # )
I would avoid this kind of micro-optimization for little speed gain. The image data in the sample file is only ~800 MB and easily fits into RAM on most computers.
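For reference, the ~800 MB figure follows directly from the stack shape, assuming 16-bit pixels (an assumption; the sample's dtype is not stated in this thread):
# pages * height * width * bytes per pixel, assuming uint16
nbytes = 101 * 2048 * 2048 * 2
print(f'{nbytes / 1024**2:.0f} MiB')  # prints 808 MiB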
CodePudding user response:
I did some experiments with pyvips to simulate your workflow.
To get started, I created a 6.7GB TIFF with dimensions 60,000 x 40,000 pixels. Then I loaded it with pyvips, shrank it to fit within a 1,000 x 1,000 rectangle, and saved the result:
#!/usr/bin/env python3
import pyvips
# Resize to no more than 1000x1000 pixels
out = pyvips.Image.thumbnail('big.tif', 1000)
# Save with LZW compression
out.tiffsave('result.tif', tile=True, compression='lzw')
That took 3 seconds and used 440 MB of RAM, including the Python interpreter. You can then turn the result into a regular Numpy array. The conversion itself is just one line of code; it only needs a small mapping from vips formats to Numpy dtypes:
import numpy as np

# map vips formats to np dtypes
format_to_dtype = {
    'uchar': np.uint8,
    'char': np.int8,
    'ushort': np.uint16,
    'short': np.int16,
    'uint': np.uint32,
    'int': np.int32,
    'float': np.float32,
    'double': np.float64,
    'complex': np.complex64,
    'dpcomplex': np.complex128,
}

# vips image to numpy array
def vips2numpy(vi):
    return np.ndarray(buffer=vi.write_to_memory(),
                      dtype=format_to_dtype[vi.format],
                      shape=[vi.height, vi.width, vi.bands])

# Do actual conversion from vips image to Numpy array
na = vips2numpy(out)
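A quick sanity check of the result (illustrative; the exact shape depends on the source image's aspect ratio and band count):
print(na.shape, na.dtype)
# e.g. roughly (667, 1000, 3) and uint8 for an 8-bit RGB 60,000 x 40,000 source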
You can do the same in the terminal with vipsthumbnail, by the way:
vipsthumbnail big.tif -o result.tif --size=1000 --vips-leak
memory: high-water mark 372.78 MB
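If you specifically want the point-sampled, every-nth-pixel behaviour from the question rather than a filtered shrink, libvips exposes that too through its subsample operation. A minimal sketch, assuming a single-page image and the same factor of 4 (I have not benchmarked this against the tifffile approach):
#!/usr/bin/env python3
import pyvips

# 'sequential' lets libvips stream the file top to bottom
# instead of decoding everything up front
im = pyvips.Image.new_from_file('big.tif', access='sequential')

# Keep every 4th pixel in x and y, the pyvips equivalent
# of numpy's [::4, ::4] point sampling
small = im.subsample(4, 4)
small.tiffsave('subsampled.tif', tile=True, compression='lzw')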