I am writing a software which does a lot of image operations / compositing on a lot of (potential big) images.
Multi threading helps a lot speed wise, but QT does not allow using multiple QPainter on the same image at the same time.
So i would have to do my image operation / compositing in each thread on a copy and then blit it back, which reduces the performance by a lot (depending on the use case obviously).
So i came up with an idea which seem to work, but feels extremely hacky.
I get the target image data (QImage::bits
) pointer, and provide this to the worker thread.
In the worker thread i recreate a new QImage from the provided pointer. This means, no copy, no blitting. It seem to work fine, as long as i make sure each pixel/tile is only worked on in one thread and i don't detach the target image.
My question is: Is this safe and are there any other issues that could arise from this approach ?
Example code
QImage source = ...;
QImage target = ...;
QPainter::CompositionMode compositionMode = QPainter::CompositionMode_SourceOver;
// calculate tiles
QList<QRect> tiles;
for(int y = rect.top(); y < rect.top() rect.height(); y = tileSize){
for(int x = rect.left(); x < rect.left() rect.width(); x = tileSize){
QRect tile(
x, y,
x tileSize > rect.left() rect.width() ? rect.left() rect.width() - x : tileSize,
y tileSize > rect.top() rect.height() ? rect.top() rect.height() - y : tileSize
);
tiles.append(tile);
}
}
// Get target pixel pointer and do threaded operation on each tile
uchar *targetPix = target.bits();
auto target_size = target.size();
auto targetFormat = target.format();
QList<int> lol = QtConcurrent::blockingMapped(tiles, [&target_size, &targetFormat, &source, targetPix, &compositionMode](const QRect &r){
QImage tile_target(targetPix, target_size.width(), target_size.height(), targetFormat);
QPainter p(&tile_target);
p.setCompositionMode(compositionMode);
// do you image operations here. For now we just do a simple draw
p.drawImage(r.topLeft(), source, r);
return 1; // In reallity this would return sensible data ;)
});
(This example increased the speed in my test by about 4.6 times btw. Depends on the operation and system of course.)
CodePudding user response:
Short answer
This is indeed tricky (but that's often needed when you want bleeding edge performance), but it should be (possible to make it) thread safe (for certain operations). Off course this depends on the operations you perform on the tile_target
.
It's up to you that you don't even access the bits outside the assigned tile (i.e. the portion of tile_target
outside the rect r
).
Some considerations
Ensure you only access the bits of the assigned tile
As tile_target
refers the whole image, it's up to you to ensure you don't access bits outside this target tile. Some problematic cases:
- anti-aliasing: you will probably alter bits of neighboring while performing such an operation
- Filtering: operations should as blurring often read the neighboring bits to calculate an 'averaged' value. Reading is not safe either if another thread may be writing the same bit simultaneously.
Should I worry about detaching?
That shouldn't be an issue with your current implementation. QImage::bits
already detaches (if needed) the image (target
) from any other possibly existing copy. As you perform the concurrent operation by blocking the calling thread. The original image (target
) will exist at least as long as the tile_target
images exist.
Safer approach
Use a library that is dedicated to multi-threaded image processing, or at least allow referencing a subimage.
Pass a copy of the tile (see
QImage::copy
) to each thread and write the result back in the original image (either using mutexes or by performing this operation in the calling thread). Depending on the calculation insensitivity of the image operation, this extra copy may or may not be negligible. For the OP, this doesn't seem a viable option.
Note that those safer approaches may generate (slightly) different results from the single threaded result in case of anti-aliasing or filtering. Those artifacts may be minimized by taking the tiles as large as possible (i.e. create not more tiles than the number of threads your cpu supports)