I am new to C++ streams and would like some general advice on speeding up my image-processing code. I use a stream buffer, boost::iostreams::filtering_streambuf, to load and decompress the image from a file, as suggested in this post and another post. The performance is not satisfactory.
The relevant code is the following:
template <int _NCH>
class MultiChImg {
public:
...
...
private:
std::atomic<bool> __in_operation;
std::atomic<bool> __content_loaded;
char **_IMG[_NCH];
int _W, _H;
void dcmprss ( const std::string & file_name, bool is_decomp = true) {
...
...
// decompress
int counter = 0, iw = -1, ih = -1, _r = 0;
auto _fill_ = [&](const char c){
_r = counter % _NCH ; // was 3 for RGB mindset
if ( _r == 0 ) {
iw++ ; // fast index
if ( iw%_W==0 ) { iw=0; ih++ ; } // slow index
}
_IMG[_r][_H-1-ih][iw] = c;
counter++ ;
} ;
auto EoS = std::istreambuf_iterator<char>() ;
// char buf[4096]; // UPDATE: improved code according to @sehe
if ( is_decomp ) {
// decompress
bio::filtering_streambuf<bio::input> input;
input.push( bio::gzip_decompressor() ); //
input.push( fstrm );
std::basic_istream<char> inflated( &input );
auto T3 = timing(T2, "Timing : dcmprss() prepare decomp ") ;
// assign values to _IMG (0=>R, 1=>G, 2=>B)
// TODO // bottleneck
std::for_each(
std::istreambuf_iterator<char>(inflated), EoS, _fill_ );
// UPDATE: improved code according to @sehe , replace the previous two lines
// while (inflated.read(buf, sizeof(buf)) || inflated.gcount() > 0) // also handles the short final block
// std::for_each(buf, buf + inflated.gcount(), _fill_);
auto T4 = timing(T3, "Timing : dcmprss() decomp assign ") ;
} else {
// assign values to _IMG (0=>R, 1=>G, 2=>B)
// TODO // bottleneck
std::for_each(
std::istreambuf_iterator<char>(fstrm), EoS, _fill_ ); // different !
// UPDATE: improved code according to @sehe , replace the previous two lines
// while (fstrm.read(buf, sizeof(buf)) || fstrm.gcount() > 0) // also handles the short final block
// std::for_each(buf, buf + fstrm.gcount(), _fill_);
auto T3 = timing(T2, "Timing : dcmprss() assign ") ;
}
assert(counter == _NCH*_H*_W);
...
...
};
...
...
}
The bottleneck appears to be the for_each() part, where I iterate over the stream, either inflated via std::istreambuf_iterator<char>(inflated) or fstrm via std::istreambuf_iterator<char>(fstrm), and apply the lambda function _fill_ to each byte. This lambda transfers the bytes in the stream to their designated places in the multi-dimensional array class member _IMG.
UPDATE: the timing was incorrect due to a memory leak. I've corrected that.
The timing results for the above function dcmprss() are 450 ms for a 30 MB .gz file and 400 ms for an uncompressed file. I think that is too long, so I am asking the community for advice on how to improve it.
Thanks for your time on my post!
Answer:
You can use blockwise I/O:
char buf[4096];
inflated.read(buf, sizeof(buf));
std::for_each(buf, buf + inflated.gcount(), _fill_);
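A single read() call only covers the first block, so to consume the whole stream the read has to be repeated until no bytes come back. Here is a minimal sketch of such a loop. The helper names for_each_byte_blockwise and count_bytes are hypothetical, and an in-memory std::istringstream stands in for inflated or fstrm. Note that read() sets failbit when the final block is shorter than the buffer, so the loop tests gcount() rather than the stream state.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads `in` in fixed-size blocks and calls `fn`
// once per byte. Returns the total number of bytes processed.
template <typename Fn>
std::size_t for_each_byte_blockwise(std::istream& in, Fn fn) {
    char buf[4096];
    std::size_t total = 0;
    // Test gcount(), not the stream state: a short final read sets
    // failbit but still delivers bytes that must be processed.
    while (in.read(buf, sizeof(buf)), in.gcount() > 0) {
        const std::streamsize n = in.gcount();
        for (std::streamsize i = 0; i < n; ++i) fn(buf[i]);
        total += static_cast<std::size_t>(n);
    }
    return total;
}

// Usage sketch: an in-memory stream stands in for `inflated` or `fstrm`,
// and the callback stands in for `_fill_`.
inline std::size_t count_bytes(const std::string& data) {
    std::istringstream in(data);
    std::size_t seen = 0;
    const std::size_t total =
        for_each_byte_blockwise(in, [&](char) { ++seen; });
    assert(total == seen);
    return seen;
}
```

In the question's code, the callback passed in would be _fill_ and the stream would be inflated or fstrm.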
However, I also think considerable time might be wasted in _fill_, where the data is re-indexed byte by byte (channels de-interleaved and rows flipped). That feels arbitrary.
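To make that concrete: _fill_ performs a modulo on every byte and another on every pixel. The same mapping (interleaved input, planar output, rows flipped vertically) can be written as plain nested loops over a buffer that has already been read. The sketch below assumes the decompressed bytes are available in a std::vector<char>; PlanarImage and deinterleave_flipped are hypothetical names, not part of the question's code, and whether this is measurably faster would need to be confirmed with a profiler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical planar image: one contiguous row-major plane per channel.
struct PlanarImage {
    int channels, width, height;
    std::vector<char> data;  // size = channels * width * height

    char& at(int ch, int row, int col) {
        return data[(static_cast<std::size_t>(ch) * height + row) * width + col];
    }
};

// Copies interleaved pixels (c0 c1 ... per pixel, rows top to bottom)
// into planes, storing source row `ih` at destination row `height-1-ih`,
// which mirrors `_IMG[_r][_H-1-ih][iw] = c` in the question.
PlanarImage deinterleave_flipped(const std::vector<char>& src,
                                 int channels, int width, int height) {
    assert(src.size() ==
           static_cast<std::size_t>(channels) * width * height);
    PlanarImage img{channels, width, height,
                    std::vector<char>(src.size())};
    const char* p = src.data();
    for (int ih = 0; ih < height; ++ih) {
        const int row = height - 1 - ih;      // vertical flip, once per row
        for (int iw = 0; iw < width; ++iw)
            for (int ch = 0; ch < channels; ++ch)
                img.at(ch, row, iw) = *p++;   // no modulo in the inner loop
    }
    return img;
}
```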
Note that several libraries can transparently re-index multi-dimensional data, so you may save time by copying the source data linearly and accessing it through such a view:
- Boost MultiArray (allows you to specify storage order, direction and offsets): https://www.boost.org/doc/libs/1_79_0/libs/multi_array/doc/user.html#sec_storage
- Boost GIL (allows you to use image data directly from interleaved/planar buffers): https://www.boost.org/doc/libs/1_79_0/libs/gil/doc/html/design/dynamic_image.html
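As a rough illustration of the idea behind those libraries, and not of the Boost MultiArray or Boost GIL APIs themselves, here is a hand-rolled view that keeps the bytes exactly as they were read and does the re-indexing at access time. The class name InterleavedView and its interface are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical view over an interleaved buffer. The raw bytes are stored
// untouched; (channel, row, col) is translated to an offset on access,
// with row 0 referring to the last row of the stored data (vertical flip).
class InterleavedView {
public:
    InterleavedView(std::vector<char> raw, int channels, int width, int height)
        : raw_(std::move(raw)),
          channels_(channels), width_(width), height_(height) {
        assert(raw_.size() ==
               static_cast<std::size_t>(channels) * width * height);
    }

    char operator()(int ch, int row, int col) const {
        const std::size_t src_row =
            static_cast<std::size_t>(height_ - 1 - row);
        return raw_[(src_row * width_ + col) * channels_ + ch];
    }

private:
    std::vector<char> raw_;
    int channels_, width_, height_;
};
```

With this layout, loading becomes a single linear copy into the buffer, and the cost of the index arithmetic is paid only for the pixels that are actually accessed later.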