I need some help with parallel programming in MATLAB. To be clear, I have never implemented parallelization techniques in any of my code before. I have a video compression engine, developed as part of my university project. It is a basic version of an H.264 video compression engine. I have to apply the parallel processing techniques available in MATLAB to this engine. Basically, I have a function which divides an image frame into a number of blocks (predetermined by the size of the block). I'm trying to partially or fully parallelize this part of the code. I have used "parfor" when there was no dependency between the blocks, and this worked out well. I have uploaded this implementation. Now I'm trying to parallelize a case where there are dependencies between blocks.
function [reconstructed_frames, residual_blocks, encoded_data_cell, bit_count_coeff_per_frame, bit_count_mv_per_frame_cell, real_avg_bit_count_per_row_per_frame, total_bit_count_per_frame, QP_used_in_row, scene_change_frames, SAD_value_per_frame] = block_prediction_parallalized(Y, block_size, srch_rng, QP, I_period,pathToResiduals, no_ref_frames, VBS_enable, Fast_ME_enable,Frac_ME_enable,lambda, RC_flag, avg_bit_count_row_vary_QP, target_bits_per_frame)
%Function to predict frames based on inter prediction and intra prediction,
%with the given I-period
Y = int64(Y);
[no_rows, no_cols, no_frames] = size(Y);
no_blocks_in_row = (no_cols*block_size)/(block_size*block_size);
no_blocks_in_col = (no_rows*block_size)/(block_size*block_size);
total_blocks_per_frame = (no_rows*no_cols)/(block_size*block_size);
encoded_data_cell = cell(1,total_blocks_per_frame,no_frames);
encoded_data_per_frame = cell(1, total_blocks_per_frame);
ref_frame_inter = zeros(no_rows, no_cols, 1, 'int64') + 128;
bit_count_coeff_per_frame = 0;
bit_count_mv_per_frame_cell = 0;
real_avg_bit_count_per_row_per_frame = 0;
QP_used_in_row = zeros(1,no_blocks_in_col,no_frames);
QP_used_in_row(:,:,:) = QP;
scene_change_frames = [];
SAD_value_per_frame = 0;
ref_frame_index_count = 1;
for k = 1:no_frames
if k>1
ref_frame_inter(:,:,1) = Y(:,:,k-1);
end
block_segment = 0;
bitCountMV = 0;
for row = 1 : block_size : no_rows - block_size + 1
for col = 1 : block_size : no_cols - block_size + 1
block_segment = block_segment + 1;
row_start = row;
row_end = row_start + block_size - 1;
col_start = col;
col_end = col_start + block_size - 1;
row_end = min(row_end, no_rows);
col_end = min(col_end, no_cols);
% Making an array of blocks of size block_size
block_list_currframe(:,:,block_segment) = Y(row_start:row_end, col_start:col_end, k);
location_pointers(block_segment,:) = [row_start row_end col_start col_end];
end
end
%Parallelizing the block encoding process
max_index = size(block_list_currframe,3);
%Loop for processing blocks concurrently
parfor block_index = 1:max_index
% Function for inter-prediction
[encoded_data, reconstructed_block, residual_block, bit_count_per_block] = paral_debug_funct(block_index, location_pointers, block_list_currframe, ref_frame_inter, block_size, srch_rng, QP, no_rows, no_cols, ref_frame_index_count, VBS_enable, Fast_ME_enable, Frac_ME_enable, lambda);
%Buffering the output of each worker
reconstructed_blocks(:,:,block_index) = reconstructed_block;
residual_blocks_in_frame(:,:,block_index) = residual_block;
encoded_data_per_frame(:,:, block_index) = encoded_data;
total_bit_count_per_block(block_index) = bit_count_per_block;
end
%Processing the buffered outputs obtained after processing all the
%blocks.
for block_index = 1:size(block_list_currframe,3)
% [row_start, row_end, col_start, col_end] = location_pointers(block_index,:);
row_start = location_pointers(block_index, 1);
row_end = location_pointers(block_index, 2);
col_start = location_pointers(block_index, 3);
col_end = location_pointers(block_index, 4);
reconstructed_frames(row_start:row_end, col_start:col_end, k) = reconstructed_blocks(:,:,block_index);
residual_blocks(:,:,block_index,k) = residual_blocks_in_frame(:,:,block_index);
encoded_data_cell(:,:,block_index,k) = encoded_data_per_frame(:,:,block_index);
end
total_bit_count_per_frame(k) = sum(total_bit_count_per_block, 'all');
end
In the above code, the blocks don't have to communicate with each other. Now I need them to communicate at some point, because the processing of some blocks has to wait for a previous block to finish. I think the image below will help make it clearer.
I have come to know that there are two types of parallel processing available: multi-threading and multi-processing. I think multi-threading is the better fit for my use case. I have read about spmd and parfeval, but the examples I've come across are usually not very detailed. As I am new to parallel processing, these options feel very confusing and it is difficult to choose which one to focus on. I think what I want is for the workers to be able to communicate with each other during execution, but I'm not sure. If you need a general idea of the data size: video frame size = 288x352 (CIF format), block size = 16, number of frames = 21.
Thank you!
P.S Sorry for the long post, I was trying to explain it as clearly as possible
Answer:
You can use a parfor loop inside a non-parallel for loop, something like this:
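previous_blocks = {};
% Process one dependency group (color) at a time, in order
for color = ["green", "red", "blue"]
    % extract_blocks_by_color is a hypothetical helper: it returns a cell
    % array of all blocks with the same color from the image
    input_blocks = extract_blocks_by_color(frame, color);
    processed_blocks = cell(1, numel(input_blocks));
    % Blocks of the same color do not depend on each other
    parfor i = 1:numel(input_blocks)
        processed_blocks{i} = process_based_on_previous_blocks(i, input_blocks{i}, previous_blocks);
    end
    previous_blocks = processed_blocks;
    % place_blocks_by_color is a hypothetical helper: it puts processed_blocks
    % back in their original position in the image
    frame = place_blocks_by_color(frame, processed_blocks, color);
end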
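The color groups can also be computed instead of hard-coded. The following is only a sketch, and it assumes a dependency pattern that the question does not actually confirm: each block needs its left and top neighbours to be finished first. Under that assumption, all blocks on the same anti-diagonal are independent and form one "color" (wave). The names wave_id, no_waves, value and wave_result are made up for the illustration, and the per-block work is a placeholder (1 + max of the two neighbours) rather than the real encoder call.

```matlab
% Sketch only. ASSUMPTION: block (r, c) depends on its left (r, c-1) and
% top (r-1, c) neighbours; the real dependency pattern may differ.
no_blocks_in_col = 288 / 16;   % 18 block rows for the CIF frame in the question
no_blocks_in_row = 352 / 16;   % 22 block columns

% Blocks with the same r + c lie on one anti-diagonal and form one wave.
[c_idx, r_idx] = meshgrid(1:no_blocks_in_row, 1:no_blocks_in_col);
wave_id  = r_idx + c_idx - 1;                      % wave 1 is the top-left block
no_waves = no_blocks_in_col + no_blocks_in_row - 1;

value = zeros(no_blocks_in_col, no_blocks_in_row); % placeholder for per-block results
grid_size = [no_blocks_in_col, no_blocks_in_row];

for w = 1:no_waves                                 % waves run one after another
    idx = find(wave_id == w);                      % linear indices of this wave's blocks
    wave_result = zeros(1, numel(idx));
    parfor i = 1:numel(idx)                        % blocks inside a wave run concurrently
        [r, c] = ind2sub(grid_size, idx(i));
        left_val = 0;
        top_val  = 0;
        if c > 1, left_val = value(r, c - 1); end  % finished in an earlier wave
        if r > 1, top_val  = value(r - 1, c); end  % finished in an earlier wave
        % Placeholder for the real per-block encoding
        wave_result(i) = 1 + max(left_val, top_val);
    end
    value(idx) = wave_result;                      % publish results before the next wave
end
```

With this grouping the workers never need to talk to each other: the only synchronization point is the end of each parfor. The price is that the first and last waves contain very few blocks, so the pool is only fully used in the middle of the frame.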