Reshaping x-y data into sub-arrays based on conditions set for x

Time:09-28

I have two numerical arrays of 6538x1 data, let's say diameter and weight. I would like to create multiple sub-arrays whenever diameter falls within a desired range. For example, my first sub-array should include diameter within a range of 10^0.65 to 10^0.70 and the corresponding weight values. The second sub-array should include diameter within a range of 10^0.70 to 10^0.75 and the corresponding weight values and so on. I would then like to find the mean values of diameter and weight from each sub-array and create a plot that shows correlations between the two.

I have tried creating a logical index and then finding average values of diameter that fall in the desired range. For example, the lines below can generate average values of diameter for my first desired sub-array but how can I get the corresponding values of weight and calculate the mean for that?

subarray1 = diameter <10^0.7 & diameter>=10^0.65;
meandiameter1 = mean(diameter(subarray1)); 

CodePudding user response:

If the diameter ranges are defined by a set of edges that span the full set of data, this can be easily done with

  • histcounts to classify the diameters in ranges, and then
  • accumarray to apply a function to the diameters or to the weights for each range of diameters.
diameters = [4 7 2 8 7 9 6 4 6 8 4 3];
weights = [10 40 50 40 30 60 70 80 90 40 20 60];
edges = [2 4 6 8 10]; % first range is [edges(1), edges(2)),
                      % second is [edges(2), edges(3)), ...,
                      % last is [edges(end), inf]
[~, ~, ind] = histcounts(diameters, [edges inf]);
result_diameters = accumarray(ind(:), diameters(:), size(edges(:)), @mean, NaN);
result_weights = accumarray(ind(:), weights(:), size(edges(:)), @mean, NaN);

For example, this gives

result_weights =
  55.000000000000000
  36.666666666666664
  57.500000000000000
  46.666666666666664
                 NaN

result_weights(1) is 55, which is the average weight for the two values with diameter in the range [2, 4), namely the third and last data values. result_weights(5) is NaN because there are no values in the range [10, inf].
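
As a minimal sketch, the same approach could be applied to the log-spaced ranges from the question. The sample data below are made up for illustration. This variant also assumes the edges may not span all the data: histcounts returns a bin index of 0 for values outside the edges, so those values are filtered out before calling accumarray.

```matlab
% Hypothetical sample data standing in for the 6538x1 arrays in the question.
diameter = [4.5; 4.8; 5.1; 5.3; 5.7; 6.0; 3.0; 7.5];
weight   = [10; 20; 30; 50; 70; 90; 5; 200];

% Edges at 10^0.65, 10^0.70, 10^0.75, 10^0.80 (three ranges).
edges = 10.^[0.65 0.70 0.75 0.80];

% ind is 0 for values that fall outside the edges (here 3.0 and 7.5).
[~, ~, ind] = histcounts(diameter, edges);
inside = ind > 0;

% One mean per range; ranges with no data would get NaN.
nbins = numel(edges) - 1;
mean_diameter = accumarray(ind(inside), diameter(inside), [nbins 1], @mean, NaN);
mean_weight   = accumarray(ind(inside), weight(inside),   [nbins 1], @mean, NaN);
```

With this sample data, mean_diameter is [4.65; 5.2; 5.85] and mean_weight is [15; 40; 80]. Note that histcounts treats the last bin as closed on the right, so a diameter exactly equal to 10^0.80 would be counted in the third range.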

CodePudding user response:

subarray1 is a logical array telling you which elements to select that fulfil your conditions. Just like you use logical indexing with diameter to select the diameters that are in the range, you can use it with your weight array to give you the weights corresponding to the diameters that are in your range.

meanweight1 = mean(weight(subarray1));
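
A small self-contained sketch of this idea, using made-up sample values rather than the real data, might look like this:

```matlab
% Hypothetical sample data (not from the question).
diameter = [4.5; 4.8; 5.1; 3.0];
weight   = [10; 20; 30; 5];

% Logical mask: true where diameter lies in [10^0.65, 10^0.70).
subarray1 = diameter >= 10^0.65 & diameter < 10^0.70;

% The same mask indexes both arrays, so the x-y pairs stay aligned.
meandiameter1 = mean(diameter(subarray1));   % mean of 4.5 and 4.8
meanweight1   = mean(weight(subarray1));     % mean of 10 and 20
```

Because the mask has one entry per data point, it works on any array of the same size as diameter.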

To do this using a loop for multiple ranges, you could do something like this:

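% Each row is [lower upper]. The question states its bounds as powers of 10,
% so the exponents are converted here (assumed to be the intent).
diameter_ranges = 10.^[0.65 0.70;
                       0.70 0.75;
                       0.75 0.80];

n_ranges = size(diameter_ranges, 1);
mean_diameters = zeros(n_ranges, 1);
mean_weights = zeros(n_ranges, 1);

for ii = 1:n_ranges
    % Select diameters in [lower, upper); MATLAB indices start at 1.
    filter_selection = diameter >= diameter_ranges(ii, 1) & diameter < diameter_ranges(ii, 2);
    % Store one mean per range instead of overwriting the whole array.
    mean_diameters(ii) = mean(diameter(filter_selection));
    mean_weights(ii) = mean(weight(filter_selection));
end

plot(mean_diameters, mean_weights);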