Selecting the maximum matched pairs

I have two groups with different IDs. I found the possible matches by running code that checks which pairs meet the criteria. However, the result contains, for example, more than one Group B match for a single Group A ID. I would like to get rid of the repetition so that each ID is used only once, choosing among the candidate pairs randomly but in a way that still achieves the maximum number of matched pairs at the end. Any idea how to solve this?

Here is my code:

SH = readtable('contol_parameters.xlsx','Sheet','m');
%% check if criteria met
numElementsX = length(rmmissing(SH.Ages1));
numElementsY = length(rmmissing(SH.Ages2));
U1 = [];
U2 = [];
for r = 1:numElementsX
    for s = 1:numElementsY
        if (abs(rmmissing(SH.Ages1(r))-rmmissing(SH.Ages2(s)))<=10) && (abs(rmmissing(SH.vol_1(r))-rmmissing(SH.vol_2(s)))<=10)
            U1(end+1) = SH.ID1(r);
            U2(end+1) = SH.ID2(s);
        end
    end
end

%generated list
U_TS = [U1', U2'];

Results:

Group A Group B
216 217
216 221
216 222
216 234
216 256
216 262
216 266
216 330
216 390
225 217
225 222
225 234
225 239
225 256
225 257
225 260
225 263
225 266
225 277
225 302
225 324
225 330
225 333
225 341
225 359
225 381
225 386
225 390
225 423
225 435
225 436
225 442
225 466
225 470
225 478
227 257
227 260
227 263
227 277
227 302

Answer:

Here is a possible way to adjust the code to achieve your goal (I haven't tested the code):

%% read data from Excel file
SH = readtable('contol_parameters.xlsx','Sheet','m');

%% get number of elements in Ages1 and Ages2
numElementsX = length(rmmissing(SH.Ages1));
numElementsY = length(rmmissing(SH.Ages2));

%% create empty arrays for IDs
U1 = [];
U2 = [];

%% iterate over IDs in Ages1 and Ages2
for r = 1:numElementsX
    for s = 1:numElementsY
        %% check if the difference in ages and volumes is within the allowed range
        if (abs(rmmissing(SH.Ages1(r))-rmmissing(SH.Ages2(s))) <= 10) && (abs(rmmissing(SH.vol_1(r))-rmmissing(SH.vol_2(s))) <= 10)
            %% if the criteria are met, add IDs to the arrays
            U1(end+1) = SH.ID1(r);
            U2(end+1) = SH.ID2(s);
        end
    end
end

%% combine the arrays of IDs into a single array
U_TS = [U1', U2'];

%% choose a random pair of IDs (one row) from the array
randomPair = U_TS(randi(size(U_TS,1)), :);

%% find the pair of IDs (row) that occurred the most number of times
[uniquePairs, ~, pairIdx] = unique(U_TS, 'rows');
mostFrequentPair = uniquePairs(mode(pairIdx), :);

The code above first reads the data from the Excel file, then iterates over the values in Ages1 and Ages2 and checks whether the differences in ages and volumes are within the allowed range (at most 10). If the criteria are met, the corresponding IDs are added to the U1 and U2 arrays. The code then combines the two arrays into a single two-column array, U_TS, and chooses a random pair (row) of IDs from it. Finally, it finds the pair of IDs that occurs most often in U_TS using the mode function.
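Note that picking one random row of U_TS does not by itself guarantee that every ID is used at most once. Keeping as many pairs as possible under that one-to-one constraint is a maximum bipartite matching problem. Below is a minimal sketch, assuming U_TS is the two-column [Group A ID, Group B ID] array built by the loop; the small U_TS shown is made up for illustration. It applies Kuhn's augmenting-path method and visits IDs and candidates in random order, so repeated runs can return different maximum matchings (it does not sample uniformly among them). If you have MATLAB R2019a or later, the built-in matchpairs assignment solver is another option.

```matlab
% Made-up candidate pairs [Group A ID, Group B ID]; replace with the U_TS built by the loop
U_TS = [216 217; 216 221; 225 217; 227 221; 227 257];

% Map the IDs to rows/columns of a logical adjacency matrix
[uA, ~, iA] = unique(U_TS(:, 1));
[uB, ~, iB] = unique(U_TS(:, 2));
nA = numel(uA);
nB = numel(uB);
adj = false(nA, nB);
adj(sub2ind([nA nB], iA, iB)) = true;   % adj(r,c): uA(r) may be paired with uB(c)

matchA = zeros(nA, 1);   % matchA(r) = column matched to row r (0 = unmatched)
matchB = zeros(nB, 1);   % matchB(c) = row matched to column c (0 = unmatched)

% Kuhn's method: try to grow the matching once from every Group A ID, in random order
for startRow = randperm(nA)
    parentB = zeros(nB, 1);   % row from which column c was reached in this search
    visitedB = false(1, nB);
    queue = startRow;
    head = 1;
    freeC = 0;                % unmatched column ending an augmenting path (0 = none yet)
    while head <= numel(queue) && freeC == 0
        r = queue(head);
        head = head + 1;
        cand = find(adj(r, :) & ~visitedB);
        cand = cand(randperm(numel(cand)));   % random tie-breaking between candidates
        for c = cand
            visitedB(c) = true;
            parentB(c) = r;
            if matchB(c) == 0
                freeC = c;                    % found an unused Group B ID
                break;
            end
            queue(end+1) = matchB(c);         % continue from the row that currently holds c
        end
    end
    % Augment: walk back to startRow, re-pairing each row with the column it reached
    c = freeC;
    while c ~= 0
        r = parentB(c);
        nextC = matchA(r);
        matchA(r) = c;
        matchB(c) = r;
        c = nextC;
    end
end

matchedRows = find(matchA > 0);
matchedPairs = [uA(matchedRows), uB(matchA(matchedRows))];
disp(matchedPairs)
```

With the made-up data, every run pairs all three Group A IDs (216-221, 225-217, 227-257), even though 216 and 227 each have two candidates.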

Answer:

If I understand your objective, I would try the following:

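A = U_TS(:,1);   % Group A IDs of the candidate pairs (assumed to come from U_TS)
B = U_TS(:,2);   % Group B IDs of the candidate pairs
uA = unique(A);
uB = unique(B);
iCnt = zeros(length(uA),length(uB));
for ii = 1:length(uA)
    for jj = 1:length(uB)
        iCnt(ii,jj) = sum(A==uA(ii) & B==uB(jj));
    end
end
% order Group B columns by how many Group A IDs they can match (least common first)
[~,ind] = sort(sum(iCnt,1),'ascend');
uB = uB(ind);
iCnt = iCnt(:,ind);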

You now have a matrix, iCnt, whose rows correspond to the unique members of Group A (in vector uA) and whose columns correspond to the unique members of Group B (in uB), ordered so that the least common members of Group B are in the leftmost columns. For each row, find the first non-zero column of iCnt to pick the least common matching member of Group B. If that member of Group B has already been selected for an earlier row, move on to the next non-zero column for another candidate.
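The selection step described above can be sketched as follows. The uA, uB and iCnt values here are a small made-up example of the sorted matrix (rows are Group A IDs, columns are Group B IDs ordered from least to most common); in practice they come from the code above.

```matlab
% Made-up example of the sorted count matrix: rows = uA, columns = uB (least common first)
uA = [216; 225; 227];
uB = [257; 217; 221];
iCnt = [0 1 1;    % 216 can match 217 or 221
        0 1 0;    % 225 can match 217 only
        1 0 1];   % 227 can match 257 or 221

taken = false(1, numel(uB));   % Group B columns already assigned
pairs = zeros(0, 2);           % selected [Group A ID, Group B ID] rows
for ii = 1:numel(uA)
    % first non-zero column (least common Group B ID first) that is still free
    jj = find(iCnt(ii, :) > 0 & ~taken, 1, 'first');
    if ~isempty(jj)
        taken(jj) = true;
        pairs(end+1, :) = [uA(ii), uB(jj)];
    end
end
disp(pairs)
```

This greedy pass is a heuristic: with the example above it returns only two pairs (216-217 and 227-257) and leaves 225 unmatched, although the assignment 216-221, 225-217, 227-257 would pair all three Group A IDs.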