I have a DF of logical vectors as follows:
DF <- data.frame(c(T,T,F), c(T,F,T), c(F,T,F))
I want to find row-column pairs under the condition that the combination has a TRUE value.
So, DF[1,2]
represents a possible pair, but DF[2,2]
does not.
Once in pair, the row and the column are excluded to make new pairs.
Depending on the data-set, there with be different pairing possibilities. It may also be impossible to find a pair for all the rows or columns.
My question is: What kind of algorithm/library can I use to maximize the quantity of pairs?
In the example given, the pairing solution would be this one:
DF[3,2]
DF[2,3]
DF[1,1]
CodePudding user response:
Not exactly sure about your logic, but maybe you want
which(as.matrix(DF), arr.ind=TRUE)
# row col
# [1,] 1 1
# [2,] 2 1
# [3,] 1 2
# [4,] 3 2
# [5,] 2 3
CodePudding user response:
This is the maximum matching problem of a bipartite graph formed by the rows and columns connected according to the matrix DF
. There are a few packages you can use for this problem. One option is the RcppHungarian
package, with HungarianSolver
.
DF <- data.frame(c(T,T,F), c(T,F,T), c(F,T,F))
RcppHungarian::HungarianSolver(!DF)$pairs
#> [,1] [,2]
#> [1,] 1 1
#> [2,] 2 3
#> [3,] 3 2