Identifying MULTIPLE minimum column names

I want to create a column that contains the name(s) of the column(s) holding the minimum value in each row. So far, I have only come across code that returns a single name; in the case of ties, you can choose a tie-breaking method. What I want instead is to list all of the tied column names.

Minimal example:

df <- data.frame(ColA = c(0, 4, 7, 7, 3),
             ColB = c(0, 2, 5, 3, 2),
             ColC = c(5, 10, 1, 3, 1),
             ColD = c(7, 3, 1, 3, 0))

To get a single column name I can do:

df$all.min.groups <- names(df)[apply(df, MARGIN = 1, FUN = which.min)]

Can you help me get all the column names that have the minimum value by row?
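For context, here is a minimal sketch of why the which.min call above yields only one name. which.min reports just the first position of the minimum, whereas comparing every value with min() flags all ties. The vector x is a hand-built copy of the first row of df, used only for illustration.

```r
# Hand-built copy of the first row of df: ColA and ColB tie at the minimum (0)
x <- c(ColA = 0, ColB = 0, ColC = 5, ColD = 7)

which.min(x)            # 1 (named "ColA"): only the first minimum is reported
which(x == min(x))      # 1 and 2: every entry equal to the minimum
names(x)[x == min(x)]   # "ColA" "ColB"
```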

Answer:

If you just want a column of comma-separated strings, then perhaps:

do.call(mapply, c(list(FUN = function(...) {
  # mapply passes one value from each column per call, i.e. one row at a time
  dots <- unlist(list(...))
  toString(names(df[which(dots == min(dots))]))
}), df))
# [1] "ColA, ColB"       "ColB"             "ColC, ColD"       "ColB, ColC, ColD" "ColD"

That works well if the result is "just informative". If, however, you intend to use the names programmatically (for example, to index on those columns again later), you may prefer to keep them as a list-column. In that case, remove toString and use Map instead of mapply:

do.call(Map, c(list(f = function(...) {
  # Map never simplifies, so each row keeps its own character vector of names
  dots <- unlist(list(...))
  names(df[which(dots == min(dots))])
}), df))
# [[1]]
# [1] "ColA" "ColB"
# [[2]]
# [1] "ColB"
# [[3]]
# [1] "ColC" "ColD"
# [[4]]
# [1] "ColB" "ColC" "ColD"
# [[5]]
# [1] "ColD"
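As a sketch of the programmatic use mentioned above, assuming you store the Map result back on a copy of the data frame: a list-column assigned with $<- keeps one character vector per row, which can then be counted with lengths() or used to index columns. The names min_cols and res are illustrative only.

```r
df <- data.frame(ColA = c(0, 4, 7, 7, 3),
                 ColB = c(0, 2, 5, 3, 2),
                 ColC = c(5, 10, 1, 3, 1),
                 ColD = c(7, 3, 1, 3, 0))

# Same Map call as above: one character vector of tied names per row
min_cols <- do.call(Map, c(list(f = function(...) {
  dots <- unlist(list(...))
  names(df[which(dots == min(dots))])
}), df))

# Store the list as a list-column on a copy, leaving df itself unchanged
res <- df
res$all.min.groups <- min_cols

# Number of columns sharing the minimum in each row: 2 1 2 3 1
lengths(res$all.min.groups)

# Use the stored names to pull the tied values for row 4 back out
# (ColB, ColC and ColD, all equal to 3)
unlist(res[4, res$all.min.groups[[4]]])
```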

Note that this relies on exact numeric tests of equality (==). That should be fine unless your values carry enough floating-point rounding error for the limitations of binary floating-point storage to become apparent, which seems unlikely given your sample data. Still, you might want to read Why are these numbers not equal?, Is floating point math broken?, and https://en.wikipedia.org/wiki/IEEE_754.
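If your values do come from floating-point arithmetic, one option is to swap the exact == test for a tolerance. A minimal sketch, assuming an absolute tolerance of sqrt(.Machine$double.eps) suits the scale of your data (for very large or very small values a relative tolerance would be more appropriate); tied_min_names is a hypothetical helper, not part of base R.

```r
# Hypothetical helper: names of all entries within `tol` of the minimum.
# The absolute tolerance is an assumption; adjust it to your data's scale.
tied_min_names <- function(row, tol = sqrt(.Machine$double.eps)) {
  names(row)[abs(row - min(row)) <= tol]
}

# 0.1 + 0.2 is stored as slightly more than 0.3
x <- c(ColA = 0.1 + 0.2, ColB = 0.3)
names(x)[x == min(x)]   # exact test: "ColB" only
tied_min_names(x)       # tolerance test: "ColA" "ColB"

# Applied row by row to the example data
df <- data.frame(ColA = c(0, 4, 7, 7, 3),
                 ColB = c(0, 2, 5, 3, 2),
                 ColC = c(5, 10, 1, 3, 1),
                 ColD = c(7, 3, 1, 3, 0))
lapply(seq_len(nrow(df)), function(i) tied_min_names(unlist(df[i, ])))
```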

Answer:

Get the parallel minimum across each row, use this to generate the row/col locations of all minimums, and then grab the column names and split by row:

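# Row-wise minimum, then the (row, col) positions of every cell equal to it
mins <- which(df == do.call(pmin, df), arr.ind=TRUE)
# Column names of those cells, grouped by row
split(names(df)[mins[,"col"]], mins[,"row"])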

#$`1`
#[1] "ColA" "ColB"
#
#$`2`
#[1] "ColB"
#
#$`3`
#[1] "ColC" "ColD"
#
#$`4`
#[1] "ColB" "ColC" "ColD"
#
#$`5`
#[1] "ColD"
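To attach this result to df as a new column, as the question originally intended, here is a minimal sketch. It assumes df holds only the numeric columns and contains no NA values: an NA makes pmin() return NA for that row, which() then drops the row entirely, and the split result would be shorter than the data frame. The names by_row and all.min.string are illustrative only.

```r
df <- data.frame(ColA = c(0, 4, 7, 7, 3),
                 ColB = c(0, 2, 5, 3, 2),
                 ColC = c(5, 10, 1, 3, 1),
                 ColD = c(7, 3, 1, 3, 0))

# Compute the groups before adding any new (non-numeric) columns to df
mins <- which(df == do.call(pmin, df), arr.ind = TRUE)
by_row <- split(names(df)[mins[, "col"]], mins[, "row"])

# List-column: one character vector per row; names dropped so it aligns by position
df$all.min.groups <- unname(by_row)

# Character column: the same names collapsed into one comma-separated string per row
df$all.min.string <- vapply(by_row, toString, character(1), USE.NAMES = FALSE)

df$all.min.string
# "ColA, ColB" "ColB" "ColC, ColD" "ColB, ColC, ColD" "ColD"
```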