Suppose I have the following vector:
test <- c("x1" = 0.1, "x2" = 0.3, "x3" = 0.4,
"y1" = 0.1, "y2" = 0.5, "y3" = 0.4,
"z1" = 0.5, "z2" = 0.3, "z3" = 0.4)
test
# x1 x2 x3 y1 y2 y3 z1 z2 z3
# 0.1 0.3 0.4 0.1 0.5 0.4 0.5 0.3 0.4
I want to find the element with the highest value within each letter group. In this case, the output should be "x3", "y2", "z1". The tricky part is that I do not know in advance how many letter groups there will be, nor how many numbered elements each letter will have. So I need simple yet flexible code that does not rely on a pre-specified grouping.
Any suggestions on which functions to use?
CodePudding user response:
Here is my solution with a verbose walk-through.
## can also use `grp <- stringr::str_remove(names(test), "[0-9]+")`
grp <- stringr::str_extract(names(test), "[A-Za-z]+")
#[1] "x" "x" "x" "y" "y" "y" "z" "z" "z"
## split vector by group
lst <- unname(split(test, grp))
#[[1]]
# x1 x2 x3
#0.1 0.3 0.4
#
#[[2]]
# y1 y2 y3
#0.1 0.5 0.4
#
#[[3]]
# z1 z2 z3
#0.5 0.3 0.4
## since you want to keep the names "x3", "y2", "z1"
## it is not satisfactory to simply do `sapply(lst, max)`
sapply(lst, function (x) x[which.max(x)])
# x3 y2 z1
#0.4 0.5 0.5
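If only the names are needed (the question asks for "x3", "y2", "z1"), the same idea can be sketched in base R without stringr. This is a minimal variant rather than a replacement for the code above. It assumes every name is a letter prefix followed by trailing digits, and the variable names `grp` and `best` are just illustrative.

```r
test <- c("x1" = 0.1, "x2" = 0.3, "x3" = 0.4,
          "y1" = 0.1, "y2" = 0.5, "y3" = 0.4,
          "z1" = 0.5, "z2" = 0.3, "z3" = 0.4)

## group label = name with its trailing digits removed (assumed name pattern)
grp <- sub("[0-9]+$", "", names(test))

## for each group, keep the name of the largest element
best <- vapply(split(test, grp),
               function (x) names(x)[which.max(x)],
               character(1))

## drop the group labels so only the element names remain
unname(best)
#[1] "x3" "y2" "z1"
```

`vapply` with `character(1)` is used instead of `sapply` so that the result is guaranteed to be a character vector with one name per group.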
The code also handles the following more complicated case, where the letter prefixes differ in length and case, and the numbers are unordered or start at 0.
hard <- c("x3" = 0.1, "x2" = 0.3, "x1" = 0.4,
"Yy1" = 0.1, "Yy2" = 0.5, "Yy3" = 0.4,
"z0" = 0.5, "z1" = 0.3, "z2" = 0.4)
# x3 x2 x1 Yy1 Yy2 Yy3 z0 z1 z2
#0.1 0.3 0.4 0.1 0.5 0.4 0.5 0.3 0.4
grp <- stringr::str_extract(names(hard), "[A-Za-z]+")
lst <- unname(split(hard, grp))
sapply(lst, function (x) x[which.max(x)])
# x1 Yy2 z0
#0.4 0.5 0.5
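One caveat: `which.max` returns only the first maximum, so if two elements in a group share the highest value, only the earlier one is reported. If ties matter, a possible sketch is to compare against `max(x)` instead. The vector `tied` below is made-up illustration data, not from the question, and exact `==` comparison is assumed to be acceptable for the values at hand.

```r
## hypothetical data with a tie inside group "a"
tied <- c("a1" = 0.2, "a2" = 0.7, "a3" = 0.7,
          "b1" = 0.9, "b2" = 0.4)

## group label = name with its trailing digits removed
grp <- sub("[0-9]+$", "", names(tied))

## keep the names of every element equal to the group maximum
all_max <- lapply(split(tied, grp), function (x) names(x)[x == max(x)])
all_max
#$a
#[1] "a2" "a3"
#
#$b
#[1] "b1"
```

The result is a list rather than a vector, because groups can now contribute more than one name.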