Group elements in a named vector by character pattern and find max per group

Time:07-01

Suppose I have the following vector

test <- c("x1" = 0.1, "x2" = 0.3, "x3" = 0.4,
          "y1" = 0.1, "y2" = 0.5, "y3" = 0.4,
          "z1" = 0.5, "z2" = 0.3, "z3" = 0.4)

test
#  x1  x2  x3  y1  y2  y3  z1  z2  z3 
# 0.1 0.3 0.4 0.1 0.5 0.4 0.5 0.3 0.4

I want to find the element with the highest value within each letter group. In this case, the output should be "x3", "y2", "z1". The tricky part is that I do not know in advance how many letter groups there will be, nor how many numbered elements each letter will have. So I need simple yet flexible code that does not require a pre-specified grouping.

Any suggestions on which functions to use?

CodePudding user response:

Here is my solution with a verbose walk-through.

## extract the leading letters of each name as the group label
## can also use `grp <- stringr::str_remove(names(test), "[0-9]+")`
grp <- stringr::str_extract(names(test), "[A-Za-z]+")
#[1] "x" "x" "x" "y" "y" "y" "z" "z" "z"

## split vector by group
lst <- unname(split(test, grp))
#[[1]]
# x1  x2  x3 
#0.1 0.3 0.4 
#
#[[2]]
# y1  y2  y3 
#0.1 0.5 0.4
#
#[[3]]
# z1  z2  z3 
#0.5 0.3 0.4 

## since you want to keep the names "x3", "y2", "z1"
## it is not satisfactory to simply do `sapply(lst, max)`
sapply(lst, function (x) x[which.max(x)])
# x3  y2  z1 
#0.4 0.5 0.5 
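
If you would rather avoid the stringr dependency, the same idea can be sketched in base R. This is only a sketch, assuming every name is a letter prefix followed by trailing digits; `max_per_group` is a hypothetical helper name, not a function from any package.

```r
# Hypothetical helper: returns the largest element of each group, names kept
max_per_group <- function(v) {
  # drop the trailing digits to get the group label (assumes names like "x1", "Yy12")
  grp <- sub("[0-9]+$", "", names(v))
  # split() keeps element names inside each piece; unname() drops the group labels
  pieces <- unname(split(v, grp))
  # which.max() returns the first maximum, so ties go to the earliest element
  unlist(lapply(pieces, function(x) x[which.max(x)]))
}

test <- c("x1" = 0.1, "x2" = 0.3, "x3" = 0.4,
          "y1" = 0.1, "y2" = 0.5, "y3" = 0.4,
          "z1" = 0.5, "z2" = 0.3, "z3" = 0.4)

res <- max_per_group(test)
names(res)
#[1] "x3" "y2" "z1"
```

Use `names(res)` if only the labels are needed; `res` itself also carries the maximum values.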

The code is robust enough to handle the following more complicated case.

hard <- c("x3" = 0.1, "x2" = 0.3, "x1" = 0.4,
          "Yy1" = 0.1, "Yy2" = 0.5, "Yy3" = 0.4,
          "z0" = 0.5, "z1" = 0.3, "z2" = 0.4)
# x3  x2  x1 Yy1 Yy2 Yy3  z0  z1  z2 
#0.1 0.3 0.4 0.1 0.5 0.4 0.5 0.3 0.4

grp <- stringr::str_extract(names(hard), "[A-Za-z]+")
lst <- unname(split(hard, grp))
sapply(lst, function (x) x[which.max(x)])
# x1 Yy2  z0 
#0.4 0.5 0.5 
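
One caveat: `split()` orders the groups by sorted factor levels, and for character labels that sort depends on the locale's collation, so a mixed-case label such as "Yy" may land in a different position on another machine. If the groups should come out in order of first appearance instead, one option (a sketch in base R, again assuming names end in digits) is to fix the factor levels explicitly:

```r
hard <- c("x3" = 0.1, "x2" = 0.3, "x1" = 0.4,
          "Yy1" = 0.1, "Yy2" = 0.5, "Yy3" = 0.4,
          "z0" = 0.5, "z1" = 0.3, "z2" = 0.4)

# group label = name with its trailing digits removed
grp <- sub("[0-9]+$", "", names(hard))

# fix the level order to first appearance instead of locale-dependent sorting
f <- factor(grp, levels = unique(grp))

# pick the largest element per group, keeping the element names
res <- unlist(unname(lapply(split(hard, f), function(x) x[which.max(x)])))
names(res)
#[1] "x1"  "Yy2" "z0"
```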