Take the letter that comes first in the alphabet (in R)

Time:03-26

In the following dataframe df,

df <- structure(list(Name = c("Gregory", "Jane", "Joey", "Mark", "Rachel", "Phoebe", "Liza"),
                     code = c("xx11-9090", "1367-88uu", "117y-xxxh", "cf56-gh67", "1888-ddf5", "rf52-628u", "hj69-5kk5"),
                     `CLASS IF5` = c("E", "C", "C", "D", "D", "A", "A"),
                     `CLASS AIS` = c("E", "C", "C", "D", "D", "A", "A"),
                     `CLASS IPP` = c("C", "C", "C", "E", "E", "B", "A"),
                     `CLASS SJR` = c("D", "C", "C", "D", "D", "B", "A")),
                row.names = c(1682L, 1683L, 1768L, 333L, 443L, 510L, 897L),
                class = "data.frame")

The letters denote a ranking: A is the first position, B the second, and so on, with letters ranging from A to E. I would like to collapse the columns whose names begin with CLASS (i.e., the last four columns of the data frame) into a single column that keeps, for each row, only the letter corresponding to the highest position in the ranking.

The desired result is:

        Name      code new column 
1682 Gregory xx11-9090         C
1683    Jane 1367-88uu         C
1768    Joey 117y-xxxh         C
333     Mark cf56-gh67         D
443   Rachel 1888-ddf5         D
510   Phoebe rf52-628u         A
897     Liza hj69-5kk5         A

User response:

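You can use apply() to call min() on each row of the CLASS columns and assign its output to a new column. Because the letters are stored as characters, min() returns the alphabetically first letter, which here is the highest rank: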

df$new_column <- apply(df[, grep("^CLASS", names(df))], 1, min, na.rm = TRUE)
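apply() first coerces the selected columns to a character matrix and then loops over the rows. A vectorized sketch of the same idea uses base R's pmin(), which takes the element-wise minimum across its arguments and also works on character vectors; do.call() passes each CLASS column as a separate argument. The data frame is rebuilt here (without the original row names) so the snippet runs on its own:

```r
# Rebuild the example data from the question (row names omitted for brevity)
df <- data.frame(
  Name = c("Gregory", "Jane", "Joey", "Mark", "Rachel", "Phoebe", "Liza"),
  code = c("xx11-9090", "1367-88uu", "117y-xxxh", "cf56-gh67",
           "1888-ddf5", "rf52-628u", "hj69-5kk5"),
  `CLASS IF5` = c("E", "C", "C", "D", "D", "A", "A"),
  `CLASS AIS` = c("E", "C", "C", "D", "D", "A", "A"),
  `CLASS IPP` = c("C", "C", "C", "E", "E", "B", "A"),
  `CLASS SJR` = c("D", "C", "C", "D", "D", "B", "A"),
  check.names = FALSE,        # keep the column names with spaces
  stringsAsFactors = FALSE    # keep the letters as character vectors
)

# Logical index of the columns whose names start with "CLASS"
class_cols <- startsWith(names(df), "CLASS")

# pmin() compares all CLASS columns element-wise in one vectorized call;
# for single letters A-E the minimum is the alphabetically first letter
df$new_column <- do.call(pmin, c(as.list(df[class_cols]), na.rm = TRUE))

df[, c("Name", "code", "new_column")]
```

String comparison follows the current locale's collation, which does not matter for single uppercase letters A to E.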

User response:

A possible solution in base R (the \(x) lambda shorthand requires R 4.1.0 or later):

df$new_column <- apply(df, 1, \(x) sort(x[-(1:2)])[1])
df[, c("Name", "code", "new_column")]

#>         Name      code new_column
#> 1682 Gregory xx11-9090          C
#> 1683    Jane 1367-88uu          C
#> 1768    Joey 117y-xxxh          C
#> 333     Mark cf56-gh67          D
#> 443   Rachel 1888-ddf5          D
#> 510   Phoebe rf52-628u          A
#> 897     Liza hj69-5kk5          A

Using dplyr:

library(dplyr)

df %>% 
  rowwise %>% 
  mutate(new_column = c_across(starts_with("CLASS")) %>% sort %>% .[1]) %>% 
  select(Name, code, new_column) %>% ungroup

#> # A tibble: 7 × 3
#>   Name    code      new_column
#>   <chr>   <chr>     <chr>     
#> 1 Gregory xx11-9090 C         
#> 2 Jane    1367-88uu C         
#> 3 Joey    117y-xxxh C         
#> 4 Mark    cf56-gh67 D         
#> 5 Rachel  1888-ddf5 D         
#> 6 Phoebe  rf52-628u A         
#> 7 Liza    hj69-5kk5 A
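The answers above rely on the ranking A to E coinciding with alphabetical order. If the ranking labels were not alphabetical, a sketch that makes the order explicit could convert each letter to its position with match(), take the smallest position per row (the highest rank) and map it back to a letter. The vector ranks (best first) is a hypothetical name introduced for illustration:

```r
# Rebuild the example data from the question (row names omitted for brevity)
df <- data.frame(
  Name = c("Gregory", "Jane", "Joey", "Mark", "Rachel", "Phoebe", "Liza"),
  code = c("xx11-9090", "1367-88uu", "117y-xxxh", "cf56-gh67",
           "1888-ddf5", "rf52-628u", "hj69-5kk5"),
  `CLASS IF5` = c("E", "C", "C", "D", "D", "A", "A"),
  `CLASS AIS` = c("E", "C", "C", "D", "D", "A", "A"),
  `CLASS IPP` = c("C", "C", "C", "E", "E", "B", "A"),
  `CLASS SJR` = c("D", "C", "C", "D", "D", "B", "A"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical ranking scale, best first; reorder it if the labels change
ranks <- c("A", "B", "C", "D", "E")

class_cols <- startsWith(names(df), "CLASS")

# Integer matrix: one row per observation, one column per CLASS column,
# holding each letter's position in `ranks`
pos <- sapply(df[class_cols], match, table = ranks)

# Smallest position per row = highest rank; translate it back to a letter
df$new_column <- ranks[apply(pos, 1, min)]

df[, c("Name", "code", "new_column")]
```

With this design the result depends only on the order of ranks, not on how the labels happen to sort as strings.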