In the following dataframe df,
df <- structure(list(
  Name = c("Gregory", "Jane", "Joey", "Mark", "Rachel", "Phoebe", "Liza"),
  code = c("xx11-9090", "1367-88uu", "117y-xxxh", "cf56-gh67", "1888-ddf5", "rf52-628u", "hj69-5kk5"),
  `CLASS IF5` = c("E", "C", "C", "D", "D", "A", "A"),
  `CLASS AIS` = c("E", "C", "C", "D", "D", "A", "A"),
  `CLASS IPP` = c("C", "C", "C", "E", "E", "B", "A"),
  `CLASS SJR` = c("D", "C", "C", "D", "D", "B", "A")),
  row.names = c(1682L, 1683L, 1768L, 333L, 443L, 510L, 897L),
  class = "data.frame")
the letters denote a ranking: A is the first position, B is the second, and so on, with letters ranging from A to E. I would like to collapse the columns that begin with CLASS (i.e., the last four columns of the dataframe) into a single column, keeping for each row only the letter that corresponds to the highest position in the ranking.
The desired result is:
        Name      code new column
1682 Gregory xx11-9090          C
1683    Jane 1367-88uu          C
1768    Joey 117y-xxxh          C
333     Mark cf56-gh67          D
443   Rachel 1888-ddf5          D
510   Phoebe rf52-628u          A
897     Liza hj69-5kk5          A
CodePudding user response:
You can use apply() to apply the min function to each row and assign its output to a new column. This works because the alphabetical order of the letters matches the ranking (A is the best position):
df$new_column <- apply(df[, grep("^CLASS", names(df))], 1, min, na.rm = TRUE)
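If the ranking ever stops matching alphabetical order, the same row-wise idea can be written with the ranking spelled out. The following is only a sketch: the helper best_letter, the vector ranking, and the small classes data frame (four rows taken from the question's CLASS columns) are names made up for illustration.

```r
# Four rows of the CLASS columns from the question: Gregory, Jane, Mark, Liza
classes <- data.frame(
  `CLASS IF5` = c("E", "C", "D", "A"),
  `CLASS AIS` = c("E", "C", "D", "A"),
  `CLASS IPP` = c("C", "C", "E", "A"),
  `CLASS SJR` = c("D", "C", "D", "A"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Assumed ranking, best position first
ranking <- c("A", "B", "C", "D", "E")

# Hypothetical helper: return the best-ranked letter found in x
best_letter <- function(x, levels = ranking) {
  pos <- match(x, levels)                       # position of each letter in the ranking
  if (all(is.na(pos))) return(NA_character_)    # nothing recognised in this row
  levels[min(pos, na.rm = TRUE)]                # smallest position = highest rank
}

new_column <- apply(classes[, grep("^CLASS", names(classes))], 1, best_letter)
```

With the A to E ranking from the question this gives the same letters as min, so it only pays off when the order of the labels is not alphabetical.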
CodePudding user response:
A possible solution in base R:
df$new_column <- apply(df, 1, \(x) sort(x[-(1:2)])[1])
df[,c(1,2,7)]
#>         Name      code new_column
#> 1682 Gregory xx11-9090          C
#> 1683    Jane 1367-88uu          C
#> 1768    Joey 117y-xxxh          C
#> 333     Mark cf56-gh67          D
#> 443   Rachel 1888-ddf5          D
#> 510   Phoebe rf52-628u          A
#> 897     Liza hj69-5kk5          A
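For larger data frames, a row-wise apply can be replaced by a vectorised call to pmin, which also accepts character vectors. This is a sketch rather than part of the answer above; the classes data frame (four rows from the question) and the variable class_cols are illustrative names.

```r
# Four rows of the CLASS columns from the question: Gregory, Jane, Mark, Liza
classes <- data.frame(
  `CLASS IF5` = c("E", "C", "D", "A"),
  `CLASS AIS` = c("E", "C", "D", "A"),
  `CLASS IPP` = c("C", "C", "E", "A"),
  `CLASS SJR` = c("D", "C", "D", "A"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Names of the columns that start with CLASS
class_cols <- grep("^CLASS", names(classes), value = TRUE)

# pmin compares the columns element by element, i.e. one result per row;
# unname() keeps the column names from being passed as argument names
new_column <- do.call(pmin, unname(as.list(classes[class_cols])))
```

Like min, this relies on the alphabetical order of the letters matching the ranking.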
Using dplyr:
library(dplyr)
df %>%
  rowwise() %>%
  mutate(new_column = c_across(starts_with("CLASS")) %>% sort() %>% .[1]) %>%
  select(Name, code, new_column) %>%
  ungroup()
#> # A tibble: 7 × 3
#>   Name    code      new_column
#>   <chr>   <chr>     <chr>
#> 1 Gregory xx11-9090 C
#> 2 Jane    1367-88uu C
#> 3 Joey    117y-xxxh C
#> 4 Mark    cf56-gh67 D
#> 5 Rachel  1888-ddf5 D
#> 6 Phoebe  rf52-628u A
#> 7 Liza    hj69-5kk5 A