Home > front end >  How to use ifelse across multiple columns with alphanumeric range in R to create a new column?
How to use ifelse across multiple columns with alphanumeric range in R to create a new column?

Time:05-29

I have a dataset with over 200 columns and 2 million obs. In this data set the are 11 columns (all of them start with "DIAG") with alphanumeric categories and I want to create a new column that if the category I'm interested in is between that particular range, the column would receive 1, or else, 0.

I tried something like this, but it doesn't work:

data$GROUP_12 <- ifelse(across(contains('DIAG') >= "A15" & <= "A19", 
                 ifelse(across(contains('DIAG') >= "A50" & <= "A64", 1, 0))))

The ranges I'm interest in are: A15-A19, A50-A64, A80-B09, B15-B19, B25-B34. So I would have to do this for each range, but all of them being added to GROUP_12 column.

Edit with reproductible data:

data <- data.frame(DIAG_PRINC = c("A25", "B32", "O90"),
                   DIAG_SECUN = c("A16", "Y55", "K97"),
                   DIAGSEC1 = c("B15", "J55", "Y97"),
                   DIAGSEC2 = c("L16", "B55", "A97"),
                   DIAGSEC3 = c("W10", "S57", "T44"),
                   DIAGSEC4 = c("A64", "A16", "Y55"),
                   DIAGSEC5 = c("A80", "D33", "V12"),
                   DIAGSEC6 = c("M16", "N55", "X97"),
                   DIAGSEC7 = c("B16", "Y57", "O58"),
                   DIAGSEC8 = c("V45", "C23", "Q97"),
                   DIAGSEC9 = c("F98", "R44", "A54"))

Any suggestions?

CodePudding user response:

Based on the example provided and the logic mentioned, we may use if_any (if any of the columns in a particular row have the value) or if_all (if all the columns in the selected columns in a particular row have the value) to return TRUE/FALSE. Then wrap with or as.integer to coerce the logical to binary

library(dplyr)
vals_range <- c(sprintf("Ad", c(15:19, 50:64, 80)),
     sprintf("Bd", c(1:9, 15:19, 25:34)))
data <- data %>% 
  mutate(GROUP_12 =  (if_any(starts_with('DIAG'), ~ .x  %in% vals_range)))

-output

data
DIAG_PRINC DIAG_SECUN DIAGSEC1 DIAGSEC2 DIAGSEC3 DIAGSEC4 DIAGSEC5 DIAGSEC6 DIAGSEC7 DIAGSEC8 DIAGSEC9 GROUP_12
1        A25        A16      B15      L16      W10      A64      A80      M16      B16      V45      F98        1
2        B32        Y55      J55      B55      S57      A16      D33      N55      Y57      C23      R44        1
3        O90        K97      Y97      A97      T44      Y55      V12      X97      O58      Q97      A54        1
  • Related