Mutate if any column contains a list of values

I'm working with ICD codes and need help adding a column "neuro" that flags neurology-related ICD conditions. Here's an example of the dataset I'm working with:

  ID  `ICD9 1`  `ICD9 2`  `ICD9 3` `ICD9 4` `ICD9 5` `ICD9 6` `ICD9 7` `ICD9 8` `ICD9 9`
  <chr>    <chr>    <chr>    <chr>    <chr>    <chr>    <chr>    <chr>    <chr>    <chr>   
1 20002038 927      NA       NA       NA       NA       NA       NA       NA       NA      
2 20003011 460      NA       NA       NA       NA       NA       NA       NA       NA      
3 20003019 320      V22      473      V22      V22      724      NA       NA       NA      
4 20003026 719      490      729      724      NA       NA       NA       NA       NA      
5 20004018 724      401      436      287      780      NA       NA       NA       NA      
6 20007016 523      339      NA       NA       NA       NA       NA       NA       NA

How would I: (a) check if any ICD9 columns contain the following ICD codes of interest:

ICD = c(320:337, 339:359, 430:438)

(b) then append an additional column "neuro" based on the rows containing the ICD code of interest.

I've tried the following solutions, which produce many errors. The first method is the most promising, but it returns "0" for some reason:

for(i in 2:ncol(df)){
  x = c(320:337, 339:359 and 430:438)
  test <- test %>% 
    mutate(neuro = ifelse(i %in% x, 1, 0) )
}

I also tried this, with much less success:

x = c(320:337, 339:359 and 430:438)
df <- df %>% 
  mutate(neuro = ifelse(apply(df == x, 1, any), 1, 0))

I'm probably making many, many mistakes and it's been frustrating trying to figure this out for several hours. Would appreciate your help - thanks!

CodePudding user response：

We can use if_any here:

library(dplyr)
ICD <- c(320:337, 339:359, 430:438)
df <- df %>%
     # if_any() returns TRUE/FALSE per row; the unary + converts that to 1/0
     mutate(neuro = +(if_any(starts_with("ICD"), ~ .x %in% ICD)))

-output

df
        ID ICD 1 ICD 2 ICD 3 ICD 4 ICD 5 ICD 6 ICD 7 ICD 8 ICD 9 neuro
1 20002038   927  <NA>    NA  <NA>  <NA>    NA    NA    NA    NA     0
2 20003011   460  <NA>    NA  <NA>  <NA>    NA    NA    NA    NA     0
3 20003019   320   V22   473   V22   V22   724    NA    NA    NA     1
4 20003026   719   490   729   724  <NA>    NA    NA    NA    NA     0
5 20004018   724   401   436   287   780    NA    NA    NA    NA     1
6 20007016   523   339    NA  <NA>  <NA>    NA    NA    NA    NA     1

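When the vector on the right-hand side has more than one value, == won't work because it compares elementwise (recycling the shorter vector). %in% tests membership instead, but it needs a vector as input, so it has to be applied to each column in turn. Neither df == x nor df %in% x will work on the whole data frame.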
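To make the difference concrete, here is a small base R sketch on made-up values (the vector `codes` is hypothetical and not taken from the question's data). It only illustrates how the two operators behave on a single column.

```r
ICD <- c(320:337, 339:359, 430:438)

# hypothetical column of codes stored as character, including a missing value
codes <- c("927", "320", "V22", NA, "436")

# `==` compares position by position and recycles the shorter vector,
# so a code only "matches" if it happens to line up with the same value in ICD
eq_result <- suppressWarnings(codes == ICD)
length(eq_result)            # 48, the length of ICD rather than of codes
any(eq_result, na.rm = TRUE) # FALSE, although "320" and "436" are both in ICD

# `%in%` asks, for each code, whether it occurs anywhere in ICD;
# the result has one element per code, and NA counts as "not found"
in_result <- codes %in% ICD
in_result                    # FALSE  TRUE FALSE FALSE  TRUE
```

The integer codes in `ICD` are compared with the character values after coercion to character, which is why "320" is found even though the column is not numeric.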

data

df <- structure(list(ID = c(20002038L, 20003011L, 20003019L, 20003026L, 
20004018L, 20007016L), `ICD 1` = c(927L, 460L, 320L, 719L, 724L, 
523L), `ICD 2` = c(NA, NA, "V22", "490", "401", "339"), `ICD 3` = c(NA, 
NA, 473L, 729L, 436L, NA), `ICD 4` = c(NA, NA, "V22", "724", 
"287", NA), `ICD 5` = c(NA, NA, "V22", NA, "780", NA), `ICD 6` = c(NA, 
NA, 724L, NA, NA, NA), `ICD 7` = c(NA, NA, NA, NA, NA, NA), `ICD 8` = c(NA, 
NA, NA, NA, NA, NA), `ICD 9` = c(NA, NA, NA, NA, NA, NA)), 
class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))

CodePudding user response：

Here is an alternative approach, although the if_any solution provided by akrun is the better one!

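library(tidyverse)
ICD <- c(320:337, 339:359, 430:438)
df %>% 
  # flag every ICD column: 1 if the code is in ICD, otherwise 0
  mutate(across(-ID, ~ ifelse(.x %in% ICD, 1, 0), .names = "new_{col}")) %>% 
  # paste the flags of each row into one string
  unite(New_Col, starts_with("new"), na.rm = TRUE, sep = " ") %>% 
  # extract a "1" if there is one (NA otherwise) and drop the helper column
  mutate(neuro = str_extract(New_Col, "1"), .keep = "unused") %>%
  # neuro is character here; recent tidyr versions reject a numeric replacement
  mutate(neuro = as.integer(replace_na(neuro, "0")))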

        ID ICD 1 ICD 2 ICD 3 ICD 4 ICD 5 ICD 6 ICD 7 ICD 8 ICD 9 neuro
1 20002038   927  <NA>    NA  <NA>  <NA>    NA    NA    NA    NA     0
2 20003011   460  <NA>    NA  <NA>  <NA>    NA    NA    NA    NA     0
3 20003019   320   V22   473   V22   V22   724    NA    NA    NA     1
4 20003026   719   490   729   724  <NA>    NA    NA    NA    NA     0
5 20004018   724   401   436   287   780    NA    NA    NA    NA     1
6 20007016   523   339    NA  <NA>  <NA>    NA    NA    NA    NA     1

CodePudding user response：

Perhaps we can try

df$neuro <- +(rowSums(matrix(as.matrix(df[-1]) %in% ICD, nrow = nrow(df))) > 0)

such that

> df
        ID ICD 1 ICD 2 ICD 3 ICD 4 ICD 5 ICD 6 ICD 7 ICD 8 ICD 9 neuro
1 20002038   927  <NA>    NA  <NA>  <NA>    NA    NA    NA    NA     0
2 20003011   460  <NA>    NA  <NA>  <NA>    NA    NA    NA    NA     0
3 20003019   320   V22   473   V22   V22   724    NA    NA    NA     1
4 20003026   719   490   729   724  <NA>    NA    NA    NA    NA     0
5 20004018   724   401   436   287   780    NA    NA    NA    NA     1
6 20007016   523   339    NA  <NA>  <NA>    NA    NA    NA    NA     1
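
The matrix() call in the base R one-liner above may look redundant, so here is a minimal sketch of why it is there. It assumes a small made-up data frame (`toy`, with hypothetical columns `code_1` and `code_2`) rather than the question's data: %in% returns a plain vector without dimensions, so the row and column layout has to be rebuilt before rowSums() can count matches per row.

```r
ICD <- c(320:337, 339:359, 430:438)

# small hypothetical data frame in the same shape as the question's data
toy <- data.frame(ID = 1:3,
                  code_1 = c("927", "320", "719"),
                  code_2 = c(NA, "V22", "436"),
                  stringsAsFactors = FALSE)

m <- as.matrix(toy[-1])   # character matrix, 3 rows x 2 columns
flat <- m %in% ICD        # plain logical vector: the dim attribute is lost
is.null(dim(flat))        # TRUE

# rebuild the layout (values are stored column by column, as in m),
# count matches per row, and use unary + to turn TRUE/FALSE into 1/0
hits <- matrix(flat, nrow = nrow(toy))
toy$neuro <- +(rowSums(hits) > 0)
toy$neuro                 # 0 1 1
```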