I'm working with ICD codes and need your help trying to mutate an additional column "neuro" based on neurology-related ICD conditions. Here's an example dataset I'm working with:
ID `ICD9 1` `ICD9 2` `ICD9 3` `ICD9 4` `ICD9 5` `ICD9 6` `ICD9 7` `ICD9 8` `ICD9 9`
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 20002038 927 NA NA NA NA NA NA NA NA
2 20003011 460 NA NA NA NA NA NA NA NA
3 20003019 320 V22 473 V22 V22 724 NA NA NA
4 20003026 719 490 729 724 NA NA NA NA NA
5 20004018 724 401 436 287 780 NA NA NA NA
6 20007016 523 339 NA NA NA NA NA NA NA
How would I: (a) check if any ICD9 columns contain the following ICD codes of interest:
ICD = c(320:337, 339:359 and 430:438)
(b) then append an additional column "neuro" based on the rows containing the ICD code of interest.
I've tried the following solutions, with many errors. The first method is the most promising, but it is returning "0" for some reason:
for(i in 2:ncol(df)){
x = c(320:337, 339:359 and 430:438)
test <- test %>%
mutate(neuro = ifelse(i %in% x, 1, 0) )
}
I also tried this to much less success:
x = c(320:337, 339:359 and 430:438)
df <- df %>%
mutate(neuro = ifelse(apply(df == x, 1, any), 1, 0))
I'm probably making many, many mistakes and it's been frustrating trying to figure this out for several hours. Would appreciate your help - thanks!
CodePudding user response:
We can use if_any here:
library(dplyr)
ICD <- c(320:337, 339:359, 430:438)
df <- df %>%
mutate(neuro = +(if_any(starts_with("ICD"), ~ .x %in% ICD)))  # unary + turns TRUE/FALSE into 1/0
-output
df
ID ICD 1 ICD 2 ICD 3 ICD 4 ICD 5 ICD 6 ICD 7 ICD 8 ICD 9 neuro
1 20002038 927 <NA> NA <NA> <NA> NA NA NA NA 0
2 20003011 460 <NA> NA <NA> <NA> NA NA NA NA 0
3 20003019 320 V22 473 V22 V22 724 NA NA NA 1
4 20003026 719 490 729 724 <NA> NA NA NA NA 0
5 20004018 724 401 436 287 780 NA NA NA NA 1
6 20007016 523 339 NA <NA> <NA> NA NA NA NA 1
When the vector of codes has length greater than 1, == will not work because it compares elementwise, so we need %in% instead. The check also has to loop across the columns, because %in% needs a vector as input (df == x or df %in% x will not work).
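To make the difference concrete, here is a small sketch on a made-up vector (codes is a hypothetical stand-in for one row of the ICD columns, not part of the original data). It also hints at why the loop in the question always gives 0: there, i is the column index, not the codes stored in that column.

```r
# Codes of interest, as defined in the answer above
ICD <- c(320:337, 339:359, 430:438)

# Hypothetical row of codes; character because of entries like "V22"
codes <- c("320", "V22", "473", NA)

# %in% asks, for each element of codes, whether it occurs anywhere in ICD.
# It never returns NA: a missing code simply counts as "not found".
codes %in% ICD
# [1]  TRUE FALSE FALSE FALSE

# == pairs elements up by position and recycles the shorter vector,
# so the result has one entry per element of ICD, not one per code.
length(codes == ICD)
# [1] 48

# In the question's loop, i runs over column positions (2, 3, ...),
# so the test is really this, which is always FALSE:
2 %in% ICD
# [1] FALSE
```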
data
df <- structure(list(ID = c(20002038L, 20003011L, 20003019L, 20003026L,
20004018L, 20007016L), `ICD 1` = c(927L, 460L, 320L, 719L, 724L,
523L), `ICD 2` = c(NA, NA, "V22", "490", "401", "339"), `ICD 3` = c(NA,
NA, 473L, 729L, 436L, NA), `ICD 4` = c(NA, NA, "V22", "724",
"287", NA), `ICD 5` = c(NA, NA, "V22", NA, "780", NA), `ICD 6` = c(NA,
NA, 724L, NA, NA, NA), `ICD 7` = c(NA, NA, NA, NA, NA, NA), `ICD 8` = c(NA,
NA, NA, NA, NA, NA), `ICD 9` = c(NA, NA, NA, NA, NA, NA)),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))
CodePudding user response:
Here is an alternative approach, although the if_any solution provided by akrun is the better one!
library(tidyverse)
df %>%
mutate(across(-ID, ~ifelse(. %in% ICD, 1,0), .names = 'new_{col}')) %>%
unite(New_Col, starts_with('new'), na.rm = TRUE, sep = ' ') %>%
mutate(neuro = str_extract(New_Col, "1"), .keep="unused") %>%
mutate(neuro = replace_na(neuro, "0"))  # neuro is character after str_extract, so the replacement must be "0"
ID ICD 1 ICD 2 ICD 3 ICD 4 ICD 5 ICD 6 ICD 7 ICD 8 ICD 9 neuro
1 20002038 927 <NA> NA <NA> <NA> NA NA NA NA 0
2 20003011 460 <NA> NA <NA> <NA> NA NA NA NA 0
3 20003019 320 V22 473 V22 V22 724 NA NA NA 1
4 20003026 719 490 729 724 <NA> NA NA NA NA 0
5 20004018 724 401 436 287 780 NA NA NA NA 1
6 20007016 523 339 NA <NA> <NA> NA NA NA NA 1
CodePudding user response:
Perhaps we can try
df$neuro <- +(rowSums(matrix(as.matrix(df[-1]) %in% ICD, nrow = nrow(df))) > 0)
such that
> df
ID ICD 1 ICD 2 ICD 3 ICD 4 ICD 5 ICD 6 ICD 7 ICD 8 ICD 9 neuro
1 20002038 927 <NA> NA <NA> <NA> NA NA NA NA 0
2 20003011 460 <NA> NA <NA> <NA> NA NA NA NA 0
3 20003019 320 V22 473 V22 V22 724 NA NA NA 1
4 20003026 719 490 729 724 <NA> NA NA NA NA 0
5 20004018 724 401 436 287 780 NA NA NA NA 1
6 20007016 523 339 NA <NA> <NA> NA NA NA NA 1
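The one-liner above packs several steps together. As a rough sketch, it can be unpacked like this on a small stand-in data frame (toy and its values are made up for illustration; only the layout, ID first and code columns after, mirrors df):

```r
ICD <- c(320:337, 339:359, 430:438)

# Hypothetical stand-in for df: ID first, then character code columns
toy <- data.frame(ID = 1:3,
                  `ICD 1` = c("927", "320", "523"),
                  `ICD 2` = c(NA, "V22", "339"),
                  check.names = FALSE)

m    <- as.matrix(toy[-1])               # character matrix of codes, ID dropped
hits <- m %in% ICD                       # %in% returns a plain vector, dims are lost
hits <- matrix(hits, nrow = nrow(toy))   # restore the row-by-column shape
toy$neuro <- +(rowSums(hits) > 0)        # 1 if any code in the row matched, else 0

toy$neuro
# [1] 0 1 1
```

Reshaping with matrix(..., nrow = nrow(toy)) works because both as.matrix and matrix use column-major order, so each logical value lands back in the cell it came from.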