I need to create dummy variables using ICD-10 codes. For example, chapter 2 starts with C00 and ends with D48X. The data looks like this:
data <- data.frame(LINHAA1 = c("B342", "C000", "D450", "0985"),
LINHAA2 = c("U071", "C99", "D68X", "J061"),
LINHAA3 = c("D48X", "Y098", "X223", "D640"))
Then I need to create a column that receives 1 if any code in the row falls within the C00-D48X range and 0 otherwise. The result I desire:
LINHAA1 LINHAA2 LINHAA3 CHAPTER2
B342 U071 D48X 1
C000 C99 Y098 1
D450 D68X X223 1
0985 J061 D640 0
It needs to go through LINHAA1 to LINHAA3. Thanks in advance!
CodePudding user response:
This should do it:
as.numeric(apply(apply(data, 1,
function(x) { x >="C00" & x <= "D48X" }), 2, any))
[1] 1 1 1 0
A little explanation: whether a code falls in the range can be checked using alphabetical order, which is what the comparison operators (>=, <=) use for strings. The inner apply checks each element and produces a matrix of logical values, with one column per row of data. The outer apply uses any to check whether at least one of the three logical values in each column is TRUE. as.numeric converts the result from TRUE/FALSE to 1/0.
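To make the two steps easier to see, here is a minimal base R sketch of the same idea that compares the whole character matrix at once instead of nesting apply calls. The helper name in_code_range is made up for illustration, and it assumes every column holds the codes as character strings. String comparison in R follows the collation order of the current locale, so results could differ in an unusual locale.

```r
# Example data from the question
data <- data.frame(LINHAA1 = c("B342", "C000", "D450", "0985"),
                   LINHAA2 = c("U071", "C99", "D68X", "J061"),
                   LINHAA3 = c("D48X", "Y098", "X223", "D640"))

# Hypothetical helper: 1 if any code in the row lies in [lo, hi], else 0
in_code_range <- function(df, lo, hi) {
  m <- as.matrix(df)              # character matrix, one row per record
  hits <- m >= lo & m <= hi       # logical matrix with the same shape
  as.integer(rowSums(hits) > 0)   # at least one TRUE in the row -> 1
}

cols <- c("LINHAA1", "LINHAA2", "LINHAA3")
data$CHAPTER2 <- in_code_range(data[cols], "C00", "D48X")
data$CHAPTER2
# [1] 1 1 1 0
```

Passing the bounds as arguments means the same helper could be reused for other chapters by changing lo and hi.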
CodePudding user response:
This is the typical case for dplyr::if_any. if_any returns TRUE if a given condition is met in any of the tested columns, rowwise:
library(dplyr)
data %>%
  # if_any() returns TRUE/FALSE, so convert to 1/0
  mutate(CHAPTER2 = as.integer(if_any(starts_with("LINHAA"),
                                      ~ .x >= "C00" & .x <= "D48X")))
LINHAA1 LINHAA2 LINHAA3 CHAPTER2
1 B342 U071 D48X 1
2 C000 C99 Y098 1
3 D450 D68X X223 1
4 0985 J061 D640 0
CodePudding user response:
Using the dedicated icd package:
# remotes::install_github("jackwasey/icd")
library(icd)
#get the 2nd chapter start and end codes
ch2 <- icd::icd10_chapters[[ 2 ]]
# start end
# "C00" "D49"
#expand the range to include all chapter 2 codes
ch2codes <- expand_range(ch2[ "start" ], ch2[ "end" ])
# length(ch2codes)
# 2094
#check if codes in a row match
ix <- apply(data, 1, function(i) any(i %in% ch2codes))
# [1] FALSE TRUE FALSE FALSE
data$chapter2 <- as.integer(ix)
#data
# LINHAA1 LINHAA2 LINHAA3 chapter2
# 1 B342 U071 D48X 0
# 2 C000 C99 Y098 1
# 3 D450 D68X X223 0
# 4 0985 J061 D640 0
Note that you have some invalid codes:
#invalid
is_defined("D48X")
# [1] FALSE
explain_code("D48X")
# character(0)
#Valid
is_defined("D48")
# [1] TRUE
explain_code("D48")
# [1] "Neoplasm of uncertain behavior of other and unspecified sites"
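If the trailing X in codes such as D48X is only a filler for a missing fourth character (an assumption about this data, not something the question confirms), one possible workaround is to compare on the three-character category alone. The sketch below uses base R only, and the helper name code_category is hypothetical.

```r
# Hypothetical helper: keep only the three-character category, e.g. "D48X" -> "D48"
code_category <- function(x) substr(x, 1, 3)

codes <- c("D48X", "C000", "D450", "D640")
cats <- code_category(codes)

# Chapter 2 bounds taken from the question, reduced to categories: C00 to D48
in_ch2 <- cats >= "C00" & cats <= "D48"
in_ch2
# [1]  TRUE  TRUE  TRUE FALSE
```

Comparing on the category avoids having to decide whether a fourth character is valid, at the cost of no longer checking that the full code is defined.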