How to filter a range of alphanumeric codes?

Time:01-30

I need to create dummy variables using ICD-10 codes. For example, chapter 2 starts at C00 and ends at D48X. The data looks like this:

data <- data.frame(LINHAA1 = c("B342", "C000", "D450", "0985"),
                   LINHAA2 = c("U071", "C99", "D68X", "J061"),
                   LINHAA3 = c("D48X", "Y098", "X223", "D640"))

Then I need to create a column that is 1 if any of the codes in the row falls within the range C00-D48X and 0 otherwise. The result I want:

LINHAA1   LINHAA2   LINHAA3  CHAPTER2
B342      U071      D48X         1
C000      C99       Y098         1
D450      D68X      X223         1
0985      J061      D640         0

It needs to check all three columns, LINHAA1 to LINHAA3. Thanks in advance!

Answer:

This should do it:

as.numeric(apply(apply(data, 1, 
    function(x) { x >="C00" & x <= "D48X" }), 2, any))
[1] 1 1 1 0

A little explanation: whether a code lies in the range can be checked with plain alphabetical (lexicographic) order, which is what <= and >= do for strings. The inner apply() runs over the rows and produces a logical matrix with one column per row of data. The outer apply() uses any() to check whether at least one of the three values in each column is TRUE. Finally, as.numeric() turns TRUE/FALSE into 1/0.
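The same lexicographic comparison can also be written without the nested apply() calls. The sketch below is a base-R alternative, not the answerer's code: comparison operators work elementwise on a character matrix, and rowSums() counts how many fields per row fall in the range. It assumes the code columns are character (the default for data.frame() since R 4.0). Like the answer, it relies on string collation, which follows the current locale; for codes made only of uppercase letters and digits, common locales typically sort digits before letters, as the C locale does.

```r
data <- data.frame(LINHAA1 = c("B342", "C000", "D450", "0985"),
                   LINHAA2 = c("U071", "C99", "D68X", "J061"),
                   LINHAA3 = c("D48X", "Y098", "X223", "D640"))

# Character matrix with one row per record and one column per LINHAA field
codes <- as.matrix(data[, c("LINHAA1", "LINHAA2", "LINHAA3")])

# Elementwise lexicographic test, giving a logical matrix of the same shape
in_range <- codes >= "C00" & codes <= "D48X"

# 1 if at least one field in the row falls in the range, else 0
data$CHAPTER2 <- as.integer(rowSums(in_range) > 0)
data$CHAPTER2
# [1] 1 1 1 0
```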

Answer:

This is a typical use case for dplyr::if_any(). For each row, if_any() returns TRUE if the condition holds in any of the selected columns; wrapping it in as.integer() gives 1/0 instead of TRUE/FALSE:

library(dplyr)

data %>%
    mutate(CHAPTER2 = as.integer(if_any(starts_with("LINHAA"),
                                        ~ .x >= 'C00' & .x <= 'D48X')))

  LINHAA1 LINHAA2 LINHAA3 CHAPTER2
1    B342    U071    D48X        1
2    C000     C99    Y098        1
3    D450    D68X    X223        1
4    0985    J061    D640        0
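Since the question is about creating dummy variables for ICD-10 chapters in general, the same row-wise "any column in range" idea extends to several chapters at once. This is an illustrative sketch, not part of the answer: the helper flag_chapter() and the chapters list are hypothetical names, and the bounds are the WHO ICD-10 ranges for chapters I to III (A00-B99, C00-D48, D50-D89). It compares only the first three characters of each code (the ICD-10 category), so, for example, D48X is treated as category D48 and the fourth character does not affect the result. It is written in base R so it runs without dplyr.

```r
data <- data.frame(LINHAA1 = c("B342", "C000", "D450", "0985"),
                   LINHAA2 = c("U071", "C99", "D68X", "J061"),
                   LINHAA3 = c("D48X", "Y098", "X223", "D640"))

# Hypothetical lookup table: WHO ICD-10 chapter bounds, given as
# 3-character categories
chapters <- list(
  CHAPTER1 = c(start = "A00", end = "B99"),
  CHAPTER2 = c(start = "C00", end = "D48"),
  CHAPTER3 = c(start = "D50", end = "D89")
)

# Hypothetical helper: returns 1 for rows where at least one of `cols`
# holds a code whose 3-character category lies in [start, end], else 0
flag_chapter <- function(df, cols, start, end) {
  codes <- as.matrix(df[, cols, drop = FALSE])
  category <- matrix(substr(codes, 1, 3), nrow = nrow(codes))
  in_range <- category >= start & category <= end
  as.integer(rowSums(in_range) > 0)
}

# Add one dummy column per chapter
cols <- c("LINHAA1", "LINHAA2", "LINHAA3")
for (nm in names(chapters)) {
  bounds <- chapters[[nm]]
  data[[nm]] <- flag_chapter(data, cols, bounds[["start"]], bounds[["end"]])
}
data
#   LINHAA1 LINHAA2 LINHAA3 CHAPTER1 CHAPTER2 CHAPTER3
# 1    B342    U071    D48X        1        1        0
# 2    C000     C99    Y098        0        1        0
# 3    D450    D68X    X223        0        1        1
# 4    0985    J061    D640        0        0        1
```

The CHAPTER2 column reproduces the 1 1 1 0 result asked for in the question; adding a chapter only means adding an entry to the lookup list.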

Answer:

Using the dedicated icd package:

# remotes::install_github("jackwasey/icd")
library(icd)

#get the 2nd chapter start and end codes
ch2 <- icd::icd10_chapters[[ 2 ]]
# start   end 
# "C00" "D49" 

#expand the range to all chapter 2 codes
ch2codes <- expand_range(ch2[ "start" ], ch2[ "end" ])
# length(ch2codes)
# 2094

#check whether any code in each row is a chapter 2 code
ix <- apply(data, 1, function(i) any(i %in% ch2codes))
# [1] FALSE  TRUE FALSE FALSE

data$chapter2 <- as.integer(ix)
#data
#   LINHAA1 LINHAA2 LINHAA3 chapter2
# 1    B342    U071    D48X        0
# 2    C000     C99    Y098        1
# 3    D450    D68X    X223        0
# 4    0985    J061    D640        0

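Note that this approach matches codes exactly, so rows 1 and 3 get 0 because none of their codes appears in ch2codes, whereas the string comparisons in the other answers flag them. Some of your codes are invalid, for example: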

#invalid
is_defined("D48X")
# [1] FALSE
explain_code("D48X")
# character(0)

#Valid
is_defined("D48")
# [1] TRUE
explain_code("D48")
# [1] "Neoplasm of uncertain behavior of other and unspecified sites"
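The icd checks above test whether a code actually exists. If installing the icd package from GitHub is not an option, a weaker, package-free sanity check is to test whether each value at least has the shape of an ICD-10 code. The regular expression below is an assumption chosen to fit the sample data (a letter, two digits, and an optional fourth letter or digit, without a decimal point). It only checks the format: a well-formed but undefined code such as D48X still passes, while 0985 (which starts with a zero rather than the letter O) is caught.

```r
data <- data.frame(LINHAA1 = c("B342", "C000", "D450", "0985"),
                   LINHAA2 = c("U071", "C99", "D68X", "J061"),
                   LINHAA3 = c("D48X", "Y098", "X223", "D640"))

# Assumed ICD-10 shape (format only, not existence): a letter, two digits,
# and an optional fourth letter or digit
icd_shape <- "^[A-Z][0-9]{2}[0-9A-Z]?$"

codes <- as.matrix(data)
well_formed <- grepl(icd_shape, codes)  # logical vector in column-major order

# Values that do not match the assumed shape
codes[!well_formed]
# [1] "0985"
```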