Home > Blockchain >  Expand ICD10 codes from one row into multiple rows
Expand ICD10 codes from one row into multiple rows

Time:11-01

I have a dataset like this:

ICD_10 diagnosis
A00 Cholera
A01-A03 Other Intestinal infectious diseases
A15 Respiratory tuberculosis
A17-A19 Other tuberculosis

...

On row 2 and 4, there are multiple ICD-10 codes, and I want to expand them into multiple rows, like below:

ICD_10 diagnosis
A00 Cholera
A01 Other Intestinal infectious diseases
A02 Other Intestinal infectious diseases
A03 Other Intestinal infectious diseases
A15 Respiratory tuberculosis
A17 Other tuberculosis
A18 Other tuberculosis
A19 Other tuberculosis

How can I accomplish this in R using tidyverse?

Thanks for your help!

CodePudding user response:

fun <- function(vec) {
  ltr <- substring(vec, 1, 1)
  L <- lapply(strsplit(gsub("[^-0-9]", "", vec), "-"), as.integer)
  mapply(function(ltr, z) sprintf("%si", ltr, if (length(z) > 1) seq(z[1], z[2]) else z),
         ltr, L)
}
quux %>%
  mutate(ICD_10 = fun(ICD_10)) %>%
  tidyr::unnest(ICD_10)
# # A tibble: 8 x 2
#   ICD_10 diagnosis                           
#   <chr>  <chr>                               
# 1 A00    Cholera                             
# 2 A01    Other Intestinal infectious diseases
# 3 A02    Other Intestinal infectious diseases
# 4 A03    Other Intestinal infectious diseases
# 5 A15    Respiratory tuberculosis            
# 6 A17    Other tuberculosis                  
# 7 A18    Other tuberculosis                  
# 8 A19    Other tuberculosis                  

Data

quux <- structure(list(ICD_10 = c("A00", "A01-A03", "A15", "A17-A19"), diagnosis = c("Cholera", "Other Intestinal infectious diseases", "Respiratory tuberculosis", "Other tuberculosis")), class = "data.frame", row.names = c(NA, -4L))

CodePudding user response:

Using dedicated icd package:

#data
d <- structure(list(ICD_10 = c("A00", "A01-A03", "A15", "A17-A19"), diagnosis = c("Cholera", "Other Intestinal infectious diseases", "Respiratory tuberculosis", "Other tuberculosis")), class = "data.frame", row.names = c(NA, -4L))

#remotes::install_github("jackwasey/icd")
library(icd)

To avoid creating non-existent or missing out existing codes between the ranges we use expand_ranges. For example, below returns 33 codes, instead of 3 if we filled in sequentially A01, A02, A03, which is wrong.

expand_range("A01", "A03")
#  [1] "A01"   "A010"  "A0100" "A0101" "A0102" "A0103" "A0104" "A0105"
#  [9] "A0109" "A011"  "A012"  "A013"  "A014"  "A02"   "A020"  "A021" 
# [17] "A022"  "A0220" "A0221" "A0222" "A0223" "A0224" "A0225" "A0229"
# [25] "A028"  "A029"  "A03"   "A030"  "A031"  "A032"  "A033"  "A038" 
# [33] "A039"

We also use explain_code, to give description for newly created codes, example usage:

explain_code("A01")
# [1] "Typhoid and paratyphoid fevers"

Now, wrap two functions into one, to get a pretty output

# custom function using expand_range
f <- function(icd10, diagnosis){
  x <- unlist(strsplit(icd10, "-"))
  
  if(length(x) == 1){ ICD10 = x 
  } else {ICD10 = expand_range(x[1], x[2])}
  
  data.frame(
    icd10 = icd10,
    diagnosis = diagnosis, 
    icd10range = ICD10,
    desc = explain_code(ICD10))
}

And loop through the codes to expand, then rowbind:

# loop through rows, and rowbind
res <- do.call(rbind, 
               mapply(f, d$ICD_10, d$diagnosis,
                      SIMPLIFY = FALSE, USE.NAMES = FALSE))

head(res)
#     icd10                            diagnosis icd10range                                 desc
# 1     A00                              Cholera        A00                              Cholera
# 2 A01-A03 Other Intestinal infectious diseases        A01       Typhoid and paratyphoid fevers
# 3 A01-A03 Other Intestinal infectious diseases       A010                        Typhoid fever
# 4 A01-A03 Other Intestinal infectious diseases      A0100           Typhoid fever, unspecified
# 5 A01-A03 Other Intestinal infectious diseases      A0101                   Typhoid meningitis
# 6 A01-A03 Other Intestinal infectious diseases      A0102 Typhoid fever with heart involvement

As expected A01-A03 now expanded into 33 rows:

table(res$icd10)
# A00 A01-A03     A15 A17-A19 
#   1      33       1      53 

CodePudding user response:

One option:

tibble::tribble(
  ~ICD_10, ~diagnosis,
  "A00", "Cholera",
  "A01-A03", "Other Intestinal infectious diseases",
  "A15", "Respiratory tuberculosis",
  "A17-A19", "Other tuberculosis"
) |> 
  tidyr::separate_rows(ICD_10, sep = "-") |> 
  mutate(id = parse_number(ICD_10)) |> 
  group_by(diagnosis) |> 
  complete(id = min(id):max(id)) |> 
  mutate(ICD_10 = paste0("A", id))
  • Related