I have a dataset like this:
ICD_10 | diagnosis |
---|---|
A00 | Cholera |
A01-A03 | Other Intestinal infectious diseases |
A15 | Respiratory tuberculosis |
A17-A19 | Other tuberculosis |
...
On row 2 and 4, there are multiple ICD-10 codes, and I want to expand them into multiple rows, like below:
ICD_10 | diagnosis |
---|---|
A00 | Cholera |
A01 | Other Intestinal infectious diseases |
A02 | Other Intestinal infectious diseases |
A03 | Other Intestinal infectious diseases |
A15 | Respiratory tuberculosis |
A17 | Other tuberculosis |
A18 | Other tuberculosis |
A19 | Other tuberculosis |
How can I accomplish this in R using tidyverse?
Thanks for your help!
CodePudding user response:
fun <- function(vec) {
ltr <- substring(vec, 1, 1)
L <- lapply(strsplit(gsub("[^-0-9]", "", vec), "-"), as.integer)
mapply(function(ltr, z) sprintf("%si", ltr, if (length(z) > 1) seq(z[1], z[2]) else z),
ltr, L)
}
quux %>%
mutate(ICD_10 = fun(ICD_10)) %>%
tidyr::unnest(ICD_10)
# # A tibble: 8 x 2
# ICD_10 diagnosis
# <chr> <chr>
# 1 A00 Cholera
# 2 A01 Other Intestinal infectious diseases
# 3 A02 Other Intestinal infectious diseases
# 4 A03 Other Intestinal infectious diseases
# 5 A15 Respiratory tuberculosis
# 6 A17 Other tuberculosis
# 7 A18 Other tuberculosis
# 8 A19 Other tuberculosis
Data
quux <- structure(list(ICD_10 = c("A00", "A01-A03", "A15", "A17-A19"), diagnosis = c("Cholera", "Other Intestinal infectious diseases", "Respiratory tuberculosis", "Other tuberculosis")), class = "data.frame", row.names = c(NA, -4L))
CodePudding user response:
Using dedicated icd package:
#data
d <- structure(list(ICD_10 = c("A00", "A01-A03", "A15", "A17-A19"), diagnosis = c("Cholera", "Other Intestinal infectious diseases", "Respiratory tuberculosis", "Other tuberculosis")), class = "data.frame", row.names = c(NA, -4L))
#remotes::install_github("jackwasey/icd")
library(icd)
To avoid creating non-existent or missing out existing codes between the ranges we use expand_ranges. For example, below returns 33 codes, instead of 3 if we filled in sequentially A01, A02, A03, which is wrong.
expand_range("A01", "A03")
# [1] "A01" "A010" "A0100" "A0101" "A0102" "A0103" "A0104" "A0105"
# [9] "A0109" "A011" "A012" "A013" "A014" "A02" "A020" "A021"
# [17] "A022" "A0220" "A0221" "A0222" "A0223" "A0224" "A0225" "A0229"
# [25] "A028" "A029" "A03" "A030" "A031" "A032" "A033" "A038"
# [33] "A039"
We also use explain_code, to give description for newly created codes, example usage:
explain_code("A01")
# [1] "Typhoid and paratyphoid fevers"
Now, wrap two functions into one, to get a pretty output
# custom function using expand_range
f <- function(icd10, diagnosis){
x <- unlist(strsplit(icd10, "-"))
if(length(x) == 1){ ICD10 = x
} else {ICD10 = expand_range(x[1], x[2])}
data.frame(
icd10 = icd10,
diagnosis = diagnosis,
icd10range = ICD10,
desc = explain_code(ICD10))
}
And loop through the codes to expand, then rowbind:
# loop through rows, and rowbind
res <- do.call(rbind,
mapply(f, d$ICD_10, d$diagnosis,
SIMPLIFY = FALSE, USE.NAMES = FALSE))
head(res)
# icd10 diagnosis icd10range desc
# 1 A00 Cholera A00 Cholera
# 2 A01-A03 Other Intestinal infectious diseases A01 Typhoid and paratyphoid fevers
# 3 A01-A03 Other Intestinal infectious diseases A010 Typhoid fever
# 4 A01-A03 Other Intestinal infectious diseases A0100 Typhoid fever, unspecified
# 5 A01-A03 Other Intestinal infectious diseases A0101 Typhoid meningitis
# 6 A01-A03 Other Intestinal infectious diseases A0102 Typhoid fever with heart involvement
As expected A01-A03 now expanded into 33 rows:
table(res$icd10)
# A00 A01-A03 A15 A17-A19
# 1 33 1 53
CodePudding user response:
One option:
tibble::tribble(
~ICD_10, ~diagnosis,
"A00", "Cholera",
"A01-A03", "Other Intestinal infectious diseases",
"A15", "Respiratory tuberculosis",
"A17-A19", "Other tuberculosis"
) |>
tidyr::separate_rows(ICD_10, sep = "-") |>
mutate(id = parse_number(ICD_10)) |>
group_by(diagnosis) |>
complete(id = min(id):max(id)) |>
mutate(ICD_10 = paste0("A", id))