I am using R, and this is what my data looks like:
a <- data.frame(id=c(1,2,2,2,3),icd9=c("0781","00840","8660","7100","25011"))
I want to place a dot after the third digit of both the four-digit and five-digit codes in the second column. I am using gsub
in R but not getting the desired output. My desired data frame is:
id icd9
1 078.1
2 008.40
2 866.0
2 710.0
3 250.11
I am trying
gsub('([0-9])', '\\1\\2\\3.\\4', a$icd9)
But I am getting
[1] "0.7.8.1." "0.0.8.4.0." "8.6.6.0." "7.1.0.0." "2.5.0.1.1."
Thanks in advance, guys :)
CodePudding user response:
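library(dplyr)
# Keep the result as character: wrapping it in as.numeric() would drop the
# leading zeros (for example, "078.1" would become 78.1).
a %>%
  mutate(icd9_dot = paste0(substr(icd9, 1, 3), ".", substr(icd9, 4, nchar(icd9))))
  id  icd9 icd9_dot
1  1  0781    078.1
2  2 00840   008.40
3  2  8660    866.0
4  2  7100    710.0
5  3 25011   250.11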
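A base R alternative, as a minimal sketch. The pattern in the question has a single capture group, `([0-9])`, that matches one digit at a time, so `gsub()` rewrites every digit as itself followed by a dot, and `\\2` to `\\4` refer to groups that do not exist. Capturing the first three digits as one group and using `sub()` (which replaces only the first match) should give the desired result, assuming every code starts with at least three digits:

```r
a <- data.frame(id = c(1, 2, 2, 2, 3),
                icd9 = c("0781", "00840", "8660", "7100", "25011"))

# "^" anchors the match at the start of the string; the first three digits
# are captured as group 1 and written back followed by a dot.
a$icd9 <- sub("^([0-9]{3})", "\\1.", a$icd9)

a$icd9
#> [1] "078.1"  "008.40" "866.0"  "710.0"  "250.11"
```

Because the result stays a character vector, the leading zeros are preserved.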
CodePudding user response:
If your goal is to map ICD9 codes to phecodes, please include that info in your question. This approach may be useful to you:
library(tidyverse)
#remotes::install_github("PheWAS/PheWAS")
library(PheWAS)
#> Loading required package: parallel
#install.packages("fuzzyjoin")
library(fuzzyjoin)
a <- data.frame(id=c(1,2,2,2,3),icd9=c("0781","00840","8660","7100","25011"))
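# Insert a dot after the third character of x, then test whether y matches
# the result as a regular expression.
ci_str_detect <- function(x, y) {
  str_detect(y, pattern = sub('(?<=.{3})', '.', x, perl = TRUE))
}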
fuzzyjoin::fuzzy_left_join(a, phecode_map, by = c("icd9" = "code"), match_fun = ci_str_detect)
#>   id  icd9 vocabulary_id   code phecode
#> 1  1  0781        ICD9CM  078.1     078
#> 2  1  0781        ICD9CM 078.10     078
#> 3  1  0781        ICD9CM 078.11     078
#> 4  1  0781        ICD9CM 078.12     078
#> 5  1  0781        ICD9CM 078.19     078
#> 6  2 00840          <NA>   <NA>    <NA>
#> 7  2  8660        ICD9CM E866.0     984
#> 8  2  7100        ICD9CM  710.0  695.42
#> 9  3 25011        ICD9CM 250.11  250.11
Created on 2021-09-21 by the reprex package (v2.0.1)
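Note that `0781` matches five codes and `8660` matches `E866.0` in the output above. That is because `str_detect()` treats the dotted code as an unanchored regular expression: the `.` matches any character, and a match anywhere inside the string counts. The sketch below illustrates the difference with base R `grepl()` and a small hypothetical vector of codes (made up for illustration, not taken from `phecode_map`):

```r
# Hypothetical lookup values, chosen only to illustrate the matching behaviour
codes <- c("078.1", "078.10", "E866.0", "866.0", "710.0")

# Same dot-insertion step as in ci_str_detect()
add_dot <- function(x) sub("(?<=.{3})", ".", x, perl = TRUE)

dotted <- add_dot("8660")  # "866.0"

# Unanchored regex match: substrings count, so "E866.0" is matched too
loose <- codes[grepl(dotted, codes)]

# Exact string comparison: only the identical code is kept
exact <- codes[codes == dotted]
```

If only exact matches are wanted, comparing the dotted code for equality (or using an ordinary join on the dotted column) may be a better fit than a regex-based fuzzy join.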
Edit
"008.40" doesn't appear to be a valid ICD9 code. "008.41" is valid, though, so if you use that instead you don't get the NA values in row 6 of the output.