Placing dots in the numbers of a dataframe

I am using R, and this is what my data looks like:

 a <- data.frame(id=c(1,2,2,2,3),icd9=c("0781","00840","8660","7100","25011"))

I want to place a dot after the third digit of both the four-digit and five-digit codes in the second column. I am using gsub in R but not getting the desired output. My desired data frame is:

id   icd9
1    078.1
2    008.40
2    866.0
2    710.0
3    250.11

I am trying

gsub('([0-9])', '\\1\\2\\3.\\4', a$icd9)

But I am getting

[1] "0.7.8.1."   "0.0.8.4.0." "8.6.6.0."   "7.1.0.0."   "2.5.0.1.1."

Thanks in advance, guys :)

CodePudding user response：

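library(dplyr)
# Split each code after the third character, rejoin the two parts with a dot,
# and convert the result to a number
a %>% 
  mutate(num = as.numeric(paste0(substr(icd9,1,3),".",substr(icd9,4,nchar(icd9)))))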

  id  icd9    num
1  1  0781  78.10
2  2 00840   8.40
3  2  8660 866.00
4  2  7100 710.00
5  3 25011 250.11
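
Note that as.numeric() turns the codes into numbers, so the leading zeros are lost in the output above (078.1 is shown as 78.10). If the codes should stay as text, a base R sketch along the following lines may be closer to the desired output. The original gsub attempt failed because '([0-9])' matches one digit at a time and defines only group 1, so \\2, \\3 and \\4 are empty and every digit is replaced by itself plus a dot. The pattern below is one possible fix, assuming every code consists only of digits and has at least four of them.

```r
a <- data.frame(id = c(1, 2, 2, 2, 3),
                icd9 = c("0781", "00840", "8660", "7100", "25011"))

# Group 1 captures the first three digits, group 2 the remaining digits.
# sub() replaces the single whole-string match with "group1.group2".
a$icd9 <- sub("^([0-9]{3})([0-9]+)$", "\\1.\\2", a$icd9)

a$icd9
```

Because the result is a character vector, codes such as "078.1" and "008.40" keep their leading zeros and trailing digits.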

CodePudding user response：

If your goal is to map ICD9 codes to phecodes, please include that info in your question. This approach may be useful to you:

library(tidyverse)
#remotes::install_github("PheWAS/PheWAS")
library(PheWAS)
#> Loading required package: parallel
#install.packages("fuzzyjoin")
library(fuzzyjoin)

a <- data.frame(id=c(1,2,2,2,3),icd9=c("0781","00840","8660","7100","25011"))

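# Insert a dot after the third character of x, then test whether y
# contains the resulting pattern
ci_str_detect <- function(x, y) {
  str_detect(y, pattern = sub('(?<=.{3})', '.', x, perl = TRUE))
}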

fuzzyjoin::fuzzy_left_join(a, phecode_map, by = c("icd9" = "code"), match_fun = ci_str_detect)
#>   id  icd9 vocabulary_id   code phecode
#> 1  1  0781        ICD9CM  078.1     078
#> 2  1  0781        ICD9CM 078.10     078
#> 3  1  0781        ICD9CM 078.11     078
#> 4  1  0781        ICD9CM 078.12     078
#> 5  1  0781        ICD9CM 078.19     078
#> 6  2 00840          <NA>   <NA>    <NA>
#> 7  2  8660        ICD9CM E866.0     984
#> 8  2  7100        ICD9CM  710.0  695.42
#> 9  3 25011        ICD9CM 250.11  250.11

Created on 2021-09-21 by the reprex package (v2.0.1)

Edit

"008.40" doesn't appear to be a valid ICD9 code. "008.41" is valid though, so if you use that instead you don't get the "NA" values in row 6 of the output.
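
To see what the lookbehind in ci_str_detect does on its own, and why one input code can match several rows (for example "8660" matching "E866.0"), here is a small base R sketch. It uses grepl() in place of str_detect() so that it runs without extra packages, and exact_dot_match is a hypothetical helper for illustration, not part of the answer above.

```r
codes <- c("0781", "00840", "8660", "7100", "25011")

# Zero-width lookbehind: sub() inserts "." at the first position
# that is preceded by three characters
dotted <- sub("(?<=.{3})", ".", codes, perl = TRUE)
dotted

# Used as an unanchored regex, "866.0" is also found inside "E866.0",
# and "078.1" is found inside "078.10"
grepl(dotted[3], "E866.0")
grepl(dotted[1], "078.10")

# Hypothetical stricter matcher: compare the dotted code to y
# as a literal, whole string instead of as a regex
exact_dot_match <- function(x, y) {
  sub("(?<=.{3})", ".", x, perl = TRUE) == y
}
exact_dot_match("8660", c("866.0", "E866.0"))
```

Whether the looser substring matching is wanted depends on the goal. With exact equality the fuzzy join behaves like an ordinary left join on the dotted code, so child codes such as "078.10" would no longer be returned for "0781".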