Extract numbers from a character vector and add leading zeros

Time:01-24

I have a character vector with the following structure:

GDM3
PER.1.1.1_1
PER.1.10.2_1
PER.1.1.32_1
PER.1.1.4_1
PER.1.1.5_1
PER.11.29.1_1
PER.1.2.2_1
PER.31.2.3_1
PER.1.2.44_1
PER.5.2.25_1

I want to extract the three numbers in the middle of each ID and add a leading zero to any that are single digits. The final vector can be a character vector again. In the end the result should look like this:

GDM3
010101
011002
010132
010104
010105
112901
010202
310203
010244
050225

CodePudding user response:

tmp <- strcapture("\\.([0-9]+)\\.([0-9]+)\\.([0-9]+)_", X$GDM3, 
                  proto = list(a=0L, b=0L, c=0L)) |>
  lapply(sprintf, fmt = "%02i")
do.call(paste0, tmp)
#  [1] "010101" "011002" "010132" "010104" "010105" "112901" "010202" "310203" "010244" "050225"

Explanation:

  • strcapture extracts the known patterns into a data.frame, with names and classes defined in proto (the actual values in proto are not used);
  • lapply(sprintf, fmt="%02i") zero-pads every column of the frame to 2 digits, returning a list of character vectors;
  • do.call(paste0, tmp) concatenates the padded columns element-wise, giving one string per row of the original frame.
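
The three steps above can be inspected one at a time. This is only a minimal sketch on a two-element input; the names ids, parts, padded and res are made up for illustration and are not part of the answer.

```r
# Hypothetical two-element input with the same layout as X$GDM3
ids <- c("PER.1.10.2_1", "PER.11.29.1_1")

# Step 1: one integer column per capture group (columns a, b, c)
parts <- strcapture("\\.([0-9]+)\\.([0-9]+)\\.([0-9]+)_", ids,
                    proto = list(a = 0L, b = 0L, c = 0L))
str(parts)

# Step 2: zero-pad every column to width 2; the result is a list
# of character vectors, one per column
padded <- lapply(parts, sprintf, fmt = "%02i")

# Step 3: paste the padded columns together element-wise
res <- do.call(paste0, padded)
res
# [1] "011002" "112901"
```

An ID that does not match the pattern comes back as a row of NA from strcapture, so a check such as anyNA(parts) before padding may be worthwhile on real data.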

Data

X <- structure(list(GDM3 = c("PER.1.1.1_1", "PER.1.10.2_1", "PER.1.1.32_1", "PER.1.1.4_1", "PER.1.1.5_1", "PER.11.29.1_1", "PER.1.2.2_1", "PER.31.2.3_1", "PER.1.2.44_1", "PER.5.2.25_1")), class = "data.frame", row.names = c(NA, -10L))

CodePudding user response:

Assuming GDM3 is defined as in the Note at the end, read it into a data frame with read.table and then use sprintf to build the result.

with( read.table(text = GDM3, sep = ".", comment.char = "_"), 
  sprintf("%02d%02d%02d", V2, V3, V4) )

giving:

 [1] "010101" "011002" "010132" "010104" "010105" "112901" "010202" "310203"
 [9] "010244" "050225"
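
To see why this works, it may help to look at the intermediate data frame. The following is a small sketch on a two-element input; ids, df and res are illustrative names, not part of the answer.

```r
# Hypothetical two-element input with the same layout as GDM3
ids <- c("PER.1.10.2_1", "PER.5.2.25_1")

# "_" is treated as a comment character, so the trailing "_1" is dropped;
# "." then splits what is left into four fields
df <- read.table(text = ids, sep = ".", comment.char = "_")
df
# V1 holds "PER"; V2, V3 and V4 hold the three numbers as integers

# Zero-pad each number to width 2 and join them in one format string
res <- with(df, sprintf("%02d%02d%02d", V2, V3, V4))
res
# [1] "011002" "050225"
```

This relies on the IDs containing no "_" before the numbers and exactly three "." separators; inputs with a different layout would need a different approach.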

Note

GDM3 <- c("PER.1.1.1_1", "PER.1.10.2_1", "PER.1.1.32_1", "PER.1.1.4_1", 
  "PER.1.1.5_1", "PER.11.29.1_1", "PER.1.2.2_1", "PER.31.2.3_1", 
  "PER.1.2.44_1", "PER.5.2.25_1")

CodePudding user response:

Another solution:

X <- structure(list(GDM3 = c("PER.1.1.1_1", "PER.1.10.2_1", "PER.1.1.32_1", "PER.1.1.4_1", "PER.1.1.5_1", "PER.11.29.1_1", "PER.1.2.2_1", "PER.31.2.3_1", "PER.1.2.44_1", "PER.5.2.25_1")), class = "data.frame", row.names = c(NA, -10L))
strsplit(X$GDM3, "\\.|_") |>
  sapply(function(x) paste0(sprintf("%02i", as.numeric(x[2:4])), collapse = ""))
#[1] "010101" "011002" "010132" "010104" "010105" "112901" "010202" "310203" "010244" "050225"