I have a vector in R that contains data like the following (there are more than just these two elements):
df <- c("04 IRB/IEC and other Approvals\04.01 IRB/IEC Trial Approvals\04.01.02 IRB/IEC Approval",
"01 Trial Management\01.01 Trial Oversight\01.01.02 Trial Management Plan")
All observations have the same structure, with two backslashes. I want to extract the 8 characters immediately following the last backslash (that is, the numeric code including the periods). Here is the result I want (I've been trying to use stringr):
df2 <- c("04.01.02", "01.01.02")
For anyone familiar with the DIA TMF Reference Model: I want the zone/section/artifact number from each element of df.
Thank you!
Answer:
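As written, the strings contain no literal backslashes: R reads \04 and \01 in the string literals as octal escapes, i.e. the control characters "\004" and "\001". We may need to replace those characters with the corresponding digits first, then extract the last dd.dd.dd match: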
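library(stringi)
library(stringr)
# Replace the control characters "\004" and "\001" (produced by the octal
# escapes \04 and \01) with " 04" and " 01", then take the LAST dd.dd.dd match
stri_extract_last_regex(str_replace_all(df, setNames(c(" 04", " 01"),
                                                     c("\004", "\001"))),
                        "\\d{2}\\.\\d{2}\\.\\d{2}")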
[1] "04.01.02" "01.01.02"
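Why the replacement step is needed: in an R string literal, a backslash followed by up to three octal digits is an octal escape, so "\04" is stored as one control character (code 4), not as a backslash followed by "04". A small check, where x and y are hypothetical example strings chosen for illustration:

```r
# x and y are hypothetical example strings, not data from the question.
x <- "Approvals\04.01"   # "\04" is an octal escape: one control character
y <- "Approvals\\04.01"  # "\\" is a single literal backslash

nchar("\04")                  # 1
utf8ToInt("\04")              # 4
identical("\04", "\004")      # TRUE: both spell the same control character
grepl("\\", x, fixed = TRUE)  # FALSE: no literal backslash in x
grepl("\\", y, fixed = TRUE)  # TRUE
nchar(y)                      # 15
```

So with the question's literals there is no backslash left to split on; writing each backslash as "\\" (or reading the data from a file) keeps it as a real character.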
Answer:
Instead of splitting on the backslash, if you only want the numbers separated by periods, you could do something like:
stringr::str_extract(df, "\\d\\d\\.\\d\\d\\.\\d\\d")
#> [1] "04.01.02" "01.01.02"
Data used (note the doubled backslashes, which R stores as single literal backslashes):
df <- c("04 IRB/IEC and other Approvals\\04.01 IRB/IEC Trial Approvals\\04.01.02 IRB/IEC Approval",
"01 Trial Management\\01.01 Trial Oversight\\01.01.02 Trial Management Plan")
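The question literally asks for the 8 characters after the last backslash. When the strings really contain literal backslashes (written as "\\" in R source, or read from a file), that can be done directly. A minimal sketch under that assumption, with the example vector named x here so it does not overwrite df:

```r
library(stringr)

# Assumes each element contains literal backslashes (written as "\\" here).
x <- c("04 IRB/IEC and other Approvals\\04.01 IRB/IEC Trial Approvals\\04.01.02 IRB/IEC Approval",
       "01 Trial Management\\01.01 Trial Oversight\\01.01.02 Trial Management Plan")

# Base R: the greedy ".*\\\\" matches up to and including the LAST backslash,
# so sub() leaves only the text after it; substr() then keeps 8 characters.
after_last <- sub(".*\\\\", "", x)
codes_base <- substr(after_last, 1, 8)

# stringr: "[^\\\\]+$" matches the run of non-backslash characters at the end.
codes_stringr <- str_sub(str_extract(x, "[^\\\\]+$"), 1, 8)

codes_base
#> [1] "04.01.02" "01.01.02"
identical(codes_base, codes_stringr)
#> [1] TRUE
```

The greedy .* is what makes the base R version pick the last backslash rather than the first. Unlike the dd.dd.dd regex, this relies on the code always being exactly 8 characters long, which matches the structure described in the question.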