Extracting 8 characters after the last backslash in a string using R

I have a vector in R that includes the following types of data (there are more elements than just the two shown here):

df <- c("04 IRB/IEC and other Approvals\04.01 IRB/IEC Trial Approvals\04.01.02 IRB/IEC Approval",
 "01 Trial Management\01.01 Trial Oversight\01.01.02 Trial Management Plan")

All observations have the same structure with two backslashes. I want to extract the 8 characters immediately following the last backslash (or the numerical values including the periods). Here is an example of what I would want in R (I've been trying to use stringr):

df2 <- c("04.01.02", "01.01.02")
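Here is a minimal base R sketch of the literal request (the 8 characters after the last backslash). It assumes the backslashes are really present in the data, which means they have to be written as "\\" in R string literals:

```r
# Assumption: the backslashes are literal characters in the data,
# so each one is written as "\\" inside the R string literals.
df <- c("04 IRB/IEC and other Approvals\\04.01 IRB/IEC Trial Approvals\\04.01.02 IRB/IEC Approval",
        "01 Trial Management\\01.01 Trial Oversight\\01.01.02 Trial Management Plan")

# Greedy ".*" followed by an escaped backslash ("\\\\" in R is one literal "\"
# in the regex) removes everything up to and including the LAST backslash.
after_last <- sub(".*\\\\", "", df)

# Keep the first 8 characters of what remains.
df2 <- substr(after_last, 1, 8)
df2
#> [1] "04.01.02" "01.01.02"
```

The greedy ".*" is what makes this anchor on the last backslash rather than the first one.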

If anyone is familiar with the DIA TMF reference model, I want the zone/section/artifact number from each element of df.

Thank you!

Answer:

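We may need to undo R's escape handling first. As typed in the question, "\04" and "\01" are octal escapes, so each one becomes a single control character (\004 or \001) and the backslash is never stored. Replacing those characters with " 04" and " 01" restores the digits, and the last dd.dd.dd match can then be extracted: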

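library(stringi)
library(stringr)
# Named vector: names are the patterns (the control characters \004 and \001),
# values are the replacements. Then keep the last dd.dd.dd match in each string.
stri_extract_last_regex(str_replace_all(df, setNames(c(" 04", " 01"),
                                                     c("\004", "\001"))),
                        "\\d{2}\\.\\d{2}\\.\\d{2}")
[1] "04.01.02" "01.01.02"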
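The replacement step is needed because of how R reads the string literals in the question: a backslash followed by one to three octal digits is an octal escape. A quick check (the strings x and y are illustrative only):

```r
# In an R string literal, "\04" is the octal escape for the single
# control character with code 4; no backslash is stored.
x <- "Approvals\04.01.02"
nchar(x)                       # 16: "Approvals" (9) + one control char + ".01.02" (6)
utf8ToInt("\04")               # 4
grepl("\\", x, fixed = TRUE)   # FALSE: x contains no backslash

# Doubling the backslash stores a literal backslash followed by "04".
y <- "Approvals\\04.01.02"
nchar(y)                       # 18
grepl("\\", y, fixed = TRUE)   # TRUE
```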

Answer:

Instead of splitting on the backslash, if you only want the numbers separated by periods, you could do something like:

stringr::str_extract(df, "\\d\\d\\.\\d\\d\\.\\d\\d")
#> [1] "04.01.02" "01.01.02"
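If stringr isn't available, the same first-match extraction could be sketched in base R with regexpr() and regmatches(). Filling NA for elements that have no dd.dd.dd code is an assumption about how missing codes should be handled; regmatches() on its own would silently drop them:

```r
df <- c("04 IRB/IEC and other Approvals\\04.01 IRB/IEC Trial Approvals\\04.01.02 IRB/IEC Approval",
        "01 Trial Management\\01.01 Trial Oversight\\01.01.02 Trial Management Plan")

# Two digits, period, two digits, period, two digits.
pattern <- "[0-9]{2}\\.[0-9]{2}\\.[0-9]{2}"

# regexpr() gives the position of the first match (-1 when there is none).
m <- regexpr(pattern, df)

# regmatches() returns only the matched elements, so place them back
# into an NA-filled vector to keep the output aligned with df.
out <- rep(NA_character_, length(df))
out[m != -1] <- regmatches(df, m)
out
#> [1] "04.01.02" "01.01.02"
```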

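Data used (the backslashes are doubled so that R stores them as literal backslashes instead of reading "\04" or "\01" as octal escapes):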

df <- c("04 IRB/IEC and other Approvals\\04.01 IRB/IEC Trial Approvals\\04.01.02 IRB/IEC Approval",
        "01 Trial Management\\01.01 Trial Oversight\\01.01.02 Trial Management Plan")
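For comparison, the split-on-the-backslash approach mentioned above could be sketched in base R with strsplit() on a fixed backslash, keeping the first 8 characters of the last piece. This assumes, as the question states, that the code always starts right after the last backslash:

```r
df <- c("04 IRB/IEC and other Approvals\\04.01 IRB/IEC Trial Approvals\\04.01.02 IRB/IEC Approval",
        "01 Trial Management\\01.01 Trial Oversight\\01.01.02 Trial Management Plan")

# fixed = TRUE treats "\\" (one literal backslash) as plain text, not a regex.
pieces <- strsplit(df, "\\", fixed = TRUE)

# Take the last piece of each element, then its first 8 characters.
last_piece <- vapply(pieces, function(p) p[length(p)], character(1))
codes <- substr(last_piece, 1, 8)
codes
#> [1] "04.01.02" "01.01.02"
```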