Find and extract text between delimiters in R

I have the following data string

Seat_WASHER <-
  structure(
    list(
      Description = c(
        "SEAT WASHER, MR2, 8\", TN 10.12, CR 150/600, 316 Stainless Steel",
        "SEAT WASHER, 1\", TN 1.42, CR 950/1200, MR1, 316 Stainless Steel",
        "SEAT WASHER, 3\", TN 1.52,  MR1, 316 Stainless Steel",
        "SEAT WASHER, MR1, 2\", TN 1.62, CR 800/1200, 316 Stainless Steel",
        "SEAT WASHER, MR1, TN 2.12, 1/2\", CR 150/600, 316 Stainless Steel",
        "SEAT WASHER, MR6, 2\", TN 6.48, CR 750/100, 316 Stainless Steel"
      )
    ),
    row.names = c(NA, -6L),
    class = c("tbl_df", "tbl", "data.frame")
  )

It's a very large data set, and the strings are not consistent in their order or contents.

How do I find key indicators (", CR, MR) and pull the data between the delimiters into a column? If a key indicator can't be found in a string, the output should be NULL.

Finding all CR will result in a column like:

Col 1 
--------
CR 150/600
CR 950/1200
NULL
CR 800/1200
CR 150/600
CR 750/100

CodePudding user response：

You can try

library(stringr)

Seat_WASHER$col1 <- str_extract(Seat_WASHER$Description, "CR \\d+/\\d+")

output

         col1
1  CR 150/600
2 CR 950/1200
3        <NA>
4 CR 800/1200
5  CR 150/600
6  CR 750/100
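
The same idea should extend to the other indicators in the question (MR and the inch mark). Below is a minimal base R sketch, not a definitive solution: `extract_first` is a hypothetical helper name, the three sample strings are copied from the question, and the `MR` and size patterns are my assumptions about how those fields look.

```r
# Hypothetical helper: first match of `pattern` in each string, NA if none.
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  hit <- m != -1L
  # regmatches() returns only the matched substrings, in order
  out[hit] <- regmatches(x, m)
  out
}

desc <- c(
  "SEAT WASHER, MR2, 8\", TN 10.12, CR 150/600, 316 Stainless Steel",
  "SEAT WASHER, 3\", TN 1.52,  MR1, 316 Stainless Steel",
  "SEAT WASHER, MR1, TN 2.12, 1/2\", CR 150/600, 316 Stainless Steel"
)

parts <- data.frame(
  CR   = extract_first(desc, "CR \\d+/\\d+"),  # e.g. CR 150/600
  MR   = extract_first(desc, "MR\\d+"),        # assumed form: MR plus digits
  Size = extract_first(desc, "[0-9/]+\""),     # assumed form: digits or a fraction, then "
  stringsAsFactors = FALSE
)
```

Strings without a given indicator get NA in that column, which matches the behaviour of str_extract() above.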

CodePudding user response：

If the fields are always separated by commas, you can use strsplit() to split the string, then use grep() with value = TRUE to return the piece that contains CR. I added trimws() to remove the leading space.

m1 <- "SEAT WASHER, MR6, 2\", TN 6.48, CR 750/100, 316 Stainless Steel"
m2 <- strsplit(m1,",") 
trimws(grep("CR",m2[[1]], value = TRUE))

Edit based on the data:

This still splits each string and keeps the piece containing CR, stored in m3. Before appending to the data, turn all length-0 vectors into NA.

m2 <- strsplit(Seat_WASHER$Description, ",")
m3 <- sapply(m2, function(x) trimws(grep("CR", x, value = TRUE)))

Seat_WASHER$newcol <- sapply(m3, function(x) if(identical(x, character(0))) NA_character_ else x)
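
If a description could ever contain more than one CR piece, the sapply() call above would return a list instead of a character vector. A possible variant, sketched here on two sample strings from the question, uses vapply() so the result is always one string per row. Keeping only the first match and anchoring on "^CR " are my assumptions, not part of the original answer.

```r
desc <- c(
  "SEAT WASHER, MR6, 2\", TN 6.48, CR 750/100, 316 Stainless Steel",
  "SEAT WASHER, 3\", TN 1.52,  MR1, 316 Stainless Steel"
)

# Split on commas and trim whitespace from every piece.
pieces <- lapply(strsplit(desc, ",", fixed = TRUE), trimws)

# Keep the first piece that starts with "CR ", or NA when there is none.
cr <- vapply(
  pieces,
  function(p) {
    hit <- grep("^CR ", p, value = TRUE)
    if (length(hit) == 0) NA_character_ else hit[1]
  },
  character(1)
)
```

Because vapply() enforces character(1) for each element, the result can be assigned straight to a data frame column.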