I have the following data (output of dput()):
Seat_WASHER<-
structure(
list(
Description = c(
"SEAT WASHER, MR2, 8\", TN 10.12, CR 150/600, 316 Stainless Steel",
"SEAT WASHER, 1\", TN 1.42, CR 950/1200, MR1, 316 Stainless Steel",
"SEAT WASHER, 3\", TN 1.52, MR1, 316 Stainless Steel",
"SEAT WASHER, MR1, 2\", TN 1.62, CR 800/1200, 316 Stainless Steel",
"SEAT WASHER, MR1, TN 2.12, 1/2\", CR 150/600, 316 Stainless Steel",
"SEAT WASHER, MR6, 2\", TN 6.48, CR 750/100, 316 Stainless Steel"
)
),
row.names = c(NA, -6L),
class = c("tbl_df", "tbl", "data.frame")
)
The real data set is very large, and the strings are not consistent in the order or contents of their fields.
How do I find key indicators (", CR, MR) and pull everything between the surrounding comma delimiters into its own column? If a string doesn't contain the key indicator, the output should be NULL (in an R data frame column this is represented as NA).
For example, extracting CR should give a column like:
Col 1
--------
CR 150/600
CR 950/1200
NULL
CR 800/1200
CR 150/600
CR 750/100
Answer:
You can try str_extract() from stringr, which returns NA when there is no match:
library(stringr)
Seat_WASHER$col1 <- str_extract(Seat_WASHER$Description, "CR \\d+/\\d+")
Output:
col1
1 CR 150/600
2 CR 950/1200
3 <NA>
4 CR 800/1200
5 CR 150/600
6 CR 750/100
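The question also asks for the other indicators (MR and the inch size). As a minimal base-R sketch, assuming MR is always written as MR followed by digits and the size is the only field ending in a double quote, the same first-match idea can be applied once per indicator. extract_first() and the three patterns are hypothetical, chosen for illustration; regexpr() and regmatches() stand in for str_extract() so the sketch needs no packages, and NA plays the role of the requested NULL.

```r
# Sample descriptions copied from the question's data
desc <- c(
  "SEAT WASHER, MR2, 8\", TN 10.12, CR 150/600, 316 Stainless Steel",
  "SEAT WASHER, 1\", TN 1.42, CR 950/1200, MR1, 316 Stainless Steel",
  "SEAT WASHER, 3\", TN 1.52, MR1, 316 Stainless Steel",
  "SEAT WASHER, MR1, 2\", TN 1.62, CR 800/1200, 316 Stainless Steel",
  "SEAT WASHER, MR1, TN 2.12, 1/2\", CR 150/600, 316 Stainless Steel",
  "SEAT WASHER, MR6, 2\", TN 6.48, CR 750/100, 316 Stainless Steel"
)

# Hypothetical helper: first match of `pattern` in each string, NA if none
# (a base-R stand-in for stringr::str_extract)
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)   # -1 where there is no match
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)         # regmatches returns only the hits
  out
}

# Assumed patterns: CR range, MR code, size ending in an inch mark
result <- data.frame(
  description = desc,
  cr   = extract_first(desc, "CR \\d+/\\d+"),
  mr   = extract_first(desc, "MR\\d+"),
  size = extract_first(desc, "[0-9/]+\""),
  stringsAsFactors = FALSE
)
print(result[, c("cr", "mr", "size")])
```

Each extra indicator is just another pattern, so the approach still works when the fields appear in a different order in each string.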
Answer:
If the fields are always separated by commas, you can use strsplit() to split the string, then find the piece that contains CR with grep(), specifying value = TRUE so that it returns the matching piece rather than its index. trimws() removes the leading space left over from the split.
m1 <- "SEAT WASHER, MR6, 2\", TN 6.48, CR 750/100, 316 Stainless Steel"
m2 <- strsplit(m1,",")
trimws(grep("CR",m2[[1]], value = TRUE))
Edit, based on the posted data: split each description the same way, keep the pieces that contain CR in m3, and before appending the result to the data, turn every length-0 vector (rows with no CR) into NA.
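m2 <- strsplit(Seat_WASHER$Description, ",")
# lapply (not sapply) so m3 is always a list with one element per row
m3 <- lapply(m2, function(x) trimws(grep("CR", x, value = TRUE)))
# rows without a CR piece give character(0); replace those with NA
Seat_WASHER$newcol <- sapply(m3, function(x) if (length(x) == 0) NA_character_ else x)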
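The same split-and-grep idea can be wrapped in a function and reused for every indicator. This is a sketch under a few assumptions: fields are always separated by commas, and each indicator can be recognized by the start or end of its trimmed field (CR at the start, MR followed only by digits, the size ending in a double quote). extract_field() is a hypothetical helper, not part of base R.

```r
# Hypothetical helper: split each description on commas and return the first
# trimmed piece matching `pattern`, or NA when no piece matches.
extract_field <- function(desc, pattern) {
  pieces <- strsplit(desc, ",", fixed = TRUE)
  vapply(pieces, function(p) {
    p <- trimws(p)
    hit <- grep(pattern, p, value = TRUE, perl = TRUE)
    if (length(hit) == 0) NA_character_ else hit[1]
  }, character(1))
}

# Sample descriptions copied from the question's data
desc <- c(
  "SEAT WASHER, MR2, 8\", TN 10.12, CR 150/600, 316 Stainless Steel",
  "SEAT WASHER, 1\", TN 1.42, CR 950/1200, MR1, 316 Stainless Steel",
  "SEAT WASHER, 3\", TN 1.52, MR1, 316 Stainless Steel",
  "SEAT WASHER, MR1, 2\", TN 1.62, CR 800/1200, 316 Stainless Steel",
  "SEAT WASHER, MR1, TN 2.12, 1/2\", CR 150/600, 316 Stainless Steel",
  "SEAT WASHER, MR6, 2\", TN 6.48, CR 750/100, 316 Stainless Steel"
)

cr   <- extract_field(desc, "^CR ")      # field starts with "CR "
mr   <- extract_field(desc, "^MR\\d+$")  # field is exactly MR + digits
size <- extract_field(desc, "\"$")       # field ends with an inch mark
print(data.frame(cr, mr, size))
```

Anchoring the patterns with ^ and $ keeps grep() from matching CR or MR that happen to appear inside some other field, vapply() guarantees exactly one string per row, and taking hit[1] means a row with two matching fields yields its first match instead of breaking the column assignment.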