I want to extract three letters and three digits from each string in a vector:
str <- c("ABC_123", "DEF..456", "GHI--789A")
I want an output like this: ABC123, DEF456, GHI789
How can I do it using stringr package functions?
CodePudding user response:
An alternative to the other answer's excellent one-step solution is this two-step approach, which also returns NA for strings that do not match the pattern.
str <- c("ABC_123", "DEF..456", "GHI--789A", "GH--789A")
# "GH--789A" added as a non-matching string
library(stringr)
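# Step 1: extract the span from the three letters through the three digits (NA if absent).
# Step 2: keep only the first three and last three characters of that span.
str_replace(
  str_extract(str, "([A-Z]{3}).*([0-9]{3})"),
  "^(...).*(...)$", "\\1\\2")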
# [1] "ABC123" "DEF456" "GHI789" NA
CodePudding user response:
We can remove the separator characters (replacing them with the empty string "") and take the first 6 characters with substr:
substr(gsub("[_.-]+", "", str), 1, 6)
[1] "ABC123" "DEF456" "GHI789"
Another option is to use capture groups:
sub("^([A-Z]{3})[[:punct:]]+(\\d{3}).*", "\\1\\2", str)
[1] "ABC123" "DEF456" "GHI789"
Or, more generally, use [[:punct:]] to match any punctuation character:
substr(gsub("[[:punct:]]+", "", str), 1, 6)
[1] "ABC123" "DEF456" "GHI789"
In stringr, the option is str_remove_all:
library(stringr)
substr(str_remove_all(str, "[[:punct:]]+"), 1, 6)
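If strings that lack three letters or three digits should come back as NA rather than being truncated, a capture-group version in stringr might look like the sketch below. It assumes the letters are uppercase ASCII and that only non-alphanumeric characters sit between the letters and the digits; the names m and out are just illustrative.

```r
library(stringr)

str <- c("ABC_123", "DEF..456", "GHI--789A", "GH--789A")

# Capture three uppercase letters, skip any non-alphanumeric separators,
# then capture three digits. Each row of m holds the full match followed
# by the two capture groups; rows for non-matching strings are all NA.
m <- str_match(str, "([A-Z]{3})[^A-Za-z0-9]*([0-9]{3})")

# str_c() propagates NA, so non-matching strings stay NA.
out <- str_c(m[, 2], m[, 3])
out
# [1] "ABC123" "DEF456" "GHI789" NA
```

Unlike the substr approaches, this does not depend on the wanted characters being the first six left after stripping punctuation.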