Extract letters and numbers from a string

Time:10-14

I want to extract three letters and three digits from each string in a vector:

str <- c("ABC_123", "DEF..456", "GHI--789A")

I want an output like this: ABC123, DEF456, GHI789

How can I do it using stringr package functions?

CodePudding user response:

An alternative to the other answer's excellent one-step solution is this two-step approach, which also returns NA for strings that do not contain three capital letters followed (eventually) by three digits.

str <- c("ABC_123", "DEF..456", "GHI--789A", "GH--789A")
###                                          ^^^^^ added as a non-matching string
library(stringr)
str_replace(
  str_extract(str, "([A-Z]{3}).*([0-9]{3})"),
  "^(...).*(...)$", "\\1\\2")
# [1] "ABC123" "DEF456" "GHI789" NA      
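
To make the filtering point concrete, here is a minimal sketch that compares the two styles on one matching and one non-matching string. The helper names `strip_and_cut` and `extract_then_join` are hypothetical and exist only for this illustration, and the input vector is called `x` to keep it separate from `str` above.

```r
library(stringr)

x <- c("ABC_123", "GH--789A")

# One-step style: drop punctuation, keep the first 6 characters.
# Nothing checks that the result is really 3 letters + 3 digits.
strip_and_cut <- function(s) substr(gsub("[[:punct:]]+", "", s), 1, 6)

# Two-step style: extract only when 3 capitals ... 3 digits are present,
# then keep the first and last three characters of that match.
extract_then_join <- function(s) {
  str_replace(str_extract(s, "[A-Z]{3}.*[0-9]{3}"), "^(...).*(...)$", "\\1\\2")
}

strip_and_cut(x)
# [1] "ABC123" "GH789A"
extract_then_join(x)
# [1] "ABC123" NA
```

The one-step version silently returns "GH789A" for the malformed input, while the two-step version flags it as NA.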

CodePudding user response:

We can remove the non-alphanumeric characters by replacing them with a blank ("") and then take the first 6 characters with substr:

substr(gsub("[_.-]+", "", str), 1, 6)
[1] "ABC123" "DEF456" "GHI789"

Another option is to capture the letters and the digits as groups and keep only those:

sub("^([A-Z]{3})[[:punct:]]+(\\d{3}).*", "\\1\\2", str)
[1] "ABC123" "DEF456" "GHI789"

Or, more generally, match any punctuation with [[:punct:]]:

substr(gsub("[[:punct:]]+", "", str), 1, 6)
[1] "ABC123" "DEF456" "GHI789"

In stringr, the equivalent function is str_remove_all:

library(stringr)
substr(str_remove_all(str, "[[:punct:]]+"), 1, 6)
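
Since the question asks specifically for stringr functions, the capture-group idea can also be sketched with `str_match` and `str_c`. This is only one possible way to write it; the variable names `x`, `m`, and `res` are chosen for the example, and the pattern assumes the letters come first and are separated from the digits by punctuation only.

```r
library(stringr)

x <- c("ABC_123", "DEF..456", "GHI--789A")

# str_match returns a character matrix: column 1 holds the full match,
# columns 2 and 3 hold the two capture groups.
m <- str_match(x, "^([A-Z]{3})[[:punct:]]+([0-9]{3})")

# str_c pastes the groups together; a non-matching input yields NA
# in both columns and therefore NA in the result.
res <- str_c(m[, 2], m[, 3])
res
# [1] "ABC123" "DEF456" "GHI789"
```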