regex to match specific set of numbers or letters

Time:01-05

I'm working with survey data classified into "waves". Each variable name is prefixed with its wave label, which is either a number from 1 to 14 or the letter "A" or "E".

For example, I want to parse:

  • 3educ > wave: 3, variable: educ
  • Aage > wave: A, variable: age

I tried various patterns, such as

^([0-9]?|A|E)(\\w+)

to no effect. Please advise.

(Using stringr with R)

CodePudding user response:

Never mind, I think I got it:

^([0-9][0-9]?|a|e)(\\w+)

CodePudding user response:

If you need to create a regex for a numeric range, consider using the automatic numeric range regex generator. The regex to match integer numbers from 1 to 14 is (?:[1-9]|1[0-4]).
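As a quick check of that range pattern, here is a minimal base R sketch. It assumes the pattern is anchored at both ends with ^ and $ (my addition, not part of the pattern quoted above) so that it has to match the whole string. The names range_pattern and candidates are illustrative only.

```r
# Anchor the range pattern at both ends so it must match the whole string
range_pattern <- "^(?:[1-9]|1[0-4])$"

# Candidate labels 0 through 20, as strings
candidates <- as.character(0:20)

# Keep only the candidates the pattern accepts (perl = TRUE enables PCRE syntax)
matched <- candidates[grepl(range_pattern, candidates, perl = TRUE)]
matched
```

With the anchors in place, only the strings "1" through "14" should be kept. Without a trailing anchor, the order of the alternatives matters, because [1-9] can succeed on the first digit of a two-digit number.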

So, you need to use

(?i)^(?P<wave>1[0-4]|[1-9AE])(?P<variable>\w+)

See the regex demo. (?i) turns on case-insensitive mode, 1[0-4] matches the two-digit waves 10 to 14, and [1-9AE] matches a single non-zero digit or the letter A or E. The two-digit alternative must come first: with [1-9AE] first, "14abc" would be split into wave "1" and variable "4abc".

In R, you can use named capturing groups with the namedCapture library:

x <- c("3educ","Aage","14abc","Ekajshklasjf")
library(namedCapture)
str_match_all_named(x, "(?i)^(?<wave>1[0-4]|[1-9AE])(?<variable>\\w+)")

Output:

[[1]]
     wave variable
[1,] "3"  "educ"  

[[2]]
     wave variable
[1,] "A"  "age"   

[[3]]
     wave variable
[1,] "14" "abc"   

[[4]]
     wave variable     
[1,] "E"  "kajshklasjf"
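
Since the question mentions stringr, the same split could also be sketched with stringr::str_match and plain (unnamed) groups. This is only an illustration under a few assumptions: the sample vector x is made up, the pattern lists 1[0-4] first so that waves 10 to 14 are captured whole, and ignore_case = TRUE stands in for the (?i) flag.

```r
library(stringr)

# Hypothetical sample names: wave label followed by the variable name
x <- c("3educ", "Aage", "14abc", "Ekajshklasjf")

# Two-digit waves (10-14) are tried before single digits so "14" is not split;
# ignore_case = TRUE makes the match case insensitive
pattern <- regex("^(1[0-4]|[1-9AE])(\\w+)", ignore_case = TRUE)

# str_match returns a matrix: column 1 is the full match,
# columns 2 and 3 are the first and second capture groups
m <- str_match(x, pattern)
parsed <- data.frame(wave = m[, 2], variable = m[, 3])
parsed
```

Names that do not match the pattern would come back as NA in both columns, which makes unexpected labels easy to spot.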