regex to match specific set of numbers or letters


I'm working with survey data classified into various "waves". Each variable name is prefixed with its wave label, which is either a number from 1 to 14 or one of the letters "A" or "E".

For example, I want to parse:

  • 3educ > wave: 3, variable: educ
  • Aage > wave: A, variable: age

I tried various patterns, such as

^([0-9]?|A|E)(\\w+)

to no effect. Please advise.

(Using stringr with R)

CodePudding user response:

Never mind, I think I got it:

^([0-9][0-9]?|a|e)(\\w+)

CodePudding user response:

If you need to create a regex for a numeric range, consider using the automatic numeric range regex generator. The regex to match integer numbers from 1 to 14 is (?:[1-9]|1[0-4]).

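So, you need to use

(?i)^(?P<wave>1[0-4]|[1-9AE])(?P<variable>\w+)

See the regex demo. (?i) turns on case-insensitive mode, 1[0-4] matches the numbers 10 to 14, and [1-9AE] matches a single non-zero digit or an A or E character. The 1[0-4] alternative has to come first: alternatives are tried left to right, so with [1-9AE] first a name like 14abc would be split as wave 1, variable 4abc.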
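Branch order inside the first group matters, because alternatives are tried left to right and the first one that lets the overall match succeed wins. A minimal base R sketch to compare the two orders on the examples from this thread (digit_first, range_first and first_group are names made up for this illustration, and ignore.case = TRUE stands in for the inline (?i) flag):

```r
# Inputs taken from the examples in this thread.
x <- c("3educ", "Aage", "14abc", "Ekajshklasjf")

# Same building blocks, different branch order inside the first group.
digit_first <- "^([1-9AE]|1[0-4])(\\w+)$"  # single-character branch tried first
range_first <- "^(1[0-4]|[1-9AE])(\\w+)$"  # two-digit branch tried first

# Hypothetical helper: return the text captured by the first group.
first_group <- function(pattern, strings) {
  sub(pattern, "\\1", strings, perl = TRUE, ignore.case = TRUE)
}

first_group(digit_first, x)  # "3" "A" "1"  "E"
first_group(range_first, x)  # "3" "A" "14" "E"
```

With the single-character branch first, "14abc" is split after the 1, because \w+ is happy to take "4abc" as the variable name.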

In R, you can use named capturing groups with the namedCapture library:

library(namedCapture)

x <- c("3educ","Aage","14abc","Ekajshklasjf")
# 1[0-4] is listed before [1-9AE] so that "14abc" gives wave 14, not wave 1
str_match_all_named(x, "(?i)^(?<wave>1[0-4]|[1-9AE])(?<variable>\\w+)")

Output:

     wave variable
[1,] "3"  "educ"  

     wave variable
[1,] "A"  "age"   

     wave variable
[1,] "14" "abc"   

     wave variable     
[1,] "E"  "kajshklasjf"
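If adding namedCapture is not an option, roughly the same split can be sketched in base R with regexec() and regmatches(). This is only an illustration under a few assumptions: split_wave is a hypothetical helper name, the whole string is required to match (note the trailing $), and names that do not fit the pattern are returned as NA rather than dropped.

```r
# Hypothetical helper: split names such as "3educ" into wave and variable.
split_wave <- function(x) {
  # Two-digit waves (10-14) are tried before single characters (1-9, A, E).
  pattern <- "^(1[0-4]|[1-9AE])(\\w+)$"
  hits <- regmatches(x, regexec(pattern, x, perl = TRUE, ignore.case = TRUE))

  # Each hit is c(full match, wave, variable); a non-match is character(0).
  pick <- function(i) {
    vapply(hits,
           function(h) if (length(h) == 3) h[i] else NA_character_,
           character(1), USE.NAMES = FALSE)
  }

  data.frame(name = x, wave = pick(2), variable = pick(3),
             stringsAsFactors = FALSE)
}

# "0bad" is an extra, made-up input that should not match.
parsed <- split_wave(c("3educ", "Aage", "14abc", "Ekajshklasjf", "0bad"))
print(parsed)
```

Returning one row per input keeps the result aligned with the original vector of column names, which makes it easy to join back onto the survey data.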