I am working with survey data organized into "waves". Each variable name is prefixed with its wave label: a number from 1 to 14, or the letter "A" or "E".
For example, I want to parse:
- 3educ > wave: 3, variable: educ
- Aage > wave: A, variable: age
I tried various patterns, such as
^([0-9]?|A|E)(\\w+)
to no effect. Please advise.
(Using stringr with R)
CodePudding user response:
Never mind, I think I got it:
^([0-9][0-9]?|a|e)(\\w+)
CodePudding user response:
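If you need to create a regex for a numeric range, consider using an automatic numeric range regex generator. The regex to match integers from 1 to 14 is (?:1[0-4]|[1-9]).
So, you need to use
(?i)^(?P<wave>1[0-4]|[1-9AE])(?P<variable>\w+)
See the regex demo. (?i) turns case-insensitive mode on, 1[0-4] matches a number from 10 to 14, and [1-9AE] matches either a non-zero digit or an A or E char. The two-digit alternative has to come first: alternatives are tried left to right, so with [1-9AE] first, 14abc would be split into wave 1 and variable 4abc.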
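The order of the alternatives matters for two-digit waves, because a backtracking regex engine takes the first alternative that lets the overall match succeed. A minimal base R sketch comparing the two orders follows; `wave_of` is a hypothetical helper introduced only for this illustration, and unnamed groups are used to keep it short.

```r
# Hypothetical helper: return only the wave captured by a pattern
wave_of <- function(pattern, x) sub(pattern, "\\1", x, perl = TRUE)

# Same alternatives, different order
short_first <- "(?i)^([1-9AE]|1[0-4])(\\w+)$"
long_first  <- "(?i)^(1[0-4]|[1-9AE])(\\w+)$"

wave_of(short_first, "14abc")  # "1"  (the "4" ends up in the variable name)
wave_of(long_first,  "14abc")  # "14"
wave_of(long_first,  "Aage")   # "A"
```

Putting `1[0-4]` first does not affect single-digit or letter waves, since that branch simply fails and the engine falls through to `[1-9AE]`.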
In R, you can use named capturing groups with the namedCapture library:
x <- c("3educ","Aage","14abc","Ekajshklasjf")
library(namedCapture)
# 1[0-4] comes before [1-9AE] so that "14" is captured as a whole wave
str_match_all_named(x, "(?i)^(?<wave>1[0-4]|[1-9AE])(?<variable>\\w+)")
Output:
[[1]]
     wave variable
[1,] "3"  "educ"
[[2]]
     wave variable
[1,] "A"  "age"
[[3]]
     wave variable
[1,] "14" "abc"
[[4]]
     wave variable
[1,] "E"  "kajshklasjf"
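Since the question mentions stringr, here is a rough equivalent using `str_match()`, offered as a sketch rather than the only way to do it. It assumes unnamed capture groups picked out by column position, and the `parsed` data frame and its column names are made up for illustration.

```r
library(stringr)

# Sample names taken from the example above
x <- c("3educ", "Aage", "14abc", "Ekajshklasjf")

# 1[0-4] is tried before [1-9AE] so that "14" is captured whole
pattern <- regex("^(1[0-4]|[1-9AE])(\\w+)", ignore_case = TRUE)

# Column 1 is the full match; columns 2 and 3 are the capture groups
m <- str_match(x, pattern)

# Collect the pieces in a data frame (column names are illustrative)
parsed <- data.frame(
  name     = x,
  wave     = m[, 2],
  variable = m[, 3],
  stringsAsFactors = FALSE
)
print(parsed)
```

Names that do not match the pattern come back as `NA` in both columns, which makes unparsed variables easy to spot.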