I need some help with stringr::str_extract_all
x is the name of my data frame.
V1
(A_K9B,A_K9one,A_K9two,B_U10J)
x = x %>%
mutate(N_alph = map_chr(str_extract_all(x$V1, 'A_([A-Z][0-10])[A-Z]'), toString))
x = x %>%
mutate(N_.1 = map_chr(str_extract_all(x$V1, 'A_([A-Z][0-10])[o][n][e]'), toString))
x = x %>%
mutate(N_.2 = map_chr(str_extract_all(x$V1, 'A_([A-Z][0-10])[t][w][o]'), toString))
This is my current output:
V1 N_alph N_.1 N_.2
(A_K9B,A_K9one,A_K9two,B_U10J) A_K9B A_K9one A_K9two
I am fine with my column N_alph as is; I want it kept separate from the other two. Ideally, though, I would like to avoid typing [o][n][e] and [t][w][o] for the variables that are followed by words rather than a single letter. If I use:
x = x %>%
mutate(N_alph = map_chr(str_extract_all(x$V1, 'A_([A-Z][0-10])[A-Z]'), toString))
x = x %>%
mutate(N_all.words = map_chr(str_extract_all(x$V1, 'A_([A-Z][0-10])[\\w+]'), toString))
Output is:
V1 N_alph N_all.words
(A_K9B,A_K9one,A_K9two,B_U10J) A_K9B A_K9B,A_K9o,A_K9t
Desired output would be
V1 N_alph N_all.words
(A_K9B,A_K9one,A_K9two,B_U10J) A_K9B A_K9one,A_K9two
Answer:
When you use metacharacters like \w, \b, \s, etc., you don't need the square brackets. But if you do use the square brackets, then the + would need to be outside them. Also, the number group should be [0-9], because a character class describes individual characters, not combinations of characters. To account for numbers higher than 9, we just expand the number of times the group is matched with {} braces, or simply with the + operator. The final result looks like so:
x %>%
mutate(N_all.words = str_extract_all(V1, 'A_([A-Z][0-9]{1,2})\\w+'))
Resulting in:
V1 N_all.words
1 (A_K9B,A_K9one,A_K9two,B_U10J) A_K9B, A_K9one, A_K9two
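To make the difference concrete, here is a minimal sketch that runs the patterns on the sample string from the question. The variable names (v1, plus_inside, plus_outside) are only for illustration. It shows that a + inside the brackets is treated as a literal plus sign, so the class still matches a single character, and that [0-10] is read as the range 0-1 plus a literal 0.

```r
library(stringr)

v1 <- "(A_K9B,A_K9one,A_K9two,B_U10J)"

# `+` inside the brackets is a literal plus sign: the class [\w+] matches
# exactly one character, so each word is cut off after its first letter.
plus_inside <- str_extract_all(v1, "A_[A-Z][0-9][\\w+]")[[1]]
print(plus_inside)
# "A_K9B" "A_K9o" "A_K9t"

# `+` outside is a quantifier: one or more word characters.
plus_outside <- str_extract_all(v1, "A_[A-Z][0-9]{1,2}\\w+")[[1]]
print(plus_outside)
# "A_K9B" "A_K9one" "A_K9two"

# [0-10] is a single-character class containing only 0 and 1.
print(str_detect("9", "[0-10]"))  # FALSE
print(str_detect("9", "[0-9]"))   # TRUE
```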
I also created a version that I found a little tidier:
x %>%
mutate(N_all.words = str_extract_all(V1, 'A_\\w\\d{1,2}\\w+'))
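Both patterns above also pick up A_K9B, whereas the desired output in the question lists only the word variants. One possible variation, which is not part of the answer above and assumes the word suffixes are always lowercase while the single-letter suffixes are uppercase, is to require lowercase letters after the digits. The sketch below uses base R for the column assignment so that it only depends on stringr; words_only is a hypothetical helper name.

```r
library(stringr)

x <- data.frame(V1 = "(A_K9B,A_K9one,A_K9two,B_U10J)",
                stringsAsFactors = FALSE)

# Assumption: word suffixes are lowercase ("one", "two") and single-letter
# suffixes are uppercase ("B"), so [a-z]+ skips A_K9B.
words_only <- str_extract_all(x$V1, "A_[A-Z][0-9]{1,2}[a-z]+")

# Collapse each row's matches into one string, as toString did in the question.
x$N_all.words <- vapply(words_only, toString, character(1))

print(x$N_all.words)
# "A_K9one, A_K9two"
```

If the real data mixes cases in the suffixes, this assumption does not hold and the pattern would need to be adjusted.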