I have a column containing different names, and I would like to get everything after the second space in each string.
My example:
df <- data.frame(col = c("Adenia macrophylla", "Adinobotrys atropurpureus (Wall.) Dunn", "Ardisia purpurea Reinw. ex Blume"))
My desired outcome looks like this:
col
1
2 (Wall.) Dunn
3 Reinw. ex Blume
Any suggestions? Previously I used separate and unite, but I wonder whether there is a neater or better way to do it, since I already have many columns.
Update: just solved it.
xx %>%
  # pad short names with trailing spaces so the pattern below can still match
  mutate(col = str_pad(col, 20, "right")) %>%
  mutate(col = str_remove(col, '\\w+\\s\\w+\\s'))
Thanks @Ronak and @U12-Forward for providing the regex.
CodePudding user response:
You may use sub:
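sub('\\w+\\s\\w+\\s', '', df$col)
#[1] "Adenia macrophylla" "(Wall.) Dunn" "Reinw. ex Blume"
# The first name has no second space, so nothing matches and it is returned unchanged.
#Also
#sub('.*?\\s.*?\\s', '', df$col)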
If you want a tidyverse answer:
library(dplyr)
library(stringr)
df %>% mutate(val = str_remove(col, '\\w+\\s\\w+\\s'))
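Since the question mentions having many columns, the same pattern could be applied to several columns in one call. This is only a sketch: it assumes dplyr 1.0.0 or later for across(), and the data frame df2 and its second column col2 are made up for illustration.

```r
library(dplyr)
library(stringr)

# Hypothetical data: col2 is invented here only to show a second column.
df2 <- data.frame(
  col  = c("Adenia macrophylla", "Adinobotrys atropurpureus (Wall.) Dunn"),
  col2 = c("Ardisia purpurea Reinw. ex Blume", "Ardisia purpurea"),
  stringsAsFactors = FALSE
)

# Same pattern as in the answer above: word, whitespace, word, whitespace.
pattern <- "\\w+\\s\\w+\\s"

# across() applies the same str_remove() call to every listed column.
out <- df2 %>%
  mutate(across(c(col, col2), ~ str_remove(.x, pattern)))

print(out)
```

Names with only two words contain no second space, so the pattern does not match and they are left as they are.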
CodePudding user response:
In case you want to select the string after n spaces, it might be good to use repetition in sub.
sub("([^ ]* ){2}(.*)|.*", "\\2", df$col)
#sub("([^ ]* ){2}|.*", "", df$col, perl=TRUE) #Alternative
#[1] "" "(Wall.) Dunn" "Reinw. ex Blume"
[^ ] matches any character that is not a space, * repeats it 0 to n times, the literal space matches a space, {2} repeats that group two times, and .* matches everything.
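The repetition count can be made a parameter. The following is a minimal sketch in base R; the helper name after_nth_space is hypothetical and not part of any package, and it assumes the separator is a single plain space.

```r
# Hypothetical helper: returns what follows the n-th space, or "" when
# the string contains fewer than n spaces.
after_nth_space <- function(x, n) {
  # "([^ ]* ){n}" consumes n runs of non-space characters, each followed by
  # one space; the "|.*" fallback matches the whole string, so that strings
  # with too few spaces are replaced by the (empty) second group.
  pattern <- sprintf("([^ ]* ){%d}(.*)|.*", as.integer(n))
  sub(pattern, "\\2", x)
}

x <- c("Adenia macrophylla",
       "Adinobotrys atropurpureus (Wall.) Dunn",
       "Ardisia purpurea Reinw. ex Blume")

res2 <- after_nth_space(x, 2)  # text after the second space
res3 <- after_nth_space(x, 3)  # text after the third space
print(res2)
print(res3)
```

Building the pattern with sprintf keeps the regex identical to the one above for n = 2, so only the count changes.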
CodePudding user response:
Or use this regex:
df$col <- sub('^\\S+\\s+\\S+', '', df$col)
Output df:
> df
col
1
2 (Wall.) Dunn
3 Reinw. ex Blume
>
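One thing to watch with a pattern of the form ^\\S+\\s+\\S+ is that it stops before the second space, so the remaining text starts with that space. A small base R sketch of two possible ways to drop it, assuming leading whitespace is not wanted in the result:

```r
x <- c("Adenia macrophylla",
       "Adinobotrys atropurpureus (Wall.) Dunn",
       "Ardisia purpurea Reinw. ex Blume")

# Removes the first two words only; the space after them is kept.
res <- sub("^\\S+\\s+\\S+", "", x)

# Option 1: strip the leftover leading whitespace afterwards.
trimmed <- trimws(res, which = "left")

# Option 2: let the pattern also consume any whitespace after the second word.
alt <- sub("^\\S+\\s+\\S+\\s*", "", x)

print(trimmed)
print(alt)
```

Both options return an empty string for names that consist of only two words.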