select string after second space

Time:10-13

I have a column containing different names, and I would like to get everything after the second space in each string.

My example:

df <- data.frame(col = c("Adenia macrophylla", "Adinobotrys atropurpureus (Wall.) Dunn", "Ardisia purpurea Reinw. ex Blume"))

My desired outcome looks like this:

                                     col
1                                    
2                           (Wall.) Dunn
3                        Reinw. ex Blume

Any suggestions? Previously I separated the column into pieces and then united them again, but I wonder whether there is a neater or better way to do it, since I already have many columns.
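The separate-and-unite approach can also be done in one step in base R, without creating intermediate columns. This is a minimal sketch assuming words are separated by single spaces; drop_first_two_words is a hypothetical helper name, not part of any package:

```r
# Hypothetical helper: split each name on single spaces, drop the first two
# words, and paste the remaining words back together with single spaces.
drop_first_two_words <- function(x) {
  parts <- strsplit(x, " ", fixed = TRUE)
  # Negative indices past the end are ignored, so short names give ""
  vapply(parts, function(p) paste(p[-(1:2)], collapse = " "), character(1))
}

df <- data.frame(col = c("Adenia macrophylla",
                         "Adinobotrys atropurpureus (Wall.) Dunn",
                         "Ardisia purpurea Reinw. ex Blume"),
                 stringsAsFactors = FALSE)

df$col <- drop_first_two_words(df$col)
df$col
# returns c("", "(Wall.) Dunn", "Reinw. ex Blume")
```

Runs of several spaces would produce empty "words" here, so a regex is usually the more robust choice.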

Update: I just solved it:

library(dplyr)
library(stringr)

df %>%
  mutate(col = str_pad(col, 20, "right")) %>%
  mutate(col = str_remove(col, '\\w+\\s\\w+\\s'))

Thanks to @Ronak and @U12-Forward for providing the regex.

Answer:

You can use sub():

sub('\\w+\\s\\w+\\s', '', df$col)
#[1] "Adenia macrophylla" "(Wall.) Dunn"       "Reinw. ex Blume"
# Alternatively:
#sub('.*?\\s.*?\\s', '', df$col)
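This pattern needs two spaces to match, so a name with only one space, such as "Adenia macrophylla", comes back unchanged rather than as the empty string in the desired output. One way to get "" for those rows (an addition on top of this answer, not part of it) is to blank the rows where the pattern does not match, sketched here in base R:

```r
df <- data.frame(col = c("Adenia macrophylla",
                         "Adinobotrys atropurpureus (Wall.) Dunn",
                         "Ardisia purpurea Reinw. ex Blume"),
                 stringsAsFactors = FALSE)

pattern <- "\\w+\\s\\w+\\s"

# sub() returns non-matching strings unchanged
sub(pattern, "", df$col)[1]
# returns "Adenia macrophylla"

# Blank the rows that do not contain two word-plus-space runs
out <- ifelse(grepl(pattern, df$col), sub(pattern, "", df$col), "")
out
# returns c("", "(Wall.) Dunn", "Reinw. ex Blume")
```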

If you want a tidyverse answer:

library(dplyr)
library(stringr)

df %>% mutate(val = str_remove(col, '\\w+\\s\\w+\\s'))
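Since the question mentions many columns, the same clean-up can be applied to every column at once. A base R sketch using lapply() over the data frame, with the pattern ([^ ]* ){2}|.* and perl = TRUE, which removes everything up to and including the second space and turns strings with fewer than two spaces into ""; the two-column data frame df2 is made-up example data:

```r
# Hypothetical two-column data frame for illustration
df2 <- data.frame(col1 = c("Adenia macrophylla", "Adinobotrys atropurpureus (Wall.) Dunn"),
                  col2 = c("Ardisia purpurea Reinw. ex Blume", "Adenia macrophylla"),
                  stringsAsFactors = FALSE)

# Remove everything up to and including the second space; strings with fewer
# than two spaces fall through to the .* alternative and become ""
df2[] <- lapply(df2, function(x) sub("([^ ]* ){2}|.*", "", x, perl = TRUE))
df2
# col1: "", "(Wall.) Dunn"
# col2: "Reinw. ex Blume", ""
```

Assigning to df2[] rather than df2 keeps the result a data frame with the original column names.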

Answer:

If you want to select the string after the n-th space, a repetition quantifier in sub() works well:

sub("([^ ]* ){2}(.*)|.*", "\\2", df$col)
#sub("([^ ]* ){2}|.*", "", df$col, perl=TRUE) #Alternative
#[1] ""                "(Wall.) Dunn"    "Reinw. ex Blume"

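[^ ]* matches zero or more characters that are not a space, followed by a space; {2} repeats that group two times; (.*) captures everything after the second space, and "\\2" keeps only that capture. The |.* alternative matches strings with fewer than two spaces, and since group 2 is empty for them, they become "".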
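To generalise to the n-th space, the repetition count can be built into the pattern with sprintf(). A sketch using the removal form of the pattern with perl = TRUE; after_nth_space is a hypothetical helper name:

```r
# Hypothetical helper: drop everything up to and including the n-th space.
# Strings with fewer than n spaces match the .* alternative and become "".
after_nth_space <- function(x, n) {
  pattern <- sprintf("([^ ]* ){%d}|.*", n)
  sub(pattern, "", x, perl = TRUE)
}

x <- c("Adenia macrophylla",
       "Adinobotrys atropurpureus (Wall.) Dunn",
       "Ardisia purpurea Reinw. ex Blume")

after_nth_space(x, 2)
# returns c("", "(Wall.) Dunn", "Reinw. ex Blume")
after_nth_space(x, 3)
# returns c("", "Dunn", "ex Blume")
```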

Answer:

Or use this regex:

df$col <- sub('^\\S+\\s+\\S+', '', df$col)
# rows 2 and 3 keep a leading space; wrap the call in trimws() to drop it

Output df:

> df
               col
1                 
2     (Wall.) Dunn
3  Reinw. ex Blume
> 