Select all characters before a given word (R)

My example is below:

df <- data.frame(x = c("Santiria laevigata Blume f. laevigata", 
                 "Santiria laevigata", 
                 "Santiria laevigata Blume f. glabrifolia (Engl.) H.J.Lam"))

                                                        x
1                   Santiria laevigata Blume f. laevigata
2                                      Santiria laevigata
3 Santiria laevigata Blume f. glabrifolia (Engl.) H.J.Lam

I would like to keep only "Santiria laevigata" by keeping every character before "Blume". In other words, I want to remove everything from "Blume" onward. Any suggestions?

Desired output

                   x
1 Santiria laevigata
2 Santiria laevigata
3 Santiria laevigata

Answer:

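You can use sub() with the pattern "Blume.*" to remove everything from Blume to the end of the string, then trimws() to strip the leftover trailing space.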

df$y <- trimws(sub('Blume.*', '', df$x))
df$y
#[1] "Santiria laevigata" "Santiria laevigata" "Santiria laevigata"
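A slightly tighter variant (a sketch, not part of the original answer) folds the whitespace removal into the pattern: "\\s*Blume.*$" also matches any spaces in front of Blume, so trimws() is not needed. The helper name strip_from_word() is hypothetical, and its word argument is interpolated into a regular expression, so it should not contain regex metacharacters.

```r
# Hypothetical helper: drop `word` and everything after it from each string,
# together with any whitespace immediately before it.
# Note: `word` is pasted into a regex, so it is not treated literally.
strip_from_word <- function(x, word) {
  # \\s* swallows the space(s) before the word; .*$ removes the rest
  sub(paste0("\\s*", word, ".*$"), "", x)
}

df <- data.frame(x = c("Santiria laevigata Blume f. laevigata",
                       "Santiria laevigata",
                       "Santiria laevigata Blume f. glabrifolia (Engl.) H.J.Lam"),
                 stringsAsFactors = FALSE)

# Strings without "Blume" do not match and are returned unchanged
df$y <- strip_from_word(df$x, "Blume")
print(df$y)
```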

Answer:

Simply use gsub:

df$x <- gsub("Blume.*", "", df$x)  # rows 1 and 3 keep a trailing space

                    x
1 Santiria laevigata 
2  Santiria laevigata
3 Santiria laevigata
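Another option, not taken from the answers here, is to split each string at " Blume" with strsplit() and keep the first piece. Because the split string includes the leading space, no trailing space is left behind, and fixed = TRUE makes strsplit() treat " Blume" literally instead of as a regex. This is a sketch assuming the column is character (strsplit() rejects factors, hence stringsAsFactors = FALSE).

```r
df <- data.frame(x = c("Santiria laevigata Blume f. laevigata",
                       "Santiria laevigata",
                       "Santiria laevigata Blume f. glabrifolia (Engl.) H.J.Lam"),
                 stringsAsFactors = FALSE)

# Split each string at " Blume"; the first piece is the text before it.
# Strings without " Blume" come back whole as a single piece.
parts <- strsplit(df$x, " Blume", fixed = TRUE)

# Take element 1 of each piece, guaranteeing a character result
df$x <- vapply(parts, `[`, character(1), 1)
print(df$x)
```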

Answer:

You could instead store the strings in a plain character vector:

df <- c("Santiria laevigata Blume f. laevigata",
        "Santiria laevigata",
        "Santiria laevigata Blume f. glabrifolia (Engl.) H.J.Lam")

and then keep the first 18 characters (the length of "Santiria laevigata") with substr():

new_df <- substr(df,1,18)
new_df

[1] "Santiria laevigata" "Santiria laevigata" "Santiria laevigata"
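The fixed length of 18 only works because every row starts with the same name. If, as an assumption, every entry begins with a two-word binomial (genus and species), a regex that keeps the first two whitespace-separated words generalises to names of any length. This is a sketch of a different technique, not the answer's method.

```r
x <- c("Santiria laevigata Blume f. laevigata",
       "Santiria laevigata",
       "Santiria laevigata Blume f. glabrifolia (Engl.) H.J.Lam")

# Capture the first two whitespace-separated words and drop the rest.
# Strings with fewer than two words do not match and are returned unchanged.
first_two_words <- sub("^(\\S+\\s+\\S+).*$", "\\1", x)
print(first_two_words)
```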

I'm not sure how to make this work with a data frame such as

data.frame(x = c("abc"))
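For what it's worth, substr() is vectorised, so it can be applied directly to a data frame column rather than to the whole data frame. A minimal sketch using the question's data, still assuming the wanted prefix is exactly 18 characters long:

```r
df <- data.frame(x = c("Santiria laevigata Blume f. laevigata",
                       "Santiria laevigata",
                       "Santiria laevigata Blume f. glabrifolia (Engl.) H.J.Lam"),
                 stringsAsFactors = FALSE)

# Apply substr() to the column itself; the result replaces the column.
# Assumes the part to keep is always the first 18 characters.
df$x <- substr(df$x, 1, 18)
print(df)
```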