Removing first word from data frame cell when it starts with lowercase letter in R

Time:03-22

I want to clean up a taxonomy table of bacterial species in R by removing every value that starts with a lowercase letter.

Here is a column from my taxonomy data frame:

Species
Tuwongella immobilis
Woesebacteria
unidentified marine
bacterium Ellin506

And I want:

Species
Tuwongella immobilis
Woesebacteria

I tried:

unwanted <- "^[:upper:] [:lower:] "
tax.clean$Species <- str_replace_all(tax.clean$Species, unwanted, "")

But it doesn't seem to work: the pattern does not match the species I want to remove.

CodePudding user response:

If you are working with a data frame, I suggest using dplyr::filter() to drop the unwanted rows.

grepl() returns a logical vector. !grepl("^[[:lower:]]", Species) is TRUE for every value that does not start with a lowercase letter (^ anchors the match to the beginning of the string).

library(dplyr)

df %>% filter(!grepl("^[[:lower:]]", Species))

               Species
1 Tuwongella immobilis
2        Woesebacteria
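
For anyone who wants to try this without dplyr, here is a minimal base R sketch of the same filter. The data frame df is rebuilt from the four sample values in the question, which is an assumption about what the real table looks like, and the names starts_lower and df_clean are only illustrative.

```r
# Sample data rebuilt from the question (assumed shape of the real table)
df <- data.frame(
  Species = c("Tuwongella immobilis", "Woesebacteria",
              "unidentified marine", "bacterium Ellin506"),
  stringsAsFactors = FALSE
)

# TRUE for values whose first character is a lowercase letter
starts_lower <- grepl("^[[:lower:]]", df$Species)

# Keep the remaining rows; drop = FALSE keeps the result a data frame
# even though there is only one column
df_clean <- df[!starts_lower, , drop = FALSE]

print(df_clean)
```

With these four values, only the rows for "Tuwongella immobilis" and "Woesebacteria" should remain, which matches the dplyr result above.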

CodePudding user response:

We can use grep() with value = TRUE to return only the values that start with an uppercase letter:

grep('^[A-Z]', df$Species, value = TRUE)
[1] "Tuwongella immobilis" "Woesebacteria" 
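
Note that grep() with value = TRUE gives back a character vector, not a data frame. If the other taxonomy columns need to stay attached, one option is to let grep() return row positions and index the data frame with them. The sketch below assumes the same sample data as in the question; keep and df_clean are illustrative names.

```r
# Sample data rebuilt from the question (assumed shape of the real table)
df <- data.frame(
  Species = c("Tuwongella immobilis", "Woesebacteria",
              "unidentified marine", "bacterium Ellin506"),
  stringsAsFactors = FALSE
)

# Without value = TRUE, grep() returns the integer positions of the matches
keep <- grep("^[A-Z]", df$Species)

# Subset the rows by position; drop = FALSE keeps a one-column data frame
df_clean <- df[keep, , drop = FALSE]

print(df_clean)
```

Keep in mind that "^[A-Z]" keeps values that start with an uppercase letter, which is not quite the same as dropping values that start with a lowercase letter: an entry beginning with a digit or punctuation would be removed here but kept by the "^[[:lower:]]" filter.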