I want to clean up a taxonomy table of bacterial species in R
by removing every entry that starts with a lowercase letter.
This is the Species column of my taxonomy data frame:
Species |
---|
Tuwongella immobilis |
Woesebacteria |
unidentified marine |
bacterium Ellin506 |
And I want:
Species |
---|
Tuwongella immobilis |
Woesebacteria |
I tried this, using stringr:
unwanted <- "^[:upper:] [:lower:] "
tax.clean$Species <- str_replace_all(tax.clean$Species, unwanted, "")
but it doesn't seem to work: the result does not match the desired species list.
CodePudding user response:
If you are working with a data frame, I suggest using dplyr::filter()
to clean it up.
grepl() returns a logical vector, so !grepl("^[[:lower:]]", Species)
is TRUE for every value that does not start with a lowercase letter
(^ anchors the match to the beginning of the string).
library(dplyr)
df %>% filter(!grepl("^[[:lower:]]", Species))
Species
1 Tuwongella immobilis
2 Woesebacteria
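For a self-contained check, here is a minimal sketch that rebuilds the example column and applies the same test in base R, without dplyr. The data frame name `df` follows the answer above; `starts_lower` and `cleaned` are illustrative names, not from the original post.

```r
# Rebuild the example column from the question
df <- data.frame(
  Species = c("Tuwongella immobilis", "Woesebacteria",
              "unidentified marine", "bacterium Ellin506"),
  stringsAsFactors = FALSE
)

# TRUE where the value starts with a lowercase letter
starts_lower <- grepl("^[[:lower:]]", df$Species)
print(starts_lower)
# [1] FALSE FALSE  TRUE  TRUE

# Keep only the rows that do not start with a lowercase letter;
# drop = FALSE keeps the result a data frame even with one column
cleaned <- df[!starts_lower, , drop = FALSE]
print(cleaned)
```

Assigning the result back (for example to `tax.clean` in the question) would replace the original table with the filtered one.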
CodePudding user response:
We can also use grep() with value = TRUE, which returns the matching strings themselves:
grep('^[A-Z]', df$Species, value = TRUE)
[1] "Tuwongella immobilis" "Woesebacteria"
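Note that "starts with an uppercase letter" and "does not start with a lowercase letter" are not quite the same condition. The sketch below tries to show the difference; the value "16S clone" is invented for illustration and does not come from the question.

```r
# Hypothetical values; "16S clone" is made up for this example
species <- c("Tuwongella immobilis", "unidentified marine", "16S clone")

# Keep values that start with an uppercase ASCII letter
starts_upper <- grep("^[A-Z]", species, value = TRUE)

# Keep values that do not start with a lowercase letter
not_lower <- species[!grepl("^[[:lower:]]", species)]

print(starts_upper)
# [1] "Tuwongella immobilis"
print(not_lower)
# [1] "Tuwongella immobilis" "16S clone"
```

Which one is appropriate depends on whether entries beginning with digits or other characters should be kept.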