I have a data.frame as follows:
df = data.frame(sp_name = c("Xylopia brasiliensis", "Xylosma tweediana", "Zanthoxylum fagara subsp. lentiscifolium", "Schinus terebinthifolia var. raddiana", "Eugenia"), value = c(1, 2, 3, 4, 5))
Here's the deal: I only want to keep the rows of df whose sp_name contains exactly two words (in my case, Xylopia brasiliensis and Xylosma tweediana). How can I proceed? I'm failing miserably at using the filter function from tidyverse.
Thanks in advance.
CodePudding user response:
We can use str_count to count the words in each name and create a logical vector in filter:
library(dplyr)
library(stringr)
df %>%
  # keep rows where the number of word matches (\\w+) is exactly 2
  filter(str_count(sp_name, "\\w+") == 2)
-output
               sp_name value
1 Xylopia brasiliensis     1
2    Xylosma tweediana     2
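If it helps to see the per-row counts behind that filter, here is a minimal base R sketch. It assumes words are separated by whitespace, which is a slightly different rule from the regex word match used above. The names n_words and two_words are only illustrative.

```r
# Same data as in the question; stringsAsFactors = FALSE keeps sp_name as
# character on older R versions too.
df <- data.frame(
  sp_name = c("Xylopia brasiliensis", "Xylosma tweediana",
              "Zanthoxylum fagara subsp. lentiscifolium",
              "Schinus terebinthifolia var. raddiana", "Eugenia"),
  value = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)

# Split each name on runs of whitespace and count the pieces.
n_words <- lengths(strsplit(trimws(df$sp_name), "\\s+"))

# Keep only the rows with exactly two whitespace-separated tokens.
two_words <- df[n_words == 2, ]
```

Splitting on whitespace treats a hyphenated epithet as a single word, whereas a \\w+ count would split it at the hyphen, so the two approaches can disagree on such names.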
Or this can be done with str_detect as well: match a word (\\w+) from the start (^) of the string, followed by a space and another word (\\w+) at the end ($) of the string
df %>%
  filter(str_detect(sp_name, "^\\w+ \\w+$"))
Or in base R with grepl
subset(df, grepl("^\\w+ \\w+$", sp_name))
               sp_name value
1 Xylopia brasiliensis     1
2    Xylosma tweediana     2
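As a quick check of why the ^ and $ anchors matter in the anchored pattern, here is a small sketch on a vector built from three of the question's names. The variable names x, anchored and unanchored are only illustrative.

```r
x <- c("Xylopia brasiliensis",
       "Schinus terebinthifolia var. raddiana",
       "Eugenia")

# Anchored: the whole string must be exactly "word space word".
anchored <- grepl("^\\w+ \\w+$", x)

# Unanchored: any two adjacent words anywhere in the string are enough,
# so the four-word name also matches.
unanchored <- grepl("\\w+ \\w+", x)

print(data.frame(x, anchored, unanchored))
```

Without the anchors, the longer names with subsp. or var. would slip through the filter, which is presumably not what is wanted here.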