filter() in dplyr problem. Can't filter to make a dataset containing three characters

Time:02-25


I am trying to create a new data frame from my original monarch.data df that contains all variables, but only the rows for three milkweed species (column Milkweed_species).

library(dplyr)
three_species <- filter(milkweed.data, milkweed.data$Milkweed_species == c("Asclepias nivea","Asclepias tuberosa", "Asclepias californica"))

This is the code I have tried, along with other attempts, but it returns a df that contains only one of the species.

What am I doing wrong, and how do I fix it?

CodePudding user response:

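Inside dplyr verbs you can refer to columns by their bare names (Milkweed_species) instead of writing milkweed.data$Milkweed_species. The real problem is the comparison: == compares the column with your three-element vector position by position, recycling the vector, so it does not test whether each row matches any of the three species. Use %in% instead: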

library(dplyr)
three_species <- filter(milkweed.data, Milkweed_species %in% c("Asclepias nivea","Asclepias tuberosa", "Asclepias californica"))

milkweed.data should be your original data. If, as you said, your original data is monarch.data, use that as the first argument of filter().
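To see why == drops rows, here is a minimal base R sketch of the two comparisons. The species values are copied from the sample data posted further down in this thread; the names wanted, by_position and by_membership are just illustrative.

```r
# Species column, copied from the sample data posted in this thread
species <- c("Ascelpias amplexicaulis", "Asclepias boliviensis",
             "Asclepias nivea", "Asclepias tuberosa",
             "Asclepias californica", "Asclepias nivea")

# The three species to keep (hypothetical variable name)
wanted <- c("Asclepias nivea", "Asclepias tuberosa", "Asclepias californica")

# == recycles `wanted` down the column: row 1 is compared with nivea,
# row 2 with tuberosa, row 3 with californica, row 4 with nivea again, ...
by_position <- species == wanted

# %in% asks, for each row, whether its value appears anywhere in `wanted`
by_membership <- species %in% wanted

print(by_position)
print(by_membership)
```

With these values the positional comparison matches no rows at all, while %in% flags the four rows you actually want. In your own data a few rows happen to line up by position, which is why you still get part of one species back.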

CodePudding user response:

For filtering by multiple values, you need to use %in% rather than ==.

library(dplyr)

three_species <-
  filter(milkweed.data,
    Milkweed_species %in% c(
      "Asclepias nivea",
      "Asclepias tuberosa",
      "Asclepias californica"
    )
  )

Output

       Milkweed_species Trichomes Latex Toxin_amt Caterpillar_mass
1       Asclepias nivea        54    34         6                2
2    Asclepias tuberosa         1     3         2               56
3 Asclepias californica         5     1         6                2
4       Asclepias nivea         5     2         4                3

Essentially, for each value in milkweed.data$Milkweed_species, we check whether it appears in the vector you supplied. filter() then keeps only the rows for which that check is TRUE.

milkweed.data$Milkweed_species %in% c("Asclepias nivea", "Asclepias tuberosa", "Asclepias californica")

#[1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE
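If it helps to see the "keep the TRUE rows" step spelled out, the same idea can be sketched in base R without dplyr. This is only an illustration: it uses a reduced two-column copy of the sample data, and the names wanted, keep and three_species_base are made up for the example.

```r
# Reduced copy of the sample data (two columns only, to keep the sketch short)
milkweed.data <- data.frame(
  Milkweed_species = c("Ascelpias amplexicaulis", "Asclepias boliviensis",
                       "Asclepias nivea", "Asclepias tuberosa",
                       "Asclepias californica", "Asclepias nivea"),
  Trichomes = c(3, 21, 54, 1, 5, 5),
  stringsAsFactors = FALSE
)

# The three species to keep (hypothetical variable name)
wanted <- c("Asclepias nivea", "Asclepias tuberosa", "Asclepias californica")

# One TRUE/FALSE per row: is this row's species one of the wanted ones?
keep <- milkweed.data$Milkweed_species %in% wanted

# Logical row indexing keeps exactly the rows where `keep` is TRUE
three_species_base <- milkweed.data[keep, ]

print(nrow(three_species_base))
print(unique(three_species_base$Milkweed_species))
```

The selected rows are the same ones filter() keeps. One visible difference is that base subsetting preserves the original row names (3 to 6 here), whereas the filter() output above is renumbered from 1.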

Data

milkweed.data <- structure(list(Milkweed_species = c("Ascelpias amplexicaulis", 
"Asclepias boliviensis", "Asclepias nivea", "Asclepias tuberosa", 
"Asclepias californica", "Asclepias nivea"), Trichomes = c(3, 
21, 54, 1, 5, 5), Latex = c(1, 5, 34, 3, 1, 2), Toxin_amt = c(3, 
1, 6, 2, 6, 4), Caterpillar_mass = c(3, 5, 2, 56, 2, 3)), class = "data.frame", row.names = c(NA, 
-6L))