The title may sound kinda weird but I have found no way of better defining my issue.
Here an example data set:
test = data.frame(genus = c("Acicarpha", "Acicarpha", "Acicarpha", "Acicarpha", "Acisanthera", "Acisanthera", "Acisanthera", "Acisanthera", "Acmella", "Acmella"), sp1 = c("NA", "bonariensis", "bonariensis", "spathulata", NA, "variabilis", "variabilis", "variabilis", NA, NA))
As you can see, I have a few species names (genus sp1) possible: Acicarpha NA, Acicarpha bonariensis, Acicarpha spathulata, Acisanthera variabilis, Acisanthera NA, and Acmella NA.
Here's the deal: I'm trying to select only the row related to Acmella NA since the only returning value on the sp1 column is NA. Other species also have NA, but they do not have only NA. How can I do this? I'm bashing my head.
CodePudding user response:
Here's some code that does what I think you're asking for. It has four steps:
- Group the rows by
genus
. - Make a new column called
all_sp1_na
that isTRUE
if all of each genus'ssp1
observations areNA
,FALSE
otherwise (i.e.FALSE
if at least onesp1
observation is notNA
for that genus). - Filter for rows where
all_sp1_na
is true. - Remove the temporary column
all_sp1_na
.
library(tidyverse)
test %>%
group_by(genus) %>%
mutate(all_sp1_na = all(is.na(sp1))) %>%
filter(all_sp1_na) %>%
select(-all_sp1_na)
And it gives this result:
# A tibble: 2 x 2
# Groups: genus [1]
genus sp1
<chr> <chr>
1 Acmella NA
2 Acmella NA
Let me know if you're looking for something else.