I have a dataframe that looks like this:
| id | score |
|---|---|
| x | 1 |
| x | 2 |
| x | 3 |
| y | 1 |
| y | 2 |
| y | 3 |
| ... | ... |
I want to check whether every ID has 1, 2 and 3 in the "score" column. If any ID is missing one of 1, 2 or 3, I want to save those IDs to a vector.
I tried to somehow loop it / write a condition in dplyr but failed:
group_by(id) %>%
{if(!1 %in% score | !2 %in% score | !3 %in% score ) {print(id)}}
CodePudding user response:
After grouping by 'id', filter by creating a logical vector and wrapping it in all(), i.e. if all of 1, 2, 3 are in 'score', the condition is TRUE. We then negate (!) it to keep only those groups that don't comply, get the distinct 'id' and pull it as a vector:
library(dplyr)
v1 <- df1 %>%
  group_by(id) %>%
  filter(!all(c(1, 2, 3) %in% score)) %>%
  ungroup() %>%
  distinct(id) %>%
  pull(id)
-output
> v1
[1] "z"
NOTE: print only displays the output in the console (it returns its argument invisibly), so nothing is saved. To keep the result, we need to assign it to an object.
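As a rough illustration of storing the result instead of printing it, here is a base R sketch (not part of the original approach). The names complete and missing_ids are made up for this example, and df1 repeats the sample data from the data section at the end.

```r
# Sample data, same values as df1 in the 'data' section
df1 <- data.frame(
  id = c("x", "x", "x", "y", "y", "y", "z", "z"),
  score = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L),
  stringsAsFactors = FALSE
)

# For each id, check whether 1, 2 and 3 all occur among its scores
# ('complete' is a hypothetical name for the named logical vector)
complete <- vapply(
  split(df1$score, df1$id),
  function(s) all(c(1, 2, 3) %in% s),
  logical(1)
)

# Assign the failing ids to an object instead of printing them
missing_ids <- names(complete)[!complete]
missing_ids
```

Because missing_ids is an ordinary character vector, it can be reused later, which a bare print call inside if does not give you.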
In the OP's code, the if condition sits outside the tidyverse functions, and score is not an object in the global env; it is a column in the data. We may use .$ or .[[ to extract it, but that will also lose the grouping attribute. It is better to do this within tidyverse functions, i.e. filter or summarise etc. Or we may use group_modify to do the print based on the OP's code:
df1 %>%
  group_by(id) %>%
  group_modify(~ {
    if(!1 %in% .x$score | !2 %in% .x$score | !3 %in% .x$score) {
      print(.y$id)
    }
    .x
  })
[1] "z"
# A tibble: 8 x 2
# Groups:   id [3]
  id    score
  <chr> <int>
1 x         1
2 x         2
3 x         3
4 y         1
5 y         2
6 y         3
7 z         1
8 z         2
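To make the points about .$ and summarise more concrete, here is a small sketch, assuming the same df1 as in the data section. The first pipeline shows that .$score inside braces sees the whole column, so the grouping is ignored. The second does the check per group with summarise. The names whole_check, complete and v2 are only illustrative.

```r
library(dplyr)

# Sample data, same values as df1 in the 'data' section
df1 <- data.frame(
  id = c("x", "x", "x", "y", "y", "y", "z", "z"),
  score = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Inside { }, the dot is the entire grouped data frame, so .$score is the
# full column and the check is NOT done separately for each id
whole_check <- df1 %>%
  group_by(id) %>%
  { all(c(1, 2, 3) %in% .$score) }

# summarise() evaluates the condition once per id; keep the ids that fail
v2 <- df1 %>%
  group_by(id) %>%
  summarise(complete = all(c(1, 2, 3) %in% score), .groups = "drop") %>%
  filter(!complete) %>%
  pull(id)
```

Here whole_check comes out TRUE even though 'z' has no 3, because the values 1, 2 and 3 all appear somewhere in the full column, while v2 contains only "z".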
data
df1 <- structure(list(id = c("x", "x", "x", "y", "y", "y", "z", "z"),
score = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L)),
class = "data.frame", row.names = c(NA,
-8L))