I have a combined dataset that consists of three years of data for the same id
s. When I merged the dataset, I see some of the students' grades are not consecutive in the following years.
Here is sample dataset looks like:
df <- data.frame( id = c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3),
category = c("A","A","A","B","B","B","A","A","A","B","B","B","A","A","A","B","B","B"),
year = c(18,19,20,18,19,20,18,19,20,18,19,20,18,19,20,18,19,20),
grade = c(3,4,5,3,4,5,5,6,8,5,6,8,3,4,6,3,4,6))
> df
id category year grade
1 1 A 18 3
2 1 A 19 4
3 1 A 20 5
4 1 B 18 3
5 1 B 19 4
6 1 B 20 5
7 2 A 18 5
8 2 A 19 6
9 2 A 20 8
10 2 B 18 5
11 2 B 19 6
12 2 B 20 8
13 3 A 18 3
14 3 A 19 4
15 3 A 20 6
16 3 B 18 3
17 3 B 19 4
18 3 B 20 6
In this sample dataset, id=2
and id=3
have those grades not in order as 5,6,7
and 3,4,5
. id=2
has 5,6,8
instead of 5,6,7
and id=3
has 3,4,6
instead of 3,4,5
. I would like remove those students from the dataset. My desired output would include only id=1
who has the grades are in order for the consecutive years.
My desired output file would be:
> df
id category year grade
1 1 A 18 3
2 1 A 19 4
3 1 A 20 5
4 1 B 18 3
5 1 B 19 4
6 1 B 20 5
Any ideas? Thanks!
CodePudding user response:
Get the diff
and check if all
of them is equal to 1, grouped by 'id', and 'category' to filter
the groups
library(dplyr)
df %>%
group_by(id, category) %>%
filter(all(diff(grade) == 1)) %>%
ungroup
-output
# A tibble: 6 × 4
id category year grade
<dbl> <chr> <dbl> <dbl>
1 1 A 18 3
2 1 A 19 4
3 1 A 20 5
4 1 B 18 3
5 1 B 19 4
6 1 B 20 5