Home > Software design >  Filtering out a grade variable that is not meeting consecutive order in r
Filtering out a grade variable that is not meeting consecutive order in r

Time:11-20

I have a combined dataset that consists of three years of data for the same ids. When I merged the dataset, I see some of the students' grades are not consecutive in the following years.

Here is sample dataset looks like:

df <- data.frame( id = c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3),
                  category = c("A","A","A","B","B","B","A","A","A","B","B","B","A","A","A","B","B","B"),
                  year = c(18,19,20,18,19,20,18,19,20,18,19,20,18,19,20,18,19,20),
                  grade = c(3,4,5,3,4,5,5,6,8,5,6,8,3,4,6,3,4,6))

> df
   id category year grade
1   1        A   18     3
2   1        A   19     4
3   1        A   20     5
4   1        B   18     3
5   1        B   19     4
6   1        B   20     5
7   2        A   18     5
8   2        A   19     6
9   2        A   20     8
10  2        B   18     5
11  2        B   19     6
12  2        B   20     8
13  3        A   18     3
14  3        A   19     4
15  3        A   20     6
16  3        B   18     3
17  3        B   19     4
18  3        B   20     6

In this sample dataset, id=2 and id=3 have those grades not in order as 5,6,7 and 3,4,5. id=2 has 5,6,8 instead of 5,6,7 and id=3 has 3,4,6 instead of 3,4,5. I would like remove those students from the dataset. My desired output would include only id=1 who has the grades are in order for the consecutive years.

My desired output file would be:

   > df
       id category year grade
    1   1        A   18     3
    2   1        A   19     4
    3   1        A   20     5
    4   1        B   18     3
    5   1        B   19     4
    6   1        B   20     5

Any ideas? Thanks!

CodePudding user response:

Get the diff and check if all of them is equal to 1, grouped by 'id', and 'category' to filter the groups

library(dplyr)
df %>% 
    group_by(id, category) %>% 
    filter(all(diff(grade) == 1)) %>% 
    ungroup

-output

# A tibble: 6 × 4
     id category  year grade
  <dbl> <chr>    <dbl> <dbl>
1     1 A           18     3
2     1 A           19     4
3     1 A           20     5
4     1 B           18     3
5     1 B           19     4
6     1 B           20     5
  • Related