R Count Unique By Group in DPLYR-CodePudding

HAVE = data.frame("TRIMESTER" = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,4,4,4,4,4,4),
                  "STUDENT" = c(1,2,3,3,4,2,5,6,7,1,2,2,2,2,2,1,2,3,4,5))
HAVE$WANT1 = c(4,4,4,4,4,5,5,5,5,5,1,1,1,1,5,5,5,5,5,5)
HAVE$WANT2 = c(0,0,0,0,0,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1)

I have HAVE and wish to APPEND a column to count the UNIQUE value of STUDENT for every TRIMESTER shown WANT1 and I wish to create WANT2 which is the SUM of times for every TRIMESTER that STUDENT==5 appears so STUDENT==5 appear ZERO times in TRIMESTER == 1, so the value for all TRIMESTER == 1 is ZERO but student 5 appear ONCE in TRIMESTER==4 so the value is 1

CodePudding user response：

After grouping by 'TRIMESTER', get the count of distinct elements of 'STUDENT' with n_distinct and the count of STUDENT 5 with sum on a logical expression

library(dplyr)
HAVE %>% 
   group_by(TRIMESTER) %>%
   mutate(WANT1new = n_distinct(STUDENT),
          WANT2NEW = sum(STUDENT == 5)) %>% 
   ungroup

-output

# A tibble: 20 × 6
   TRIMESTER STUDENT WANT1 WANT2 WANT1new WANT2NEW
       <dbl>   <dbl> <dbl> <dbl>    <int>    <int>
 1         1       1     4     0        4        0
 2         1       2     4     0        4        0
 3         1       3     4     0        4        0
 4         1       3     4     0        4        0
 5         1       4     4     0        4        0
 6         2       2     5     1        5        1
 7         2       5     5     1        5        1
 8         2       6     5     1        5        1
 9         2       7     5     1        5        1
10         2       1     5     1        5        1
11         3       2     1     0        1        0
12         3       2     1     0        1        0
13         3       2     1     0        1        0
14         3       2     1     0        1        0
15         4       2     5     1        5        1
16         4       1     5     1        5        1
17         4       2     5     1        5        1
18         4       3     5     1        5        1
19         4       4     5     1        5        1
20         4       5     5     1        5        1

CodePudding user response：

The code below should produce the desired result.

library(dplyr)

HAVE %>%
  group_by(TRIMESTER) %>%
  mutate(WANT1 = length(unique(STUDENT)),
      WANT2 = as.numeric(any(5 == STUDENT)))