Home > front end >  Count rows by group based on values
Count rows by group based on values

Time:08-15

I have a dataframe, DATA, containing information on students and their test dates. I want to create a variable called WANT, where for all the STUDENTs you count the unique months (not the unique rows) as shown in the sample of the WANT variable below:

library(dplyr)

set.seed(0)

DATA <- data.frame("STUDENT" = sample(1:5, 100, r = T),
                   "TESTDATE" = sample(seq(as.Date('2010/01/01'), as.Date('2010/12/31'), by="day"), 100, r=T))
    
DATA <- DATA %>% arrange(STUDENT, TESTDATE)
    
DATA$WANT <- c(1,1,1,2,2,3,3,4,4,5,5,6,7,7,8,8,9,1,1,1,2,3,3,4,5,5,6,7,8,8,9,10,10, rep(NA, 67))

My attempt only does rows and it's not what I wish for

DATA %>% group_by(STUDENT) %>% mutate(WANT = 1:n())

CodePudding user response:

We may extract the month part and use match or as.integer(factor(WANT2 levels = unique(WANT2)))

library(dplyr)
out <- DATA %>% 
  group_by(STUDENT) %>% 
  mutate(WANT2 = as.integer(format(TESTDATE, '%m')),
        WANT2 = match(WANT2, unique(WANT2))) %>%
   ungroup

-output

> head(out, 22) %>% as.data.frame
   STUDENT   TESTDATE WANT WANT2
1        1 2010-01-20    1     1
2        1 2010-01-31    1     1
3        1 2010-02-10    2     2
4        1 2010-02-10    2     2
5        1 2010-03-27    3     3
6        1 2010-04-20    4     4
7        1 2010-04-21    4     4
8        1 2010-05-02    5     5
9        1 2010-05-06    5     5
10       1 2010-05-13    5     5
11       1 2010-05-20    5     5
12       1 2010-06-17    6     6
13       1 2010-08-22    7     7
14       1 2010-08-25    7     7
15       1 2010-08-27    7     7
16       1 2010-08-30    7     7
17       1 2010-09-06    8     8
18       1 2010-09-30    8     8
19       1 2010-10-27    9     9
20       1 2010-10-31    9     9
21       1 2010-12-10   10    10
22       1 2010-12-21   10    10

If we want the year-month to count as separate, then do

out <- DATA %>% 
  group_by(STUDENT) %>% 
  mutate(WANT2 = format(TESTDATE, '%Y-%m'),
        WANT2 = match(WANT2, unique(WANT2))) %>%
   ungroup

CodePudding user response:

You can use data.table::rleid:

library(data.table)
setDT(DATA)[, WANT:=rleid(month(TESTDATE)), STUDENT]

Output:

     STUDENT   TESTDATE  WANT
       <int>     <Date> <num>
  1:       1 2010-01-20     1
  2:       1 2010-01-31     1
  3:       1 2010-02-10     2
  4:       1 2010-02-10     2
  5:       1 2010-03-27     3
  6:       1 2010-04-20     4
  7:       1 2010-04-21     4
  8:       1 2010-05-02     5
  9:       1 2010-05-06     5
 10:       1 2010-05-13     5
 11:       1 2010-05-20     5
 12:       1 2010-06-17     6
 13:       1 2010-08-22     7
 14:       1 2010-08-25     7
 15:       1 2010-08-27     7
 16:       1 2010-08-30     7
 17:       1 2010-09-06     8
 18:       1 2010-09-30     8
 19:       1 2010-10-27     9
 20:       1 2010-10-31     9
 21:       1 2010-12-10    10
 22:       1 2010-12-21    10
 23:       2 2010-01-19     1
 24:       2 2010-02-05     2
 25:       2 2010-02-09     2
 26:       2 2010-02-18     2
 27:       2 2010-02-19     2
 28:       2 2010-03-16     3
 29:       2 2010-03-18     3
 30:       2 2010-04-02     4
 31:       2 2010-04-12     4
 32:       2 2010-04-26     4
 33:       2 2010-05-04     5
 34:       2 2010-05-10     5
 35:       2 2010-07-31     6
 36:       2 2010-09-04     7
 37:       2 2010-09-22     7
 38:       2 2010-10-13     8
 39:       2 2010-10-24     8
 40:       2 2010-11-02     9
 41:       2 2010-11-24     9
 42:       2 2010-12-21    10
 43:       2 2010-12-22    10
 44:       2 2010-12-28    10
 45:       3 2010-01-16     1
 46:       3 2010-03-08     2
 47:       3 2010-04-15     3
 48:       3 2010-04-23     3
 49:       3 2010-04-27     3
 50:       3 2010-05-09     4
 51:       3 2010-05-18     4
 52:       3 2010-06-01     5
 53:       3 2010-10-27     6
 54:       3 2010-11-01     7
 55:       3 2010-11-05     7
 56:       3 2010-11-12     7
 57:       3 2010-11-24     7
 58:       3 2010-12-22     8
 59:       3 2010-12-27     8
 60:       3 2010-12-30     8
 61:       4 2010-01-14     1
 62:       4 2010-01-27     1
 63:       4 2010-03-05     2
 64:       4 2010-03-06     2
 65:       4 2010-03-06     2
 66:       4 2010-03-14     2
 67:       4 2010-03-19     2
 68:       4 2010-04-08     3
 69:       4 2010-06-09     4
 70:       4 2010-06-23     4
 71:       4 2010-07-13     5
 72:       4 2010-07-26     5
 73:       4 2010-08-06     6
 74:       4 2010-09-28     7
 75:       4 2010-09-30     7
 76:       4 2010-11-23     8
 77:       4 2010-12-15     9
 78:       4 2010-12-24     9
 79:       4 2010-12-25     9
 80:       5 2010-01-15     1
 81:       5 2010-01-16     1
 82:       5 2010-02-20     2
 83:       5 2010-02-22     2
 84:       5 2010-03-13     3
 85:       5 2010-04-26     4
 86:       5 2010-05-01     5
 87:       5 2010-05-07     5
 88:       5 2010-05-10     5
 89:       5 2010-05-15     5
 90:       5 2010-07-31     6
 91:       5 2010-08-07     7
 92:       5 2010-08-17     7
 93:       5 2010-09-03     8
 94:       5 2010-09-11     8
 95:       5 2010-10-21     9
 96:       5 2010-11-01    10
 97:       5 2010-11-02    10
 98:       5 2010-11-03    10
 99:       5 2010-11-05    10
100:       5 2010-12-02    11
     STUDENT   TESTDATE  WANT
  • Related