I have a dataframe, DATA
, containing information on students and their test dates. I want to create a variable called WANT
, where for all the STUDENT
s you count the unique months (not the unique rows) as shown in the sample of the WANT variable below:
library(dplyr)
set.seed(0)
DATA <- data.frame("STUDENT" = sample(1:5, 100, r = T),
"TESTDATE" = sample(seq(as.Date('2010/01/01'), as.Date('2010/12/31'), by="day"), 100, r=T))
DATA <- DATA %>% arrange(STUDENT, TESTDATE)
DATA$WANT <- c(1,1,1,2,2,3,3,4,4,5,5,6,7,7,8,8,9,1,1,1,2,3,3,4,5,5,6,7,8,8,9,10,10, rep(NA, 67))
My attempt only does rows and it's not what I wish for
DATA %>% group_by(STUDENT) %>% mutate(WANT = 1:n())
CodePudding user response:
We may extract the month part and use match
or as.integer(factor(WANT2 levels = unique(WANT2)))
library(dplyr)
out <- DATA %>%
group_by(STUDENT) %>%
mutate(WANT2 = as.integer(format(TESTDATE, '%m')),
WANT2 = match(WANT2, unique(WANT2))) %>%
ungroup
-output
> head(out, 22) %>% as.data.frame
STUDENT TESTDATE WANT WANT2
1 1 2010-01-20 1 1
2 1 2010-01-31 1 1
3 1 2010-02-10 2 2
4 1 2010-02-10 2 2
5 1 2010-03-27 3 3
6 1 2010-04-20 4 4
7 1 2010-04-21 4 4
8 1 2010-05-02 5 5
9 1 2010-05-06 5 5
10 1 2010-05-13 5 5
11 1 2010-05-20 5 5
12 1 2010-06-17 6 6
13 1 2010-08-22 7 7
14 1 2010-08-25 7 7
15 1 2010-08-27 7 7
16 1 2010-08-30 7 7
17 1 2010-09-06 8 8
18 1 2010-09-30 8 8
19 1 2010-10-27 9 9
20 1 2010-10-31 9 9
21 1 2010-12-10 10 10
22 1 2010-12-21 10 10
If we want the year-month to count as separate, then do
out <- DATA %>%
group_by(STUDENT) %>%
mutate(WANT2 = format(TESTDATE, '%Y-%m'),
WANT2 = match(WANT2, unique(WANT2))) %>%
ungroup
CodePudding user response:
You can use data.table::rleid
:
library(data.table)
setDT(DATA)[, WANT:=rleid(month(TESTDATE)), STUDENT]
Output:
STUDENT TESTDATE WANT
<int> <Date> <num>
1: 1 2010-01-20 1
2: 1 2010-01-31 1
3: 1 2010-02-10 2
4: 1 2010-02-10 2
5: 1 2010-03-27 3
6: 1 2010-04-20 4
7: 1 2010-04-21 4
8: 1 2010-05-02 5
9: 1 2010-05-06 5
10: 1 2010-05-13 5
11: 1 2010-05-20 5
12: 1 2010-06-17 6
13: 1 2010-08-22 7
14: 1 2010-08-25 7
15: 1 2010-08-27 7
16: 1 2010-08-30 7
17: 1 2010-09-06 8
18: 1 2010-09-30 8
19: 1 2010-10-27 9
20: 1 2010-10-31 9
21: 1 2010-12-10 10
22: 1 2010-12-21 10
23: 2 2010-01-19 1
24: 2 2010-02-05 2
25: 2 2010-02-09 2
26: 2 2010-02-18 2
27: 2 2010-02-19 2
28: 2 2010-03-16 3
29: 2 2010-03-18 3
30: 2 2010-04-02 4
31: 2 2010-04-12 4
32: 2 2010-04-26 4
33: 2 2010-05-04 5
34: 2 2010-05-10 5
35: 2 2010-07-31 6
36: 2 2010-09-04 7
37: 2 2010-09-22 7
38: 2 2010-10-13 8
39: 2 2010-10-24 8
40: 2 2010-11-02 9
41: 2 2010-11-24 9
42: 2 2010-12-21 10
43: 2 2010-12-22 10
44: 2 2010-12-28 10
45: 3 2010-01-16 1
46: 3 2010-03-08 2
47: 3 2010-04-15 3
48: 3 2010-04-23 3
49: 3 2010-04-27 3
50: 3 2010-05-09 4
51: 3 2010-05-18 4
52: 3 2010-06-01 5
53: 3 2010-10-27 6
54: 3 2010-11-01 7
55: 3 2010-11-05 7
56: 3 2010-11-12 7
57: 3 2010-11-24 7
58: 3 2010-12-22 8
59: 3 2010-12-27 8
60: 3 2010-12-30 8
61: 4 2010-01-14 1
62: 4 2010-01-27 1
63: 4 2010-03-05 2
64: 4 2010-03-06 2
65: 4 2010-03-06 2
66: 4 2010-03-14 2
67: 4 2010-03-19 2
68: 4 2010-04-08 3
69: 4 2010-06-09 4
70: 4 2010-06-23 4
71: 4 2010-07-13 5
72: 4 2010-07-26 5
73: 4 2010-08-06 6
74: 4 2010-09-28 7
75: 4 2010-09-30 7
76: 4 2010-11-23 8
77: 4 2010-12-15 9
78: 4 2010-12-24 9
79: 4 2010-12-25 9
80: 5 2010-01-15 1
81: 5 2010-01-16 1
82: 5 2010-02-20 2
83: 5 2010-02-22 2
84: 5 2010-03-13 3
85: 5 2010-04-26 4
86: 5 2010-05-01 5
87: 5 2010-05-07 5
88: 5 2010-05-10 5
89: 5 2010-05-15 5
90: 5 2010-07-31 6
91: 5 2010-08-07 7
92: 5 2010-08-17 7
93: 5 2010-09-03 8
94: 5 2010-09-11 8
95: 5 2010-10-21 9
96: 5 2010-11-01 10
97: 5 2010-11-02 10
98: 5 2010-11-03 10
99: 5 2010-11-05 10
100: 5 2010-12-02 11
STUDENT TESTDATE WANT