I have a dataframe like this:
vec <- LETTERS[1 : 4]
id <- rep(vec, each = 16)
date <- seq(as.Date("2021-01-01"), as.Date("2021-03-05"), by = "days")
data <- cbind.data.frame(id, date)
I want to create a new column, "section", which has a unique identifying value for every 14 days. I know I need to use dplyr
to group_by
id, because each ID has its own unique starting date and I want the first two weeks (and so on) of A to be labeled as the same "section" as the first two weeks (and so on) of B, C, and D.
Edit: Some of the dates in my real dataset skip days, so I cannot rely on counting by row.
Thanks in advance.
CodePudding user response:
Update II: thanks to @Henrik: Improved coding!
data %>%
group_by(id) %>%
mutate(g = (as.numeric(date - first(date))) %/% 14)
Update I after clarification:
data %>%
as_tibble() %>%
mutate(diff = date - first(date)) %>%
group_by(Y= cumsum(as.numeric(diff) %% 14 == 0)) %>%
print(n=80)
id date diff Y
<chr> <date> <drtn> <int>
1 A 2021-01-01 0 days 1
2 A 2021-01-02 1 days 1
3 A 2021-01-03 2 days 1
4 A 2021-01-04 3 days 1
5 A 2021-01-05 4 days 1
6 A 2021-01-06 5 days 1
7 A 2021-01-07 6 days 1
8 A 2021-01-08 7 days 1
9 A 2021-01-09 8 days 1
10 A 2021-01-10 9 days 1
11 A 2021-01-11 10 days 1
12 A 2021-01-12 11 days 1
13 A 2021-01-13 12 days 1
14 A 2021-01-14 13 days 1
15 A 2021-01-15 14 days 2
16 A 2021-01-16 15 days 2
17 B 2021-01-17 16 days 2
18 B 2021-01-18 17 days 2
19 B 2021-01-19 18 days 2
20 B 2021-01-20 19 days 2
21 B 2021-01-21 20 days 2
22 B 2021-01-22 21 days 2
23 B 2021-01-23 22 days 2
24 B 2021-01-24 23 days 2
25 B 2021-01-25 24 days 2
26 B 2021-01-26 25 days 2
27 B 2021-01-27 26 days 2
28 B 2021-01-28 27 days 2
29 B 2021-01-29 28 days 3
30 B 2021-01-30 29 days 3
31 B 2021-01-31 30 days 3
32 B 2021-02-01 31 days 3
33 C 2021-02-02 32 days 3
34 C 2021-02-03 33 days 3
35 C 2021-02-04 34 days 3
36 C 2021-02-05 35 days 3
37 C 2021-02-06 36 days 3
38 C 2021-02-07 37 days 3
39 C 2021-02-08 38 days 3
40 C 2021-02-09 39 days 3
41 C 2021-02-10 40 days 3
42 C 2021-02-11 41 days 3
43 C 2021-02-12 42 days 4
44 C 2021-02-13 43 days 4
45 C 2021-02-14 44 days 4
46 C 2021-02-15 45 days 4
47 C 2021-02-16 46 days 4
48 C 2021-02-17 47 days 4
49 D 2021-02-18 48 days 4
50 D 2021-02-19 49 days 4
51 D 2021-02-20 50 days 4
52 D 2021-02-21 51 days 4
53 D 2021-02-22 52 days 4
54 D 2021-02-23 53 days 4
55 D 2021-02-24 54 days 4
56 D 2021-02-25 55 days 4
57 D 2021-02-26 56 days 5
58 D 2021-02-27 57 days 5
59 D 2021-02-28 58 days 5
60 D 2021-03-01 59 days 5
61 D 2021-03-02 60 days 5
62 D 2021-03-03 61 days 5
63 D 2021-03-04 62 days 5
64 D 2021-03-05 63 days 5
We could do it this way using gl
:
library(dpylr)
data %>%
mutate(group =as.integer(gl(n(),14,n())))
id date group
1 A 2021-01-01 1
2 A 2021-01-02 1
3 A 2021-01-03 1
4 A 2021-01-04 1
5 A 2021-01-05 1
6 A 2021-01-06 1
7 A 2021-01-07 1
8 A 2021-01-08 1
9 A 2021-01-09 1
10 A 2021-01-10 1
11 A 2021-01-11 1
12 A 2021-01-12 1
13 A 2021-01-13 1
14 A 2021-01-14 1
15 A 2021-01-15 2
16 A 2021-01-16 2
17 B 2021-01-17 2
18 B 2021-01-18 2
19 B 2021-01-19 2
20 B 2021-01-20 2
21 B 2021-01-21 2
22 B 2021-01-22 2
23 B 2021-01-23 2
24 B 2021-01-24 2
25 B 2021-01-25 2
26 B 2021-01-26 2
27 B 2021-01-27 2
28 B 2021-01-28 2
29 B 2021-01-29 3
30 B 2021-01-30 3
31 B 2021-01-31 3
32 B 2021-02-01 3
33 C 2021-02-02 3
34 C 2021-02-03 3
35 C 2021-02-04 3
36 C 2021-02-05 3
37 C 2021-02-06 3
38 C 2021-02-07 3
39 C 2021-02-08 3
40 C 2021-02-09 3
41 C 2021-02-10 3
42 C 2021-02-11 3
43 C 2021-02-12 4
44 C 2021-02-13 4
45 C 2021-02-14 4
46 C 2021-02-15 4
47 C 2021-02-16 4
48 C 2021-02-17 4
49 D 2021-02-18 4
50 D 2021-02-19 4
51 D 2021-02-20 4
52 D 2021-02-21 4
53 D 2021-02-22 4
54 D 2021-02-23 4
55 D 2021-02-24 4
56 D 2021-02-25 4
57 D 2021-02-26 5
58 D 2021-02-27 5
59 D 2021-02-28 5
60 D 2021-03-01 5
61 D 2021-03-02 5
62 D 2021-03-03 5
63 D 2021-03-04 5
64 D 2021-03-05 5