I have the following data frame:
id
A
A
A
A
A
B
B
B
B
B
B
C
C
I want to create another column called "Group.3". In this column, the number of rows per group are counted in threes. 3 consecutive rows are assigned the same group number, and the next 3 or less rows (depending if we're reaching the end of the group) are assigned another higher number.
This is what I would like the column to look like:
id Group.3
A 1
A 1
A 1
A 2
A 2
B 1
B 1
B 1
B 2
B 2
B 2
C 1
C 1
Any advice on how I can do this, preferably using the functions group_by and mutate in the package dplyr?
CodePudding user response:
Using group_by and mutate with dplyr, we can accomplish this by finding the row number within each group, and then using modulo to group row numbers in groups of 3:
library(dplyr)
df <- data.frame("id"=c(rep("A",5), rep("B", 6), rep("C",2))
> df %>% group_by(id) %>% mutate(Group.3 = (row_number()-1)%/%3 1)
# A tibble: 13 × 2
# Groups: id [3]
id Group.3
<chr> <dbl>
1 A 1
2 A 1
3 A 1
4 A 2
5 A 2
6 B 1
7 B 1
8 B 1
9 B 2
10 B 2
11 B 2
12 C 1
13 C 1
By the way, it can be really helpful to post a reproducible example so that the community can quickly load your data in and begin answering your question.
CodePudding user response:
An alternative is to use gl
:
library(dplyr)
df %>%
group_by(id) %>%
mutate(Group.3 = as.integer(gl(n(), 3, n())))
id Group.3
<chr> <int>
1 A 1
2 A 1
3 A 1
4 A 2
5 A 2
6 B 1
7 B 1
8 B 1
9 B 2
10 B 2
11 B 2
12 C 1
13 C 1