I have the following dataframe:
df = data.frame(date = c("2021-11-17", "2021-11-17", "2022-05-11", "2022-05-11", "2021-11-04", "2021-11-04", "2022-06-09", "2022-06-09"),
participant_id = rep(c("sub-x", "sub-y"), each = 4))
In that way, sub-x has 4 rows associated (2 different dates and 2 rows per date) and sub-y too (the 4 last dates).
I would like to create a new column called "session" in which there would be, for each participant, either 1 or 2 depending on whether we are dealing with the rows of the first date among the 2 associated with the participant, or the rows of the second date.
It would look like this :
df = data.frame(date = c("2021-11-17", "2021-11-17", "2022-05-11", "2022-05-11", "2021-11-04", "2021-11-04", "2022-06-09", "2022-06-09"),
participant_id = rep(c("sub-x", "sub-y"), each = 4)),
session = c("1", "1", "2", "2", "1", "1", "2", "2"))
NB : this is only a sample of my data. In reality, there are 28 rows per date of session and more than 400 participants.
How can I do that?
Thanks a lot in advance!
CodePudding user response:
You can use data.table::rleid
:
library(dplyr)
df %>%
group_by(participant_id) %>%
mutate(session = data.table::rleid(date))
# A tibble: 8 × 3
# Groups: participant_id [2]
date participant_id session
<chr> <chr> <int>
1 2021-11-17 sub-x 1
2 2021-11-17 sub-x 1
3 2022-05-11 sub-x 2
4 2022-05-11 sub-x 2
5 2021-11-04 sub-y 1
6 2021-11-04 sub-y 1
7 2022-06-09 sub-y 2
8 2022-06-09 sub-y 2
CodePudding user response:
Using dplyr, group by id, then use dense_rank()
within each id:
library(dplyr)
df %>%
group_by(participant_id) %>%
mutate(session = dense_rank(date)) %>%
ungroup()
# A tibble: 8 × 3
date participant_id session
<chr> <chr> <int>
1 2021-11-17 sub-x 1
2 2021-11-17 sub-x 1
3 2022-05-11 sub-x 2
4 2022-05-11 sub-x 2
5 2021-11-04 sub-y 1
6 2021-11-04 sub-y 1
7 2022-06-09 sub-y 2
8 2022-06-09 sub-y 2