I have a dataset similar to the below:
id date event
1 01/02/2016 paracetamol
1 02/03/2016 paracetamol
1 16/04/2015 ibuprofen
2 01/03/2018 paracetamol
2 10/03/2018 codeine
2 15/02/2019 codeine
2 22/02/2019 codeine
And I want to:
- create two columns containing the first and last instances of each event and for each person
- remove duplicate events within each person
Resulting in the below:
id event first_event last_event
1 paracetamol 01/02/2016 02/03/2016
1 ibuprofen 16/04/2015 16/04/2015
2 paracetamol 01/03/2018 01/03/2018
2 codeine 10/03/2018 22/02/2019
Any suggestions on how to achieve this? I've been playing around functions like group_by
and arrange
, but can't seem to get the combination right.
CodePudding user response:
We may first convert the 'date' to Date
class and get the min/max
of 'date' after grouping by the columns
library(dplyr)
library(lubridate)
df1 %>%
mutate(date = dmy(date)) %>%
group_by(id, event) %>%
summarise(first_event = min(date), last_event = max(date), .groups = 'drop')
-output
# A tibble: 4 × 4
id event first_event last_event
<int> <chr> <date> <date>
1 1 ibuprofen 2015-04-16 2015-04-16
2 1 paracetamol 2016-02-01 2016-03-02
3 2 codeine 2018-03-10 2019-02-22
4 2 paracetamol 2018-03-01 2018-03-01
data
df1 <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 2L, 2L), date = c("01/02/2016",
"02/03/2016", "16/04/2015", "01/03/2018", "10/03/2018", "15/02/2019",
"22/02/2019"), event = c("paracetamol", "paracetamol", "ibuprofen",
"paracetamol", "codeine", "codeine", "codeine")),
class = "data.frame", row.names = c(NA,
-7L))