Home > Software design >  R keeping first/last dates within two groups
R keeping first/last dates within two groups

Time:11-19

I have a dataset similar to the below:

id  date        event
1   01/02/2016  paracetamol
1   02/03/2016  paracetamol
1   16/04/2015  ibuprofen
2   01/03/2018  paracetamol
2   10/03/2018  codeine
2   15/02/2019  codeine
2   22/02/2019  codeine

And I want to:

  1. create two columns containing the first and last instances of each event and for each person
  2. remove duplicate events within each person

Resulting in the below:

id  event        first_event  last_event
1   paracetamol  01/02/2016   02/03/2016
1   ibuprofen    16/04/2015   16/04/2015
2   paracetamol  01/03/2018   01/03/2018
2   codeine      10/03/2018   22/02/2019

Any suggestions on how to achieve this? I've been playing around functions like group_by and arrange, but can't seem to get the combination right.

CodePudding user response:

We may first convert the 'date' to Date class and get the min/max of 'date' after grouping by the columns

library(dplyr)
library(lubridate)
df1 %>%
    mutate(date = dmy(date)) %>%
    group_by(id, event) %>%
    summarise(first_event = min(date), last_event = max(date), .groups = 'drop')

-output

# A tibble: 4 × 4
     id event       first_event last_event
  <int> <chr>       <date>      <date>    
1     1 ibuprofen   2015-04-16  2015-04-16
2     1 paracetamol 2016-02-01  2016-03-02
3     2 codeine     2018-03-10  2019-02-22
4     2 paracetamol 2018-03-01  2018-03-01

data

df1 <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 2L, 2L), date = c("01/02/2016", 
"02/03/2016", "16/04/2015", "01/03/2018", "10/03/2018", "15/02/2019", 
"22/02/2019"), event = c("paracetamol", "paracetamol", "ibuprofen", 
"paracetamol", "codeine", "codeine", "codeine")),
 class = "data.frame", row.names = c(NA, 
-7L))
  • Related