How to retain only one record but extract data from several two rows for a unique reference number-CodePudding

I am trying to remove duplicates but need to retain the first data from 1 row and another data from another row/s in [R] programming.

For example, I have data like this:

ID	Transaction #	Process #	Start Date	End Date
ID1	ID1_T1	P1	2022-01-03	2022-01-05
ID1	ID1_T1	P2	2022-01-06	2022-01-10
ID1	ID1_T1	P3	2022-01-11	2022-01-20
ID1	ID1_T1	P4	2022-01-21	2022-01-31
ID2	ID2_T1	P1	2022-01-06	2022-01-11
ID2	ID2_T1	P2	2022-01-12	2022-01-18
ID2	ID2_T1	P3	2022-01-19	2022-01-25
ID3	ID3_T1	P1	2022-01-21	2022-01-22

And I need to simplify it into this:

ID	Transaction #	Process #	Start Date	End Date
ID1	ID1_T1	P1	2022-01-03	2022-01-31
ID2	ID2_T1	P1	2022-01-06	2022-01-25
ID3	ID3_T1	P1	2022-01-11	2022-01-22

Retaining only the first Start Date and last End Date of the transaction per unique ID and Transaction #.

I'm expecting help on how to code this in R using the tidyverse library since I'm not yet familiar with Data.table.

Thanks very much.

CodePudding user response：

You can use group_by summarise:

library(dplyr)

df %>%
  group_by(ID, Transaction) %>%
  summarise(Process = first(Process),
            Start_Date = first(Start_Date), # or min(Start_Date)
            End_Date = last(End_Date),      # or max(End_Date)
            .groups = "drop")