I have table
DF <- tibble(
CLIENT = c(1,1,1,2,2,2,3,3,3),
N_DOG = c('N1','N2','N3','N4','N5','N6','N7','N8','N9'),
DT = as.Date(c('01.06.2021','12.07.2021','04.05.2021',
'03.08.2021','21.07.2021','04.01.2022',
'07.04.2021','09.11.2021','08.12.2021'), format = '%d.%m.%Y'),
RANK = c(1,1,2,3,1,1,2,2,3)
)
CLIENT | N_DOG | DT | RANK |
---|---|---|---|
1 | N1 | 2021-06-01 | 1 |
1 | N2 | 2021-07-12 | 1 |
1 | N3 | 2021-05-04 | 2 |
2 | N4 | 2021-08-03 | 3 |
2 | N5 | 2021-07-21 | 1 |
2 | N6 | 2022-01-04 | 1 |
3 | N7 | 2021-04-07 | 2 |
3 | N8 | 2021-11-09 | 2 |
3 | N9 | 2021-12-08 | 3 |
I want to find a contract for each client according to the following criteria:
- Find a contract with a minimum rank
- If there are several such contracts, we take the last one.
PS. The output must be a unique client and 1 contract
Realization :
MIN_RANK <- DF %>%
select(CLIENT, RANK) %>%
group_by(CLIENT) %>%
filter(RANK == min(RANK)) %>%
ungroup() %>%
distinct()
MIN_RANK %>%
inner_join(DF, by = c('CLIENT','RANK')) %>%
select(!c(N_DOG,RANK)) %>%
group_by(CLIENT) %>%
filter(DT == max(DT)) %>%
inner_join(DF, by = c('CLIENT','DT'))
Output
CLIENT | DT | N_DOG | RANK |
---|---|---|---|
1 | 2021-07-12 | N2 | 1 |
2 | 2022-01-04 | N6 | 1 |
3 | 2021-11-09 | N8 | 2 |
How can I do it faster ? [ maybe there's some function ? ] the same question at the ru forums.
English is not my native language, so please be kind to my mistakes.
CodePudding user response:
arrange in the desired order and select the first of each client?
arrange(DF, CLIENT, RANK, desc(row_number())) |>
filter(!duplicated(CLIENT))
If there are many additional columns in DF
, it might pay to do the operations just on the CLIENT, RANK, and row number, e.g.,
row_number <- seq_len(nrow(DF))
o <- order(DF$CLIENT, DF$RANK, rev(row_number))
i <- !duplicated(DF$CLIENT[o])
## place the row_number in the correct order and subset to include
## non-duplicated CLIENT; return these rows
DF[row_number[o][i],]
CodePudding user response:
data.table::setDT(DF)[order(RANK,-DT), .SD[1], by=CLIENT]
Output:
CLIENT N_DOG DT RANK
<num> <char> <Date> <num>
1: 2 N6 2022-01-04 1
2: 1 N2 2021-07-12 1
3: 3 N8 2021-11-09 2