Home > other >  Window function in R
Window function in R

Time:02-24

I have table

DF <- tibble(
  CLIENT = c(1,1,1,2,2,2,3,3,3),
  N_DOG = c('N1','N2','N3','N4','N5','N6','N7','N8','N9'),
  DT = as.Date(c('01.06.2021','12.07.2021','04.05.2021',
                 '03.08.2021','21.07.2021','04.01.2022',
                 '07.04.2021','09.11.2021','08.12.2021'), format = '%d.%m.%Y'),
  RANK = c(1,1,2,3,1,1,2,2,3)
)
CLIENT N_DOG DT RANK
1 N1 2021-06-01 1
1 N2 2021-07-12 1
1 N3 2021-05-04 2
2 N4 2021-08-03 3
2 N5 2021-07-21 1
2 N6 2022-01-04 1
3 N7 2021-04-07 2
3 N8 2021-11-09 2
3 N9 2021-12-08 3

I want to find a contract for each client according to the following criteria:

  1. Find a contract with a minimum rank
  2. If there are several such contracts, we take the last one.

PS. The output must be a unique client and 1 contract

Realization :

MIN_RANK <- DF %>% 
  select(CLIENT, RANK) %>% 
  group_by(CLIENT) %>% 
  filter(RANK == min(RANK)) %>% 
  ungroup() %>% 
  distinct()

MIN_RANK %>% 
  inner_join(DF, by = c('CLIENT','RANK')) %>% 
  select(!c(N_DOG,RANK)) %>% 
  group_by(CLIENT) %>% 
  filter(DT == max(DT)) %>% 
  inner_join(DF, by = c('CLIENT','DT'))

Output

CLIENT DT N_DOG RANK
1 2021-07-12 N2 1
2 2022-01-04 N6 1
3 2021-11-09 N8 2

How can I do it faster ? [ maybe there's some function ? ] the same question at the ru forums.

English is not my native language, so please be kind to my mistakes.

CodePudding user response:

arrange in the desired order and select the first of each client?

arrange(DF, CLIENT, RANK, desc(row_number())) |>
    filter(!duplicated(CLIENT))

If there are many additional columns in DF, it might pay to do the operations just on the CLIENT, RANK, and row number, e.g.,

row_number <- seq_len(nrow(DF))
o <- order(DF$CLIENT, DF$RANK, rev(row_number))
i <- !duplicated(DF$CLIENT[o])
## place the row_number in the correct order and subset to include
## non-duplicated CLIENT; return these rows
DF[row_number[o][i],]

CodePudding user response:

data.table::setDT(DF)[order(RANK,-DT), .SD[1], by=CLIENT]

Output:

   CLIENT  N_DOG         DT  RANK
    <num> <char>     <Date> <num>
1:      2     N6 2022-01-04     1
2:      1     N2 2021-07-12     1
3:      3     N8 2021-11-09     2
  •  Tags:  
  • r
  • Related