Home > Software design >  create grouped "order" variable in data.table in R
create grouped "order" variable in data.table in R

Time:06-16

I have a data.table:

example <- data.table(year = c(2016, 2017, 2018, 2016, 2017, 2020), 
           ID = c("A","A","A", "B", "B","B"))

and I need an order variable for the years. The highest year having the smallest rank, for every id separately. The result would then be:

exmampleResult <- data.table(year = c(2016, 2017, 2018, 2016, 2017, 2020), 
                             ID = c("A","A","A", "B", "B","B"), 
                             yearRank = c(3, 2, 1, 5, 4, 1))

How can this be done in data.table?

I tried splitting the data table in a list of list:

exampleList <- lapply(split(example,example$ID), function(x) as.list(x))

and then calculating the order using another apply. However that seems too complicated. Is there a simpler way?

CodePudding user response:

Grouped by 'ID', get the sequence (:) from minimum of 'year' to maximum of 'year', rank it (frank) and use match to get the index to subset the rank

library(data.table)
example[,  yearRank := 
      {yr <- (min(year):max(year))
      frank(-yr)[match(year, yr)]
     },ID]

-output

> example
    year     ID yearRank
   <num> <char>    <num>
1:  2016      A        3
2:  2017      A        2
3:  2018      A        1
4:  2016      B        5
5:  2017      B        4
6:  2020      B        1

CodePudding user response:

Here's another option leveraging the tidyverse

library(data.table)
library(tidyverse)

example <- data.table(year = c(2016, 2017, 2018, 2016, 2017, 2020), 
                      ID = c("A","A","A", "B", "B","B"))

exampleResult <- example %>%
  group_by(ID) %>%
  mutate(yearRank = rank(desc(year))) %>%
  ungroup() %>%
  data.table()
  • Related