I have a data.table:
example <- data.table(year = c(2016, 2017, 2018, 2016, 2017, 2020),
ID = c("A","A","A", "B", "B","B"))
and I need an order variable for the years. The highest year having the smallest rank, for every id
separately. The result would then be:
exmampleResult <- data.table(year = c(2016, 2017, 2018, 2016, 2017, 2020),
ID = c("A","A","A", "B", "B","B"),
yearRank = c(3, 2, 1, 5, 4, 1))
How can this be done in data.table?
I tried splitting the data table in a list of list:
exampleList <- lapply(split(example,example$ID), function(x) as.list(x))
and then calculating the order using another apply. However that seems too complicated. Is there a simpler way?
CodePudding user response:
Grouped by 'ID', get the sequence (:
) from min
imum of 'year' to max
imum of 'year', rank it (frank
) and use match
to get the index to subset the rank
library(data.table)
example[, yearRank :=
{yr <- (min(year):max(year))
frank(-yr)[match(year, yr)]
},ID]
-output
> example
year ID yearRank
<num> <char> <num>
1: 2016 A 3
2: 2017 A 2
3: 2018 A 1
4: 2016 B 5
5: 2017 B 4
6: 2020 B 1
CodePudding user response:
Here's another option leveraging the tidyverse
library(data.table)
library(tidyverse)
example <- data.table(year = c(2016, 2017, 2018, 2016, 2017, 2020),
ID = c("A","A","A", "B", "B","B"))
exampleResult <- example %>%
group_by(ID) %>%
mutate(yearRank = rank(desc(year))) %>%
ungroup() %>%
data.table()