In a large data.table
, I would like to compute the percentile of a numeric variable with respect to a date
variable, and with respect to a subgroup of a group
variable.
Example: In the table below I compute each value
's percentile with respect to date
and all members of the group
-variable:
library(data.table)
set.seed(101)
dt <- data.table(group = c(rep("A",3),rep("B",3),rep("C",3), rep("D",3)),
date = rep(as.Date(19000:19002, origin = "1970-01-01"),4),
value=rnorm(12))
dt[, Percentile := ecdf(value)(value), date]
> dt
group date value Percentile
1: A 2022-01-08 -0.3260365 0.25
2: A 2022-01-09 0.5524619 1.00
3: A 2022-01-10 -0.6749438 0.50
4: B 2022-01-08 0.2143595 0.75
5: B 2022-01-09 0.3107692 0.50
6: B 2022-01-10 1.1739663 1.00
7: C 2022-01-08 0.6187899 1.00
8: C 2022-01-09 -0.1127343 0.25
9: C 2022-01-10 0.9170283 0.75
10: D 2022-01-08 -0.2232594 0.50
11: D 2022-01-09 0.5264481 0.75
12: D 2022-01-10 -0.7948444 0.25
Is there a way to compute the percentile of each value
with respect to date
, but the ecdf()
-function only being fitted to group members, e.g. c("A","B","C")
?
The result should look like this:
> dt
group date value Percentile
1: A 2022-01-08 -0.3260365 0.3333333
2: A 2022-01-09 0.5524619 1.0000000
3: A 2022-01-10 -0.6749438 0.3333333
4: B 2022-01-08 0.2143595 0.6666667
5: B 2022-01-09 0.3107692 0.6666667
6: B 2022-01-10 1.1739663 1.0000000
7: C 2022-01-08 0.6187899 1.0000000
8: C 2022-01-09 -0.1127343 0.3333333
9: C 2022-01-10 0.9170283 0.6666667
10: D 2022-01-08 -0.2232594 0.3333333
11: D 2022-01-09 0.5264481 0.6666667
12: D 2022-01-10 -0.7948444 0.0000000
Thank you!
CodePudding user response:
We could use
library(data.table)
dt[, Percentile := ecdf(value[group %in% c("A", "B", "C")])(value), date]
-output
> dt
group date value Percentile
<char> <Date> <num> <num>
1: A 2022-01-08 -0.3260365 0.3333333
2: A 2022-01-09 0.5524619 1.0000000
3: A 2022-01-10 -0.6749438 0.3333333
4: B 2022-01-08 0.2143595 0.6666667
5: B 2022-01-09 0.3107692 0.6666667
6: B 2022-01-10 1.1739663 1.0000000
7: C 2022-01-08 0.6187899 1.0000000
8: C 2022-01-09 -0.1127343 0.3333333
9: C 2022-01-10 0.9170283 0.6666667
10: D 2022-01-08 -0.2232594 0.3333333
11: D 2022-01-09 0.5264481 0.6666667
12: D 2022-01-10 -0.7948444 0.0000000