How can I return the second max by group in R data table with the id column returned as well?
Reproducible example below:
library(data.table)
sample_dt <- data.table(
myid=c("A","B","B","A","B","B","C","C","A","A"),
mycol1 =c(101,-108,140,140,-150,-140,-111,101,150,140)
)
#desired output
desired_dt <- data.table(
myid=c("A","B","C"),
mycol1_secondmax=c(140,-108,-111)
)
CodePudding user response:
A possible solution:
sample_dt[order(-mycol1),.SD[2],by=myid]
myid mycol1
<char> <num>
1: A 140
2: B -108
3: C -111
EDIT: comparison with frank
as suggested by @r2evans:
sample_dt <- data.table(
myid=sample(LETTERS,10^4,replace=T),
mycol1 =sample(-100:100,10^4,replace=T)
)
microbenchmark::microbenchmark(
sample_dt[order(-mycol1),.SD[2],by=myid],
sample_dt[, .SD[frank(-mycol1, ties.method = "first") == 2L,], by = .(myid)],
sample_dt[, .SD[order(-mycol1) == 2], myid]
)
Unit: milliseconds
expr min lq mean median uq max neval
sample_dt[order(-mycol1), .SD[2], by = myid] 1.450001 1.850001 3.139760 3.145701 3.899852 13.6538 100
sample_dt[, .SD[frank(-mycol1, ties.method = "first") == 2L, ], by = .(myid)] 6.816001 7.199701 8.289604 7.638501 8.390902 22.0289 100
sample_dt[, .SD[order(-mycol1) == 2], myid] 4.503801 4.824001 5.606895 5.229701 5.872300 15.9330 100
CodePudding user response:
Another data.table
option
> sample_dt[, .SD[order(-mycol1) == 2], myid]
myid mycol1
1: A 140
2: B -108
3: C -111
CodePudding user response:
3rd option using setorder
and later extract, i.e. `[`
as part of a helper function:
setorder(sample_dt, myid, -mycol1)
fun <- function(x) {
if(length(x) == 1) x else x[2]
}
Result
sample_dt[, fun(mycol1), by = myid]
# myid V1
# 1: A 140
# 2: B -108
# 3: C -111