I can't quite get the syntax right for this. I have a data.table
where I would like to sort first by a grouping column g1
(ordered factor), then in descending order by another column n
. The only catch is that I would like rows labeled "other" for a third column g2
to appear at the bottom of each group, regardless of their value of n
.
Example:
library(data.table)
dt <- data.table(g1 = factor(rep(c('Australia', 'Mexico', 'Canada'), 3), levels = c('Australia', 'Canada', 'Mexico')),
g2 = rep(c('stuff', 'things', 'other'), each = 3),
n = c(1000, 2000, 3000, 5000, 100, 3500, 10000, 10000, 0))
This is the expected output, where within each g1
, we have descending order of n
except that rows where g2 == 'other'
are always at the bottom:
g1 g2 n
1: Australia things 5000
2: Australia stuff 1000
3: Australia other 10000
4: Canada things 3500
5: Canada stuff 3000
6: Canada other 0
7: Mexico stuff 2000
8: Mexico things 100
9: Mexico other 10000
CodePudding user response:
Take advantage of data.table::order
and its -
-reverse ordering:
dt[order(g1, g2 == "other", -n), ]
# g1 g2 n
# <fctr> <char> <num>
# 1: Australia things 5000
# 2: Australia stuff 1000
# 3: Australia other 10000
# 4: Canada things 3500
# 5: Canada stuff 3000
# 6: Canada other 0
# 7: Mexico stuff 2000
# 8: Mexico things 100
# 9: Mexico other 10000
We add g2 == "other"
because you said that "other" should always be last. If, for example, "stuff"
was "abc"
, then we can see the difference in behavior:
dt[ g2 == "stuff", g2 := "abc" ]
dt[order(g1, -n), ]
# g1 g2 n
# <fctr> <char> <num>
# 1: Australia other 10000
# 2: Australia things 5000
# 3: Australia abc 1000
# 4: Canada things 3500
# 5: Canada abc 3000
# 6: Canada other 0
# 7: Mexico other 10000
# 8: Mexico abc 2000
# 9: Mexico things 100
dt[order(g1, g2 == "other", -g2), ]
# g1 g2 n
# <fctr> <char> <num>
# 1: Australia things 5000
# 2: Australia abc 1000
# 3: Australia other 10000
# 4: Canada things 3500
# 5: Canada abc 3000
# 6: Canada other 0
# 7: Mexico things 100
# 8: Mexico abc 2000
# 9: Mexico other 10000
One disadvantage of this is that setorder
doesn't work directly:
setorder(dt, g1, g2 == "other", -n)
# Error in setorderv(x, cols, order, na.last) :
# some columns are not in the data.table: ==,other
so we instead need to reorder and reassign back to dt
.
BTW: this works because g2 == "other"
resolves to logical
, yes, but in sorting those are treated as 0
(false) and 1
(true), ergo false-conditions will appear before true-conditions.