I get different results if I use order() in data.frame and data.table. For example:
A <- data.frame(one=c("k"),two=c("3_28","31_60","48_68"))
B <- as.data.table(A)
A[order(A$one,A$two),]
one two
1 k 3_28
2 k 31_60
3 k 48_68
B[order(B$one, B$two),]
one two
1: k 31_60
2: k 3_28
3: k 48_68
I must admit this was a bit of a nasty shock, as I have assumed equivalent results for order() from data.frame and data.table for many years. I guess there is a lot of code I need to check!
Is there any way to ensure order() gives the same results in data.frame and data.table?
Many apologies if this difference in behavior is already well known, and is just an example of my ignorance.
CodePudding user response:
When used inside of a data.table operation, order(..) uses data.table:::forder. According to the Introduction to data.table vignette:
order() is internally optimised
We can use "-" on a character columns within the frame of a data.table to sort in decreasing order. In addition, order(...) within the frame of a data.table uses data.table's internal fast radix order forder(). This sort provided such a compelling improvement over R's base::order that the R project adopted the data.table algorithm as its default sort in 2016 for R 3.3.0, see ?sort and the R Release NEWS.
The key to the difference is that it uses a "fast radix order". If you look at base::order, though, it has an argument method=, which is documented as:
method: the method to be used: partial matches are allowed. The
default ('"auto"') implies '"radix"' for short numeric
vectors, integer vectors, logical vectors and factors.
Otherwise, it implies '"shell"'. For details of methods
'"shell"', '"quick"', and '"radix"', see the help for 'sort'.
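Since the second column of your data.frame is not one of numeric, integer, logical, or factor, base::order uses the "shell" method for sorting. That method sorts character vectors by the collation rules of the current locale, whereas radix sorting uses C-locale (byte) order, so the two can produce different results.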
However, if we force base::order to use method="radix", we get the same result.
order(A$two)
# [1] 1 2 3
order(A$two, method="radix")
# [1] 2 1 3
A[order(A$one, A$two, method = "radix"),]
# one two
# 2 k 31_60
# 1 k 3_28
# 3 k 48_68
Conversely, you can get the base R ordering inside a data.table by calling base::order explicitly:
B[base::order(B$one,B$two),]
# one two
# <char> <char>
# 1: k 3_28
# 2: k 31_60
# 3: k 48_68
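Another option, not mentioned above and offered only as a sketch, is to switch the session's collation to the C locale, so that base::order's default method compares strings the same way radix does. This assumes a session-wide change of LC_COLLATE is acceptable for the rest of your code, which may not be the case if other sorts rely on locale-aware ordering. The names old_collate and c_idx are illustrative.

```r
x <- c("3_28", "31_60", "48_68")

# Remember the current collation so it can be restored afterwards.
old_collate <- Sys.getlocale("LC_COLLATE")

# In the C locale, strings are compared byte by byte.
Sys.setlocale("LC_COLLATE", "C")
c_idx <- order(x)  # default method, no method = "radix" needed
c_idx              # 2 1 3, matching order(x, method = "radix")

# Restore the previous collation.
Sys.setlocale("LC_COLLATE", old_collate)
```

Passing method = "radix" at each call is more explicit and has no side effects, so the locale switch is probably best kept for scripts where every sort should use C-locale order.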