Suppose I have two independent vectors `x` and `y` of the same length:
```
x    y
1 0.12
2 0.50
3 0.07
4 0.10
5 0.02
```
I want to sort the elements of `y` in decreasing order and reorder `x` so that the correspondence between the two vectors is preserved, which would give:
```
x    y
2 0.50
1 0.12
4 0.10
3 0.07
5 0.02
```
I'm new to R, and although I know it has a built-in `sort` function that sorts the elements of `y`, I don't know how to make it sort both vectors. The only approach I can think of is a `for` loop that "manually" reorders `x` by looking up the original position of each element of `y`:
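```r
ysorted <- sort(y, decreasing = TRUE)
xsorted <- numeric(length(ysorted))
# look up where each sorted value sat in the original y (assumes y has no ties)
for (i in seq_along(ysorted)) {
  xsorted[i] <- x[which(y == ysorted[i])]
}
```
which is very inefficient.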
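A minimal sketch of the vectorized alternative to that loop, working directly on the two vectors without building a data frame: `order()` returns the permutation that sorts `y`, and indexing both vectors with that same permutation keeps the pairs aligned. The values are the example data from the question; `o`, `x_sorted` and `y_sorted` are just illustrative names.

```r
# Example vectors from the question
x <- 1:5
y <- c(0.12, 0.50, 0.07, 0.10, 0.02)

# order() returns the indices that would sort y in decreasing order
o <- order(y, decreasing = TRUE)
o
#> [1] 2 1 4 3 5

# Applying the same permutation to both vectors keeps each x next to its y
x_sorted <- x[o]
y_sorted <- y[o]
x_sorted
#> [1] 2 1 4 3 5
y_sorted
#> [1] 0.50 0.12 0.10 0.07 0.02
```

Unlike looking up each sorted value with `which()`, this does not depend on the values of `y` being unique.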
Answer:
In `dplyr`:
```r
dat <- structure(list(x = 1:5,
                      y = c(0.12, 0.5, 0.07, 0.1, 0.02)),
                 class = "data.frame", row.names = c(NA, -5L))
library(dplyr)
dat %>% arrange(desc(y))
```
```
  x    y
1 2 0.50
2 1 0.12
3 4 0.10
4 3 0.07
5 5 0.02
```
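The answers here start from a data frame `dat` (created with `structure()` above), whereas the question has two separate vectors. A minimal sketch of building that data frame directly from the vectors, using the question's values: once both columns live in one data frame, any reordering of its rows moves `x` and `y` together. Base R is used here only so the sketch has no package dependency; `dat_sorted` is an illustrative name.

```r
# The two vectors from the question
x <- 1:5
y <- c(0.12, 0.50, 0.07, 0.10, 0.02)

# Combine them into one data frame so each row keeps an (x, y) pair together
dat <- data.frame(x = x, y = y)

# Reordering the rows reorders both columns at once; arrange(desc(y)) and
# data.table's [order(-y)] do the same job
dat_sorted <- dat[order(dat$y, decreasing = TRUE), ]
dat_sorted
```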
In `data.table`:
```r
library(data.table)
as.data.table(dat)[order(-y)]
```
```
   x    y
1: 2 0.50
2: 1 0.12
3: 4 0.10
4: 3 0.07
5: 5 0.02
```
Speed Comparison
Three solutions have been offered in the answers: base R, `dplyr`, and `data.table`. As is often the case in R programming, exactly the same result can be achieved in several different ways.
If you want to compare the approaches by how fast each one runs, you can use `microbenchmark()` from the {microbenchmark} package (there are other ways to do this as well).
In the example below, each approach is run 1000 times and summary statistics of the timings are reported.
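```r
library(microbenchmark)
library(dplyr)
library(data.table)

microbenchmark(
  base_order  = dat[order(-dat$y), ],
  dplyr_order = dat %>% arrange(desc(y)),
  dt_order    = as.data.table(dat)[order(-y)],  # timing includes the as.data.table() conversion
  times = 1000
)
```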
```
#Unit: microseconds
#        expr    min      lq      mean  median      uq     max neval
#  base_order   42.0   63.25   97.2585   79.45  100.35  6761.8  1000
# dplyr_order 1244.5 1503.45 1996.4406 1689.85 2065.30 16868.4  1000
#    dt_order  261.3  395.85  583.9086  487.35  587.70 39294.6  1000
```
The results show that, for your case, `base_order` is the fastest: by both mean and median time it ordered the rows about 20 times faster than `dplyr_order` and about 6 times faster than `dt_order`.
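As noted above, microbenchmark is not the only way to time code. A rough, base-R-only sketch, assuming you only need a ballpark per-call figure rather than microbenchmark's distribution of timings: repeat the expression many times inside `system.time()` and divide. The names `n_reps`, `t_base` and `per_call_sec` are illustrative.

```r
x <- 1:5
y <- c(0.12, 0.50, 0.07, 0.10, 0.02)
dat <- data.frame(x = x, y = y)

n_reps <- 1000  # same repetition count as the microbenchmark call above

# system.time() measures the whole loop; the timer is coarse, which is why
# the call is repeated many times before dividing
t_base <- system.time(
  for (i in seq_len(n_reps)) dat[order(-dat$y), ]
)

# Approximate wall-clock seconds per call
per_call_sec <- t_base[["elapsed"]] / n_reps
per_call_sec
```

This gives a single average only, so it cannot show the spread (min, quartiles, max) that `microbenchmark()` reports.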
Answer:
We can use `order` in base R:
```r
df2 <- df1[order(-df1$y), ]
df2
```
Output:
```
  x    y
2 2 0.50
1 1 0.12
4 4 0.10
3 3 0.07
5 5 0.02
```
Data:
```r
df1 <- structure(list(x = 1:5, y = c(0.12, 0.5, 0.07, 0.1, 0.02)),
                 class = "data.frame", row.names = c(NA, -5L))
```
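The `order(-df1$y)` call relies on negating a numeric column. A small sketch of two related details, using a made-up data frame `df_ties` (not from the question) that contains a tie in `y`: `order()` is stable, so tied rows keep their original relative order unless a second key breaks the tie, and `decreasing = TRUE` avoids negation altogether, which matters for columns such as character vectors where unary minus is an error.

```r
# Hypothetical data with a tie in y (rows 1 and 3 both have y = 0.1)
df_ties <- data.frame(x = c(1, 2, 3, 4), y = c(0.1, 0.5, 0.1, 0.3))

# Negation on a numeric column; tied rows stay in their original order
by_y <- df_ties[order(-df_ties$y), ]
by_y

# A second key breaks the tie explicitly: y decreasing, then x decreasing
by_y_then_x <- df_ties[order(-df_ties$y, -df_ties$x), ]
by_y_then_x

# decreasing = TRUE needs no negation, so it also works for non-numeric
# columns (e.g. -c("a", "b") would fail with "invalid argument to unary operator")
by_y_dec <- df_ties[order(df_ties$y, decreasing = TRUE), ]
by_y_dec
```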