How to sort 'paired' vectors in R

Time:05-03

Suppose I have two independent vectors `x` and `y` of the same length:

   x         y
   1         0.12
   2         0.50
   3         0.07
   4         0.10
   5         0.02

I want to sort the elements of y in decreasing order, and reorder x so that the correspondence between the two vectors is preserved, which would lead to this:

   x         y
   2         0.50
   1         0.12
   4         0.10
   3         0.07
   5         0.02

I'm new to R, and although I know it has a built-in sort function that lets me sort the elements of y, I don't know how to make it sort both. The only thing I can think of is a for loop that "manually" sorts x by looking up the original position of each element of y:

ysorted <- sort(y, decreasing = TRUE)
xsorted <- numeric(length(ysorted))
# look up each sorted y value's original position to reorder x
for (i in 1:length(ysorted)) {
    xsorted[i] <- x[which(ysorted[i] == y)]
}

which is very inefficient.
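The loop above can be replaced by a single call to order(), which returns the permutation of indices that sorts y; indexing both vectors with that same permutation keeps them paired. A minimal sketch using the example data from the question:

```r
x <- 1:5
y <- c(0.12, 0.50, 0.07, 0.10, 0.02)

# order() returns the indices that put y in decreasing order
idx <- order(y, decreasing = TRUE)

xsorted <- x[idx]  # 2 1 4 3 5
ysorted <- y[idx]  # 0.50 0.12 0.10 0.07 0.02
```

Unlike the which()-based loop, this is also robust to duplicated values in y.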

CodePudding user response:

In dplyr:

dat <- structure(list(x = 1:5, 
y = c(0.12, 0.5, 0.07, 0.1, 0.02)), 
class = "data.frame", row.names = c(NA, 
-5L))

library(dplyr)
dat %>% arrange(desc(y))
  x    y
1 2 0.50
2 1 0.12
3 4 0.10
4 3 0.07
5 5 0.02

In data.table:

library(data.table)
as.data.table(dat)[order(-y)]
   x    y
1: 2 0.50
2: 1 0.12
3: 4 0.10
4: 3 0.07
5: 5 0.02
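As an aside, data.table also provides setorder(), which reorders the table by reference instead of creating a sorted copy. A sketch, building the table directly rather than converting dat:

```r
library(data.table)

dt <- data.table(x = 1:5, y = c(0.12, 0.5, 0.07, 0.1, 0.02))
setorder(dt, -y)  # reorders dt in place, decreasing by y
```

For small data the difference is negligible, but by-reference ordering avoids a copy on large tables.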

Speed Comparison

Three solutions have already been offered in the answers: base R, dplyr, and data.table. As is often the case in R programming, you can achieve exactly the same result with different approaches.

If you want to compare how fast each approach executes, you can use microbenchmark() from the {microbenchmark} package (there are other ways to do this as well). Here is an example: each approach is run 1000 times, and summaries of the elapsed times are reported.

library(microbenchmark)

microbenchmark(
     base_order  = dat[order(-dat$y), ],
     dplyr_order = dat %>% arrange(desc(y)),
     dt_order    = as.data.table(dat)[order(-y)],
     times = 1000
 )

#Unit: microseconds
#        expr    min      lq      mean  median      uq     max neval
#  base_order   42.0   63.25   97.2585   79.45  100.35  6761.8  1000
# dplyr_order 1244.5 1503.45 1996.4406 1689.85 2065.30 16868.4  1000
#    dt_order  261.3  395.85  583.9086  487.35  587.70 39294.6  1000

The results show that, for this case, base_order is the fastest: it ordered the columns about 20 times faster than dplyr_order and about 6 times faster than dt_order.

CodePudding user response:

We can use order in base R (a negative sign on a numeric column gives decreasing order):

df2 <- df1[order(-df1$y),]

Output:

df2
  x    y
2 2 0.50
1 1 0.12
4 4 0.10
3 3 0.07
5 5 0.02

data

df1 <- structure(list(x = 1:5, y = c(0.12, 0.5, 0.07, 0.1, 0.02)), 
class = "data.frame", row.names = c(NA, 
-5L))