Sorting with order in R with the whole data frame

Time:12-07

I have a data frame that I'd like to order based on a vector of IDs and on all the columns of a matrix, admix.

id.namestest = data.frame(test = NA, id= c("id1", "id2", "id3","id3", "id2", "id1"))

head(admix)
#             V1        V2           V3
# [1,] 0.1019623 0.8961855 1.852222e-03
# [2,] 0.6891593 0.3107807 5.999776e-05
# [3,] 0.7274040 0.2697308 2.865165e-03
# [4,] 0.3458368 0.6514100 2.753215e-03
# [5,] 0.3946996 0.6053004 1.000000e-09
# [6,] 0.6383386 0.3585409 3.120463e-03

admix=structure(c(0.101962262250848, 0.68915927427333, 0.727404046114676, 
            0.345836796905855, 0.394699646563406, 0.638338623952938, 0.896185515801946, 
            0.310780727965854, 0.26973078933548, 0.65140998802539, 0.605300352436594, 
            0.358540912890725, 0.00185222194720621, 5.99977608165462e-05, 
            0.00286516454984352, 0.00275321506875506, 1e-09, 0.00312046315633649
), dim = c(6L, 3L), dimnames = list(NULL, c("V1", "V2", "V3")))

The code below works, but I have to list the columns of admix manually:

admix.tmp = cbind(admix, id.namestest)
if (K==3) { admix.sort.tmp = admix.tmp[order(id.namestest[,2], admix[,1],admix[,2],admix[,3]),]}

Instead, I'd like to pass the column order as a vector, sort.order:

sort.order = c(1,2,3)

admix.sort.tmp = admix.tmp[order(id.namestest[,2], admix[,sort.order]),]

But I get this:

Error in order(id.namestest[, 2], admix[, c(1, 2, 3)]) : 
  argument lengths differ

I also tried:

admix.sort.tmp = admix.tmp[order(id.namestest[,2], asplit(admix, 2)),]

but I get the same error.

Answer:

As the error shows, id.namestest[,2] is a vector of length 6, whereas admix[, c(1, 2, 3)] is a matrix, so its length is the total number of elements in the matrix (18). order needs every sort key to have the same length, so each column has to be passed as a separate argument. We can build a list of keys and then call order on it with do.call:

admix.tmp[do.call(order, c(list(id.namestest[,2]), asplit(admix, 2))),]

-output

         V1        V2           V3 test  id
1 0.1019623 0.8961855 1.852222e-03   NA id1
6 0.6383386 0.3585409 3.120463e-03   NA id1
5 0.3946996 0.6053004 1.000000e-09   NA id2
2 0.6891593 0.3107807 5.999776e-05   NA id2
4 0.3458368 0.6514100 2.753215e-03   NA id3
3 0.7274040 0.2697308 2.865165e-03   NA id3
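
To honor the sort.order vector from the question, one option is to subset the matrix before splitting it into columns. The following is a minimal sketch on the question's data that also shows the length mismatch behind the error; the names keys and idx are introduced here for illustration only.

```r
# Data from the question
id.namestest <- data.frame(test = NA, id = c("id1", "id2", "id3", "id3", "id2", "id1"))
admix <- structure(c(0.101962262250848, 0.68915927427333, 0.727404046114676,
                     0.345836796905855, 0.394699646563406, 0.638338623952938,
                     0.896185515801946, 0.310780727965854, 0.26973078933548,
                     0.65140998802539, 0.605300352436594, 0.358540912890725,
                     0.00185222194720621, 5.99977608165462e-05, 0.00286516454984352,
                     0.00275321506875506, 1e-09, 0.00312046315633649),
                   dim = c(6L, 3L), dimnames = list(NULL, c("V1", "V2", "V3")))
sort.order <- c(1, 2, 3)

# A matrix is one vector with a dim attribute, so order() sees 18 values, not 6
length(id.namestest[, 2])     # 6
length(admix[, sort.order])   # 18

# One sort key per list element: the ids first, then the chosen columns,
# in the priority given by sort.order (keys and idx are illustrative names)
keys <- c(list(id.namestest[, 2]), asplit(admix[, sort.order, drop = FALSE], 2))
idx <- do.call(order, keys)
idx                           # 1 6 5 2 4 3

admix.tmp <- cbind(admix, id.namestest)
admix.sort.tmp <- admix.tmp[idx, ]
```

The drop = FALSE keeps the subset a matrix even when sort.order selects a single column, so asplit still returns one key per column.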

Building the keys as a list of vectors or as a data.frame keeps each column's type intact, whereas combining them into a single matrix would coerce them all to one type:

admix.tmp[do.call(order, cbind(id.namestest[2], admix)),]
         V1        V2           V3 test  id
1 0.1019623 0.8961855 1.852222e-03   NA id1
6 0.6383386 0.3585409 3.120463e-03   NA id1
5 0.3946996 0.6053004 1.000000e-09   NA id2
2 0.6891593 0.3107807 5.999776e-05   NA id2
4 0.3458368 0.6514100 2.753215e-03   NA id3
3 0.7274040 0.2697308 2.865165e-03   NA id3
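
The data.frame variant can take a column priority vector in the same way. This is a sketch only: sort.order2, keys, and idx are hypothetical names that are not part of the original code, and the priority c(2, 1, 3) is chosen just to show a result that differs from the default column order.

```r
# Data from the question
id.namestest <- data.frame(test = NA, id = c("id1", "id2", "id3", "id3", "id2", "id1"))
admix <- structure(c(0.101962262250848, 0.68915927427333, 0.727404046114676,
                     0.345836796905855, 0.394699646563406, 0.638338623952938,
                     0.896185515801946, 0.310780727965854, 0.26973078933548,
                     0.65140998802539, 0.605300352436594, 0.358540912890725,
                     0.00185222194720621, 5.99977608165462e-05, 0.00286516454984352,
                     0.00275321506875506, 1e-09, 0.00312046315633649),
                   dim = c(6L, 3L), dimnames = list(NULL, c("V1", "V2", "V3")))

# Hypothetical priority: sort by id, then V2, then V1, then V3
sort.order2 <- c(2, 1, 3)

# A data.frame is a list of equal-length columns, so it can feed do.call directly.
# as.list() + unname() drops the column names so that none of them can be
# mistaken for a named argument of order() (e.g. a column called "decreasing").
keys <- cbind(id.namestest["id"], as.data.frame(admix[, sort.order2, drop = FALSE]))
idx <- do.call(order, unname(as.list(keys)))
idx                           # 6 1 2 5 3 4

admix.tmp <- cbind(admix, id.namestest)
admix.tmp[idx, ]
```

Within each id the rows are now ordered by V2 first, which is why rows 6 and 1 swap places compared with the output above.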

Or using dplyr:

library(dplyr)
admix.tmp %>%
   arrange(id, across(all_of(colnames(admix[, sort.order, drop = FALSE]))))