R: sort a data frame by a numeric column of repeated numbers

I have a data frame that looks like this:

 Gene         Cluster       
  A              0        
  B              0        
  C              7        
  D              7        
  F              1        
  G              1
  H              2
  I              3
  L              8
  M              8
  Z              8

I would like to sort the data.frame by the numeric column "Cluster" in the following order:

order = c(1,0,8,7,3,2)

With a column of factors I usually use "match", but with a numeric column of repeated numbers I don't know how to sort the rows into my desired order.

Can anyone help me please?

CodePudding user response:

Your thought to use match is correct; now just wrap it in order(.) as well:

dat[order(match(dat$Cluster, c(1,0,8,7,3,2))),]
#    Gene Cluster
# 5     F       1
# 6     G       1
# 1     A       0
# 2     B       0
# 9     L       8
# 10    M       8
# 11    Z       8
# 3     C       7
# 4     D       7
# 8     I       3
# 7     H       2
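For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's data and splits the one-liner into its two steps. The names cluster_order, pos and sorted are my own (I renamed the question's order vector so it does not mask base::order), and stringsAsFactors = FALSE is only there so Gene stays character on older R versions.

```r
# Rebuild the example data from the question
dat <- data.frame(
  Gene    = c("A", "B", "C", "D", "F", "G", "H", "I", "L", "M", "Z"),
  Cluster = c(0, 0, 7, 7, 1, 1, 2, 3, 8, 8, 8),
  stringsAsFactors = FALSE
)

# Desired cluster order (hypothetical name, avoids masking base::order)
cluster_order <- c(1, 0, 8, 7, 3, 2)

# Step 1: match() gives each row the position of its Cluster in cluster_order
pos <- match(dat$Cluster, cluster_order)
# pos is 2 2 4 4 1 1 6 5 3 3 3

# Step 2: order() turns those positions into a row permutation
sorted <- dat[order(pos), ]
```

The repeated numbers are not a problem because match() simply returns the same position for every row that shares a Cluster value, and order() then groups those rows together.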

FYI, according to ?order,

     The sort used is _stable_ (except for 'method = "quick"'), so any
     unresolved ties will be left in their original ordering.

So the code above by itself should not change the ordering of Gene within each Cluster. If your data has more columns, you can always add tie-breakers as follow-on arguments; for instance, if you wanted to ensure that it is also sorted lexicographically by Gene within each Cluster, you would use:

order(match(dat$Cluster, c(1,0,8,7,3,2)), dat$Gene)
# or perhaps more readable
with(dat, order(match(Cluster, c(1,0,8,7,3,2)), Gene))
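To see what the tie-breaker changes, here is a small sketch on a shuffled subset of the genes. The data frame dat2 and the names cluster_order, stable_only and tie_broken are made up for illustration; they are not from the question.

```r
# Hypothetical shuffled data, so genes are NOT alphabetical within a cluster
dat2 <- data.frame(
  Gene    = c("Z", "G", "B", "L", "F", "A", "M"),
  Cluster = c(8, 1, 0, 8, 1, 0, 8),
  stringsAsFactors = FALSE
)
cluster_order <- c(1, 0, 8, 7, 3, 2)

# Stable sort only: genes keep their original relative order within a cluster
stable_only <- dat2[order(match(dat2$Cluster, cluster_order)), ]

# Gene as a second key: genes come out alphabetical within each cluster
tie_broken <- dat2[with(dat2, order(match(Cluster, cluster_order), Gene)), ]
```

With this input, stable_only lists the genes as G, F, B, A, Z, L, M, while tie_broken gives F, G, A, B, L, M, Z. The cluster order is the same in both; only the arrangement inside each cluster differs.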