R: sort a data frame by a numeric column of repeated numbers

Time:11-04

I have a data frame that looks like this:

  Gene   Cluster
  A            0
  B            0
  C            7
  D            7
  F            1
  G            1
  H            2
  I            3
  L            8
  M            8
  Z            8
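To make the example reproducible, here is a sketch that rebuilds this table as an R data frame. It assumes Gene is a character column and Cluster is numeric, since the table does not show the column types. The name `dat` matches the one used in the answer below.

```r
# Hypothetical reconstruction of the table above (column types are assumed:
# Gene as character, Cluster as numeric).
dat <- data.frame(
  Gene    = c("A", "B", "C", "D", "F", "G", "H", "I", "L", "M", "Z"),
  Cluster = c(0, 0, 7, 7, 1, 1, 2, 3, 8, 8, 8),
  stringsAsFactors = FALSE
)
str(dat)
```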

I would like to sort the data frame by the numeric column "Cluster" in the following order:

order = c(1,0,8,7,3,2)

With a factor column I usually use match(), but with a numeric column of repeated values I don't know how to sort the rows into my desired order.

Can anyone help me please?

Answer:

Your idea to use match() is correct; you just need to wrap it in order(), which turns the matched positions into a row ordering:

dat[order(match(dat$Cluster, c(1,0,8,7,3,2))),]
#    Gene Cluster
# 5     F       1
# 6     G       1
# 1     A       0
# 2     B       0
# 9     L       8
# 10    M       8
# 11    Z       8
# 3     C       7
# 4     D       7
# 8     I       3
# 7     H       2
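To see why this works, it can help to look at the two intermediate steps. This is a sketch that rebuilds the question's data so it runs on its own; `desired`, `rank_in_desired` and `idx` are just illustrative names, not part of the original answer.

```r
dat <- data.frame(
  Gene    = c("A", "B", "C", "D", "F", "G", "H", "I", "L", "M", "Z"),
  Cluster = c(0, 0, 7, 7, 1, 1, 2, 3, 8, 8, 8),
  stringsAsFactors = FALSE
)
desired <- c(1, 0, 8, 7, 3, 2)

# Step 1: match() replaces each Cluster value with its position in `desired`
# (cluster 1 -> 1, cluster 0 -> 2, cluster 8 -> 3, cluster 7 -> 4, ...).
rank_in_desired <- match(dat$Cluster, desired)
rank_in_desired
# [1] 2 2 4 4 1 1 6 5 3 3 3

# Step 2: order() returns the row indices that sort those positions ascending.
idx <- order(rank_in_desired)
idx
# [1]  5  6  1  2  9 10 11  3  4  8  7

dat[idx, ]
```

Any Cluster value that does not appear in the order vector gets NA from match(), and order() puts NAs last by default (na.last = TRUE), so such rows end up at the bottom rather than raising an error.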

FYI, according to ?order,

     The sort used is _stable_ (except for 'method = "quick"'), so any
     unresolved ties will be left in their original ordering.
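A small sketch illustrating that stability, using a hypothetical data frame `dat2` whose genes are deliberately listed out of alphabetical order within each cluster:

```r
# Hypothetical data: genes within cluster 8 are in the order Z, M, L.
dat2 <- data.frame(
  Gene    = c("Z", "A", "M", "B", "L"),
  Cluster = c(8, 0, 8, 0, 8),
  stringsAsFactors = FALSE
)
desired <- c(1, 0, 8, 7, 3, 2)

# Sort by cluster position only; ties keep their input order.
out <- dat2[order(match(dat2$Cluster, desired)), ]
out$Gene
# [1] "A" "B" "Z" "M" "L"
```

Within clusters 0 and 8 the genes come out in the order they went in (A, B and Z, M, L), not re-sorted.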

So the code above, by itself, will not change the order of Gene within each Cluster: rows that share a Cluster keep their original relative order. You can add tie-breakers as further arguments to order(); for instance, if you also want Gene sorted alphabetically within each Cluster, use

dat[order(match(dat$Cluster, c(1,0,8,7,3,2)), dat$Gene), ]
# or, perhaps more readable
dat[with(dat, order(match(Cluster, c(1,0,8,7,3,2)), Gene)), ]
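A sketch with a small hypothetical data frame `dat2`, whose genes are listed out of alphabetical order within each cluster, shows the second key (Gene) breaking the ties:

```r
# Hypothetical data: genes within cluster 8 are in the order Z, M, L.
dat2 <- data.frame(
  Gene    = c("Z", "A", "M", "B", "L"),
  Cluster = c(8, 0, 8, 0, 8),
  stringsAsFactors = FALSE
)
desired <- c(1, 0, 8, 7, 3, 2)

# Primary key: position of Cluster in `desired`; secondary key: Gene.
idx <- with(dat2, order(match(Cluster, desired), Gene))
dat2[idx, "Gene"]
# [1] "A" "B" "L" "M" "Z"
```

Without the Gene argument, the cluster-8 rows would keep their input order (Z, M, L).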