I have a data frame that looks like this:
Gene Cluster
A    0
B    0
C    7
D    7
F    1
G    1
H    2
I    3
L    8
M    8
Z    8
I would like to sort the data.frame by the numeric column "Cluster" in the following order:
order = c(1,0,8,7,3,2)
With a column of factors I usually use "match" but with a numeric column of repeated numbers I don't know exactly how to sort the rows with my desired order.
Can anyone help me please?
CodePudding user response:
Your thought to use match is correct, now just include order(.) as well:
dat[order(match(dat$Cluster, c(1,0,8,7,3,2))),]
# Gene Cluster
# 5 F 1
# 6 G 1
# 1 A 0
# 2 B 0
# 9 L 8
# 10 M 8
# 11 Z 8
# 3 C 7
# 4 D 7
# 8 I 3
# 7 H 2
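To run this end to end, here is a minimal sketch that rebuilds the example data from the question. The names cluster_order and pos are my own, chosen so the snippet does not mask the base function order; it also assumes every Cluster value appears in the desired order vector (values that do not would get NA from match and, with the default na.last = TRUE, land at the bottom).

```r
# Rebuild the example data from the question.
dat <- data.frame(
  Gene    = c("A", "B", "C", "D", "F", "G", "H", "I", "L", "M", "Z"),
  Cluster = c(0, 0, 7, 7, 1, 1, 2, 3, 8, 8, 8)
)

# Desired cluster order (named cluster_order here to avoid shadowing order()).
cluster_order <- c(1, 0, 8, 7, 3, 2)

# match() gives each row the position of its Cluster in cluster_order,
# e.g. Cluster 1 -> 1, Cluster 0 -> 2, Cluster 8 -> 3, and so on.
pos <- match(dat$Cluster, cluster_order)

# order() then turns those positions into a row permutation.
sorted <- dat[order(pos), ]
print(sorted)
```

Splitting the match step out into pos makes it easy to inspect the intermediate ranks if the result is not what you expect.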
FYI, according to ?order,
The sort used is _stable_ (except for 'method =
"quick"'), so any unresolved ties will be left in their original
ordering.
So the code above by itself should not change the ordering of Gene within each Cluster. If your data has more columns, you can always add tie-breakers in follow-on arguments; for instance, if you wanted to ensure that it is also sorted lexicographically by Gene within each Cluster, then you would use:
order(match(dat$Cluster, c(1,0,8,7,3,2)), dat$Gene)
# or perhaps more readable
with(dat, order(match(Cluster, c(1,0,8,7,3,2)), Gene))
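To see the difference a tie-breaker makes, here is a small sketch on made-up data (dat2 is hypothetical, not from the question) where Gene is deliberately out of alphabetical order within each cluster. It compares the stable sort on cluster position alone with the version that adds Gene as a second key.

```r
# Hypothetical data: Gene is NOT alphabetical within each Cluster.
dat2 <- data.frame(
  Gene    = c("Z", "L", "M", "G", "F"),
  Cluster = c(8, 8, 8, 1, 1)
)
cluster_order <- c(1, 0, 8, 7, 3, 2)

# Cluster position only: ties keep their original row order (stable sort).
stable_only <- dat2[order(match(dat2$Cluster, cluster_order)), ]

# Cluster position first, then Gene as a tie-breaker within each cluster.
with_ties <- dat2[with(dat2, order(match(Cluster, cluster_order), Gene)), ]

print(stable_only)
print(with_ties)
```

In the first result the genes within a cluster stay in the order they were given (G before F, Z before L and M); in the second they are alphabetical within each cluster.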