I am trying to arrange a data.frame by a text column with some numeric values in it:
foo <- data.frame(x = c("A100", "A1", "A2", "A10", "A11"))
I am trying to sort it numerically using stringr::str_order(foo$x, numeric = TRUE) or something similar. I am trying to use this with dplyr::arrange but it is not arranging correctly. Here is what I have done:
dplyr::arrange(foo, stringr::str_order(x,numeric = T))
On my machine, this returns the values in the order of A11, A100, A1, A2, A10, as opposed to A1, A2, A10, A11, A100. This code works correctly:
foo[stringr::str_order(foo$x,numeric = T),]
I would expect these to do the same thing, but they don't, at least on my machine (Windows 10, R version 4.1.0) and my brother's (Mac, R version 4.0.2).
My question is, why is the output different? What am I missing? Is there a way to make str_order and arrange work together?
I would like to be able to sort this column using dplyr::arrange so that I do not need to track down all of the places that I used arrange.
Thank you for your thoughts and time!
CodePudding user response:
You can use:
dplyr::arrange(foo, match(x, stringr::str_sort(x,numeric = T)))
x
1 A1
2 A2
3 A10
4 A11
5 A100
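As a rough sketch of why this works (the intermediate names sorted_x, key and result are only illustrative, and plain vector indexing stands in for arrange here): match looks up each original value in the naturally sorted copy, so the key it returns is the position each value should end up in.

```r
library(stringr)

x <- c("A100", "A1", "A2", "A10", "A11")

# Naturally sorted copy of the values: "A1" "A2" "A10" "A11" "A100"
sorted_x <- str_sort(x, numeric = TRUE)

# For each original value, its position in the sorted copy.
# "A100" is last in sorted_x, so its key is 5; "A1" is first, so its key is 1.
key <- match(x, sorted_x)
print(key)

# Ordering by that key reproduces the natural order.
result <- x[order(key)]
print(result)
```

Duplicated values simply share the same key, since match returns the first matching position, so they stay next to each other after sorting.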
CodePudding user response:
Note that str_order, just like order, returns the indices that put the elements in ascending order, e.g.:
str_order(foo$x,numeric = T)
[1] 2 3 4 5 1
This means that the smallest element currently sits in position 2 of the vector, while the largest element (last in the sorted order) currently sits in position 1.
arrange, on the other hand, sorts rows by the values it is given, so it needs the position each element should occupy once ordered, i.e. the ranks (with no ties).
y <- c(100,1,2,10,11)
order(y)
[1] 2 3 4 5 1 # We do not want this
rank(y)
[1] 5 1 2 3 4 # We want this.
Note that the ranks say the smallest element (rank 1) is in position 2 and the largest element (rank 5) is in position 1.
To obtain the ranks, apply order to the output of order (or str_order). Hence:
arrange(foo, order(str_order(x,numeric = T)))
x
1 A1
2 A2
3 A10
4 A11
5 A100
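The same reasoning can be checked in base R alone. This is a minimal sketch that assumes arrange(df, key) behaves like df[order(key), ], which is consistent with the output reported in the question; the names ord, rnk, wrong and right are only illustrative.

```r
y <- c(100, 1, 2, 10, 11)

ord <- order(y)    # indices that sort y: 2 3 4 5 1
rnk <- order(ord)  # position each element takes once sorted: 5 1 2 3 4

# With no ties, ordering the order gives the ranks.
stopifnot(all(rnk == rank(y)))

# Sorting by a key k amounts to y[order(k)].
wrong <- y[order(ord)]  # using the order output as the key
right <- y[order(rnk)]  # using the ranks as the key

print(wrong)  # 11 100 1 2 10, the same shuffled pattern as in the question
print(right)  # 1 2 10 11 100
```

With tied values, order(order(y)) breaks ties by position while rank(y) averages them by default, so the two only agree exactly when there are no ties. Either one still works as a sort key.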