Why does stringr::str_order(x, numeric = T) sort data differently in conjunction with dplyr::arrange

Time:12-10

I am trying to arrange a data.frame by a text column with some numeric values in it:

foo <- data.frame(x = c("A100", "A1", "A2", "A10", "A11"))

I want to sort it in numeric order using stringr::str_order(foo$x, numeric = TRUE) or something similar, and I want to do this inside dplyr::arrange, but the rows do not come out in the right order. Here is what I have done:

dplyr::arrange(foo, stringr::str_order(x,numeric = T))

On my machine, this returns the values in the order of A11, A100, A1, A2, A10, as opposed to A1, A2, A10, A11, A100. This code works correctly:

foo[stringr::str_order(foo$x,numeric = T),]

I would expect these to do the same thing, but they don't, at least on my machine (Windows 10, R version 4.1.0) and my brother's (Mac, R version 4.0.2).

My question is, why is the output different? What am I missing? Is there a way to make str_order and arrange work together?

I would like to be able to sort this column using dplyr::arrange so that I do not need to track down all of the places that I used arrange.

Thank you for your thoughts and time!

CodePudding user response:

You can use:

dplyr::arrange(foo, match(x, stringr::str_sort(x,numeric = T)))

     x
1   A1
2   A2
3  A10
4  A11
5 A100
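A short sketch of why this works, assuming the values in x are unique: match() looks up each original element in the sorted vector, so the key it returns is the position each element should end up in. The names sorted and key are only illustrative.

```r
x <- c("A100", "A1", "A2", "A10", "A11")

# Sorted copy of x, with embedded numbers compared numerically
sorted <- stringr::str_sort(x, numeric = TRUE)

# For each original element, its position in the sorted vector
key <- match(x, sorted)
print(key)

# Sorting by that key reproduces the sorted vector
stopifnot(identical(x[order(key)], sorted))
```

If x contained duplicates, match() would give equal elements the same key, which still sorts them next to each other.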

CodePudding user response:

Note that str_order, just like order, returns the indices of the elements in ascending order, i.e. the permutation that would sort the vector, e.g.:

stringr::str_order(foo$x, numeric = TRUE)
[1] 2 3 4 5 1

That is, the first entry (2) says the smallest element currently sits in position 2 of the vector, and the last entry (1) says the largest element currently sits in position 1.

arrange, on the other hand, sorts the rows by the values of the expression it is given, so it needs the position each element should occupy once sorted, i.e. the ranks (with no ties).

y <- c(100,1,2,10,11)
order(y)
[1] 2 3 4 5 1 # We do not want this
rank(y)
[1] 5 1 2 3 4 # We want this.

Note that the rank vector says the element in position 2 is the smallest (rank 1) and the element in position 1 is the largest (rank 5).
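The difference can be reproduced in base R alone. In this minimal sketch, sort_by_key is a hypothetical stand-in for what arrange roughly does with a single key (sort by the key's values), and y holds the numeric parts of x typed in by hand.

```r
x <- c("A100", "A1", "A2", "A10", "A11")
y <- c(100, 1, 2, 10, 11)  # numeric parts of x, entered by hand

# Hypothetical stand-in for arrange(): reorder v by the values of key
sort_by_key <- function(v, key) v[order(key)]

by_order <- sort_by_key(x, order(y))  # key is a permutation, not a rank
by_rank  <- sort_by_key(x, rank(y))   # key is each element's target position

print(by_order)  # "A11" "A100" "A1" "A2" "A10"
print(by_rank)   # "A1" "A2" "A10" "A11" "A100"
```

Using the order vector as the key gives exactly the scrambled result from the question, while the rank vector gives the intended result.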

To obtain the ranks, apply order to the result of str_order (with no ties, ordering an order vector gives the ranks). Hence:

dplyr::arrange(foo, order(stringr::str_order(x, numeric = TRUE)))
     x
1   A1
2   A2
3  A10
4  A11
5 A100
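As a quick check of the "order the ordered vector" step, the base R sketch below confirms that order(order(y)) matches rank(y) when there are no ties. The vector z is made up here to show what happens with ties.

```r
y <- c(100, 1, 2, 10, 11)

ord <- order(y)    # 2 3 4 5 1: where each sorted element currently sits
inv <- order(ord)  # 5 1 2 3 4: where each element should end up

# Without ties, ordering the order vector gives the ranks
stopifnot(all(inv == rank(y)))

# With ties, rank() averages by default, while order(order()) breaks
# ties by original position, like ties.method = "first"
z <- c(2, 1, 2)
print(rank(z))          # 2.5 1.0 2.5
print(order(order(z)))  # 2 1 3
stopifnot(identical(order(order(z)), rank(z, ties.method = "first")))
```

For sorting purposes either key works, since tied elements end up adjacent in both cases.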