Python Equivalent for R's order function

Time:11-22

According to this post np.argsort() would be the function I am looking for.

However, this is not giving me my desired result.

Below is the R code that I am trying to convert to Python and my current Python code.

R Code

data.frame %>% select(order(colnames(.)))

Python Code

dataframe.iloc[numpy.array(dataframe.columns).argsort()]

The dataframe I am working with has 1,000,000 rows and 42 columns, so I cannot re-create the output exactly.

But I believe I can re-create the order() outputs.
From my understanding, each number represents the original position in the columns list.

order(colnames(data.frame)) returns
3,2,5,6,8,4,7,10,9,11,12,13,14,15,16,17,18,19,23,20,21,22,1,25,26,28,24,27,38,29,34,33,36,30,31,32,35,41,42,39,40,37

numpy.array(dataframe.columns).argsort() returns
2,4,5,7,3,6,9,8,10,11,12,13,14,15,16,17,18,22,19,20,21,0,24,25,27,23,26,37,28,33,32,35,29,30,31,34,40,41,38,39,36,1

I know R is 1-indexed, unlike Python, so I know the first number in each output (3 in R, 2 in NumPy) refers to the same column.
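One way to compare the two outputs posted above is to shift R's result to 0-based and look for the columns whose removal makes the two orderings agree. This is only a diagnostic sketch on the index lists from the question; the names `r_order`, `np_order` and `culprits` are made up for illustration.

```python
# Index lists copied from the question.
# R's order() is 1-based; numpy's argsort() is 0-based.
r_order = [3, 2, 5, 6, 8, 4, 7, 10, 9, 11, 12, 13, 14, 15, 16, 17, 18, 19,
           23, 20, 21, 22, 1, 25, 26, 28, 24, 27, 38, 29, 34, 33, 36, 30,
           31, 32, 35, 41, 42, 39, 40, 37]
np_order = [2, 4, 5, 7, 3, 6, 9, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18,
            22, 19, 20, 21, 0, 24, 25, 27, 23, 26, 37, 28, 33, 32, 35, 29,
            30, 31, 34, 40, 41, 38, 39, 36, 1]

# Shift R's indices so both lists use 0-based column positions.
r_zero_based = [i - 1 for i in r_order]

# Find every column index that, once ignored, makes the orderings identical.
culprits = [
    col for col in range(len(np_order))
    if [i for i in r_zero_based if i != col] == [i for i in np_order if i != col]
]
print(culprits)

# Where each ordering places those columns.
for col in culprits:
    print(col, r_zero_based.index(col), np_order.index(col))
```

For these lists the two orderings differ only in where the column at 0-based position 1 lands: second in R, last in NumPy. The indices alone do not say why that column sorts differently.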

I am looking for Python code that returns the same ordering as the R code.

CodePudding user response:

Do you have mixed case? This is handled differently in python and R.

R:

order(c('a', 'b', 'B', 'A', 'c'))
# [1] 1 4 2 3 5

x <- c('a', 'b', 'B', 'A', 'c')
x[order(c('a', 'b', 'B', 'A', 'c'))]
# [1] "a" "A" "b" "B" "c"

Python:

np.argsort(['a', 'b', 'B', 'A', 'c']) + 1
# array([4, 3, 1, 2, 5])

x = np.array(['a', 'b', 'B', 'A', 'c'])
x[np.argsort(x)]
# array(['A', 'B', 'a', 'b', 'c'], dtype='<U1')

You can mimic R's behavior using numpy.lexsort, sorting by lowercase first and then breaking ties by the original array with swapped case:

x = np.array(['a', 'b', 'B', 'A', 'c'])
x[np.lexsort([np.char.swapcase(x), np.char.lower(x)])]
# array(['a', 'A', 'b', 'B', 'c'], dtype='<U1')
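Applied to the column selection from the question, the same idea might look like the sketch below. The frame `df` and its column names are made up for illustration, and this only approximates R for plain ASCII names, since R's order() on strings depends on the locale's collation.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with mixed-case column names (not the asker's data).
df = pd.DataFrame([[1, 2, 3, 4, 5]], columns=["b", "A", "c", "a", "B"])

# Column labels as a NumPy string array so the np.char helpers apply.
cols = df.columns.to_numpy().astype(str)

# lexsort uses the LAST key as the primary key: lowercase name first,
# then swapped case as the tie-break so "a" comes before "A".
order = np.lexsort([np.char.swapcase(cols), np.char.lower(cols)])

# The leading ":" selects all rows; without it, iloc would index rows.
result = df.iloc[:, order]
print(list(result.columns))  # ['a', 'A', 'b', 'B', 'c']
```

Note the `iloc[:, order]` form: passing the index array alone to `iloc` selects rows rather than columns.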

CodePudding user response:

np.argsort is the same thing as R's order.

Just experiment

> x=c(1,2,3,10,20,30,5,15,25,35)
> x
 [1]  1  2  3 10 20 30  5 15 25 35
> order(x)
 [1]  1  2  3  7  4  8  5  9  6 10
>>> x=np.array([1,2,3,10,20,30,5,15,25,35])
>>> x
array([ 1,  2,  3, 10, 20, 30,  5, 15, 25, 35])
>>> x.argsort() + 1
array([ 1,  2,  3,  7,  4,  8,  5,  9,  6, 10])

The + 1 here is just to make the indices start at 1, since argsort returns 0-based indices.
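One caveat if the data contains ties: as far as I know, R's order() leaves tied elements in their original order, while the default kind of np.argsort ("quicksort") is not guaranteed to be stable. A minimal sketch with a made-up vector, asking NumPy for a stable sort:

```python
import numpy as np

# Made-up vector with ties, for illustration only.
x = np.array([2, 1, 2, 1])

# kind="stable" keeps tied elements in their original order;
# "+ 1" converts the 0-based indices to R-style 1-based ones.
order_1_based = np.argsort(x, kind="stable") + 1
print(order_1_based)  # [2 4 1 3]
```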

So maybe the problem comes from your columns (shot in the dark: you have 2D arrays and are passing rows to R and columns to Python, or something like that).

But np.argsort is R's order.
