How to get the (relative) place of values in a dataframe when sorted using Python?

Time:02-12

How can I create a Pandas DataFrame that shows the relative position of each value when each column is sorted from low to high? In this case, how can I transform 'df' into 'dfOut'?

import pandas as pd
import numpy as np

#create DataFrame
df = pd.DataFrame({'A': [12, 18, 9, 21, 24, 15],
                   'B': [18, 22, 19, 14, 14, 11],
                   'C': [5, 7, 7, 9, 12, 9]})

# How to assign a value to the order in the column, when sorted from low to high?

dfOut = pd.DataFrame({'A': [2, 4, 1, 5, 6, 3],
                      'B': [3, 5, 4, 2, 2, 1],
                      'C': [1, 2, 2, 3, 4, 3]})

Answer:

Here is my attempt using some functions:

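def sorted_idx(l, num):
    # Sorted list of the distinct values in the column
    x = sorted(set(l))
    # Return the 1-based position of num in that list
    for i in range(len(x)):
        if x[i] == num:
            return i + 1

def output_list(l):
    # Replace every element with its 1-based position among the sorted unique values
    ret = [sorted_idx(l, elem) for elem in l]
    return ret

# Apply the transformation column by column (df is the DataFrame from the question)
dfOut = df.apply(lambda column: output_list(column))

print(dfOut)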

I reduce the original list to its unique values and then sort them. Finally, for each element of the original list, I return the index + 1 at which it appears in this sorted unique list, which gives the values in your expected output.

Output:

   A  B  C
0  2  3  1
1  4  5  2
2  1  4  2
3  5  2  3
4  6  2  4
5  3  1  3
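The sorted_idx helper above rebuilds and re-sorts the unique values once per element, so a column of n values triggers n sorts. A minimal sketch of the same idea, assuming you want identical output, builds the value-to-position lookup once per column and maps it; `dense_rank` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

df = pd.DataFrame({'A': [12, 18, 9, 21, 24, 15],
                   'B': [18, 22, 19, 14, 14, 11],
                   'C': [5, 7, 7, 9, 12, 9]})

def dense_rank(column):
    # Hypothetical helper: map each distinct value to its 1-based position
    # among the sorted unique values. The dict is built once per column,
    # so each element costs a lookup instead of a fresh sort.
    lookup = {value: position
              for position, value in enumerate(sorted(set(column)), start=1)}
    return column.map(lookup)

dfOut = df.apply(dense_rank)
print(dfOut)
```

Equal values share a position and no positions are skipped, which is the same behavior as the function-based answer.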

Answer:

If equal values must map to the same output, try the rank method of a DataFrame, like this:

>>> dfOut = df.rank(method="dense").astype(int)  # rank returns floats; cast to int to match your output
>>> dfOut
   A  B  C
0  2  3  1
1  4  5  2
2  1  4  2
3  5  2  3
4  6  2  4
5  3  1  3

The rank method computes the rank of each value within its column, handling ties according to the chosen method. According to the Pandas documentation, the "dense" method ensures that "rank always increases by 1 between groups": tied values share a rank and the next distinct value gets the next integer, which is what your expected output shows.
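To see what "dense" changes, here is a small sketch comparing a few `method` values on column B from the question, which contains one tie (14 appears twice). The default method is "average", which would give both 14s a rank of 2.5; that is also why the answer casts the float result to int.

```python
import pandas as pd

# Column 'B' from the question; 14 appears twice
b = pd.Series([18, 22, 19, 14, 14, 11], name='B')

comparison = pd.DataFrame({
    'value': b,
    'min': b.rank(method='min').astype(int),      # ties share the lowest rank; the next rank is skipped
    'dense': b.rank(method='dense').astype(int),  # ties share a rank; no ranks are skipped
    'first': b.rank(method='first').astype(int),  # ties broken by order of appearance
})
print(comparison)
```

For B, "dense" yields 3, 5, 4, 2, 2, 1 (the expected output), while "min" yields 4, 6, 5, 2, 2, 1 because rank 3 is skipped after the tie.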

Original answer: If repeated numbers do not need to map to the same output value, np.argsort can be applied to each column. Note that np.argsort returns the indices that would sort the column (the row index of the smallest value first, then the next smallest, and so on), not the rank of each value, so its output is a sort order rather than your expected positions. Combining it with the apply method of a DataFrame to process each column gives:

>>> dfOut = df.apply(lambda column: np.argsort(column.values))
>>> dfOut
   A  B  C
0  2  5  0
1  0  3  1
2  5  4  2
3  1  0  3
4  3  2  5
5  4  1  4
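Because np.argsort returns the sort permutation rather than ranks, a common trick is to argsort that permutation a second time, which inverts it and gives each row its 0-based position in sorted order. The sketch below assumes ties should keep their order of appearance, so it passes kind='stable' to the first argsort (the default quicksort is not stable); under that assumption the result equals `df.rank(method='first') - 1`. `ordinal_rank` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [12, 18, 9, 21, 24, 15],
                   'B': [18, 22, 19, 14, 14, 11],
                   'C': [5, 7, 7, 9, 12, 9]})

def ordinal_rank(column):
    # First argsort: row positions listed in sorted order (the permutation shown above).
    # 'stable' keeps tied values in their original order.
    order = np.argsort(column.to_numpy(), kind='stable')
    # Second argsort inverts the permutation: for each row, where it lands when sorted.
    return pd.Series(np.argsort(order), index=column.index)

ranks0 = df.apply(ordinal_rank)
print(ranks0)

# Cross-check against pandas: ties broken by appearance, shifted to start at 0
print((ranks0 == df.rank(method='first') - 1).all().all())
```

Add 1 to `ranks0` if you prefer positions starting at 1.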