Custom python sort

Time:02-16

I have a pandas DataFrame (call it df) with a column containing the following data:

[image: column values]

I need to sort this data in this order:

[image: required sort order]

I can sort this with df.sort_values(), but that does not put the data in the required order. Any help would be appreciated. I am using Python 3.10.

CodePudding user response:

One simple solution is to create a dict that gives each label its position in the desired order, such as d = {'<=1x':0,'1x-5x':1,...}. Map the old column to a new one with this dict, df['new'] = df['old'].map(d), and then sort on the new column: df.sort_values('new')
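A minimal sketch of this approach follows. The sample labels, the column name 'old', and the complete contents of the dict are assumptions made for illustration; extend the dict so that it covers every label in your data.

```python
import pandas as pd

# Hypothetical sample data; the labels and the column name 'old' are assumptions.
df = pd.DataFrame({'old': ['5x-10x', '50x-100x', '1x-5x', '>100x', '<=1x']})

# Rank of each label in the order the rows should appear.
d = {'<=1x': 0, '1x-5x': 1, '5x-10x': 2, '50x-100x': 3, '>100x': 4}

df['new'] = df['old'].map(d)   # labels missing from d become NaN
df = df.sort_values('new')

print(df['old'].tolist())
```

Labels that are missing from the dict map to NaN, which sort_values() places last by default, so it may be worth checking df['new'].isna().any() before sorting.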

CodePudding user response:

One solution is to map each string to a tuple containing the lower and upper bounds of your range. For example, <=1x maps to (-inf, 1.0), 1x-5x maps to (1.0, 5.0), and so on, so that sort_values() can sort them for you.
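This works because Python compares tuples element by element, so (lower, upper) pairs order by lower bound first. A small illustration using plain Python only, with made-up bounds:

```python
# Tuples compare element by element: first the lower bound, then the upper bound.
inf = float('inf')
bounds = [(5.0, 10.0), (100.0, inf), (-inf, 1.0), (1.0, 5.0)]

ordered = sorted(bounds)
print(ordered)
```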

You can do this using a regex:

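import re

def convert_range(r):
    # "5x-10x" -> [("5", "10")]; findall returns a list of group tuples
    bounds = re.findall(r"(-?\d+\.?\d*)x-(-?\d+\.?\d*)x", r)

    if bounds:
        bounds = bounds[0]
    elif "<" in r:
        # "<=1x" or "<1x" -> no lower bound
        nums = re.findall(r"<=?(-?\d+\.?\d*)x", r)
        bounds = ['-inf', nums[0]]
    elif ">" in r:
        # ">100x" or ">=100x" -> no upper bound
        nums = re.findall(r">=?(-?\d+\.?\d*)x", r)
        bounds = [nums[0], 'inf']
    else:
        raise ValueError(f"unrecognized range: {r!r}")

    return tuple(float(b) for b in bounds)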

So say you have the following dataframe:

     ranges
0    5x-10x
1  50x-100x
2   20x-50x
3   10x-20x
4     >100x
5      <=1x

You'd do:

df['tup_ranges'] = df['ranges'].apply(convert_range)
df = df.sort_values(by=['tup_ranges'])

which gives:

     ranges     tup_ranges
5      <=1x    (-inf, 1.0)
0    5x-10x    (5.0, 10.0)
3   10x-20x   (10.0, 20.0)
2   20x-50x   (20.0, 50.0)
1  50x-100x  (50.0, 100.0)
4     >100x   (100.0, inf)
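If you would rather not keep a helper column, a possible variation is to pass a key function to sort_values(), which pandas supports from version 1.1 onward. The sketch below is an alternative to the method above, not the same code: lower_bound is a hypothetical helper that sorts by the lower bound alone, which is enough as long as no two labels share a lower bound.

```python
import re
import pandas as pd

def lower_bound(label):
    # Hypothetical helper: lower bound of a label such as '5x-10x'.
    if label.startswith('<'):
        return float('-inf')
    # First number in the label; for '>100x' this is 100.
    return float(re.search(r"-?\d+\.?\d*", label).group())

df = pd.DataFrame(
    {'ranges': ['5x-10x', '50x-100x', '20x-50x', '10x-20x', '>100x', '<=1x']}
)

# key receives the whole column as a Series and must return a Series of sort keys.
df = df.sort_values(by='ranges', key=lambda s: s.map(lower_bound))

print(df['ranges'].tolist())
```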