I have a Python DataFrame (call it df) with a column containing the following data:
I need to sort this data in this order:
I can sort it with df.sort_values(), but that does not produce the order I need. Any help would be appreciated. I am using Python 3.10.
Answer:
One simple solution is to create a dict that maps each label to its position in the desired order, like d = {'<=1x':0,'1x-5x':1,...}, then create a new column by mapping the old column through this dict, df['new'] = df['old'].map(d), and finally sort on this new column: df = df.sort_values('new').
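A minimal runnable sketch of the dict-mapping approach. The question's actual values and target order are not shown, so the labels below are assumptions: they are borrowed from the other answer (plus '1x-5x' from the dict above), and the order is assumed to run from the smallest range to the largest.

```python
import pandas as pd

# Hypothetical sample data; the question's real values are not shown.
df = pd.DataFrame({'old': ['5x-10x', '50x-100x', '1x-5x', '20x-50x',
                           '10x-20x', '>100x', '<=1x']})

# Assumed target order, smallest range first.
order = ['<=1x', '1x-5x', '5x-10x', '10x-20x', '20x-50x', '50x-100x', '>100x']

# Map each label to its position in the desired order: {'<=1x': 0, '1x-5x': 1, ...}
d = {label: rank for rank, label in enumerate(order)}

# Add the rank as a helper column, then sort on it (sort_values returns a new frame).
df['new'] = df['old'].map(d)
df = df.sort_values('new')
print(df)
```

Labels missing from d map to NaN, and sort_values places NaN rows last by default. On pandas 1.1 or later you can skip the helper column with df.sort_values('old', key=lambda s: s.map(d)).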
Answer:
One solution is to map each string to a tuple containing the lower and upper bounds of its range. For example, <=1x maps to (-inf, 1.0), 1x-5x maps to (1.0, 5.0), and so on. Since tuples compare element by element, sort_values() can then sort them for you.
You can do this using a regex:
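import re

def convert_range(r):
    # Bounded range such as "5x-10x": findall returns [("5", "10")]
    bounds = re.findall(r"(-?\d+\.?\d*)x-(-?\d+\.?\d*)x", r)
    if bounds:
        bounds = bounds[0]
    elif "<" in r:
        # Open lower bound such as "<=1x"
        nums = re.findall(r"<=?(-?\d+\.?\d*)x", r)
        bounds = ['-inf', nums[0]]
    elif ">" in r:
        # Open upper bound such as ">100x"
        nums = re.findall(r">=?(-?\d+\.?\d*)x", r)
        bounds = [nums[0], 'inf']
    else:
        raise ValueError(f"Unrecognized range: {r!r}")
    return tuple(float(b) for b in bounds)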
Say you have the following DataFrame:
ranges
0 5x-10x
1 50x-100x
2 20x-50x
3 10x-20x
4 >100x
5 <=1x
You'd do:
df['tup_ranges'] = df['ranges'].apply(convert_range)
df = df.sort_values(by=['tup_ranges'])
which gives:
ranges tup_ranges
5 <=1x (-inf, 1.0)
0 5x-10x (5.0, 10.0)
3 10x-20x (10.0, 20.0)
2 20x-50x (20.0, 50.0)
1 50x-100x (50.0, 100.0)
4 >100x (100.0, inf)
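If you would rather not keep the tup_ranges helper column, pandas 1.1+ lets sort_values take a key function that is applied to the column before sorting. This sketch assumes the ranges never overlap, so the lower bound alone decides the order; it repeats the corrected convert_range so it runs on its own.

```python
import re

import pandas as pd

def convert_range(r):
    # Bounded range such as "5x-10x": findall returns [("5", "10")]
    bounds = re.findall(r"(-?\d+\.?\d*)x-(-?\d+\.?\d*)x", r)
    if bounds:
        bounds = bounds[0]
    elif "<" in r:
        # Open lower bound such as "<=1x"
        bounds = ['-inf', re.findall(r"<=?(-?\d+\.?\d*)x", r)[0]]
    elif ">" in r:
        # Open upper bound such as ">100x"
        bounds = [re.findall(r">=?(-?\d+\.?\d*)x", r)[0], 'inf']
    else:
        raise ValueError(f"Unrecognized range: {r!r}")
    return tuple(float(b) for b in bounds)

df = pd.DataFrame({'ranges': ['5x-10x', '50x-100x', '20x-50x',
                              '10x-20x', '>100x', '<=1x']})

# The key maps each label to its lower bound (a float), so the sort is a
# plain numeric sort and no extra column is stored.
df = df.sort_values('ranges', key=lambda s: s.map(lambda r: convert_range(r)[0]))
print(df)
```

Sorting on a float key also avoids relying on how pandas orders a column of tuple objects.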