Sorting strings in Python with numbers somewhere in the middle [SOLVED]

Time:08-26

I wanted to find a way to sort strings that have numbers in them by their numerical size.

I found one way to sort strings that contain only numbers, which works well (Sorting numbers in string format with Python) but not when the string is a mix of words and numbers.

In this example I create the list in the order I want, but sorted() changes it.

Example:

>>> s = ['Castle_Wall_25x400x100_Bottom_01', 'Castle_Wall_25x400x50_Top_02',
'Castle_Wall_25x400x10_Bottom_01',  'Castle_Wall_25x400x300_Top_01']
>>> print(sorted(s))
['Castle_Wall_25x400x100_Bottom_01', 'Castle_Wall_25x400x10_Bottom_01', 'Castle_Wall_25x400x300_Top_01', 'Castle_Wall_25x400x50_Top_02']

Expected output:

['Castle_Wall_25x400x10_Bottom_01', 'Castle_Wall_25x400x50_Top_02', 'Castle_Wall_25x400x100_Bottom_01',  'Castle_Wall_25x400x300_Top_01']
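For context, the default result comes from sorted() comparing strings character by character rather than by numeric value. A minimal illustration, using bare number strings (the variable names are only for this sketch):

```python
# Strings are compared character by character, so '100' sorts before '50'
# because '1' < '5'. Passing key=int compares by numeric value instead.
values = ['10', '50', '100', '300']

as_text = sorted(values)
as_numbers = sorted(values, key=int)

print(as_text)     # ['10', '100', '300', '50']
print(as_numbers)  # ['10', '50', '100', '300']
```

The same thing happens inside the longer names: '100_' sorts before '10_B' because the character '0' comes before '_'.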

Edit: Solution!

I solved it by creating a copy of the list where all numbers are padded with zeros so they are all of equal length, then I sort the original list using this new proxy list:

import re
names = ["Castle_Wall_25x400x10_Bottom_01",   "Castle_Wall_25x400x50_Top_02", "Castle_Wall_25x400x100_Bottom_01", "Castle_Wall_25x400x300_Top_01"]
padded = []

# Find the length of the longest run of digits in any name.
longest = 0
for n in names:
    digits = re.findall(r'\d+', n)
    for digit in digits:
        if len(digit) > longest:
            longest = len(digit)

# Rebuild each name with every number zero-padded to that length.
for name in names:
    digits = re.findall(r'\d+', name)
    split = re.split(r'\d+', name)

    padded_name = ''
    for i, s in enumerate(split):
        padded_name += s
        if i < len(digits):
            padded_name += digits[i].zfill(longest)
    padded.append(padded_name)

sorted_list = [x for _, x in sorted(zip(padded, names))]
for name in sorted_list:
    print(name)
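
The same zero-padding idea can probably be written more compactly as a key function, so no proxy list is needed. This is only a sketch: padded_key and width are names made up for this example, and it assumes the padding width is computed once over the whole list.

```python
import re

names = [
    "Castle_Wall_25x400x100_Bottom_01",
    "Castle_Wall_25x400x50_Top_02",
    "Castle_Wall_25x400x10_Bottom_01",
    "Castle_Wall_25x400x300_Top_01",
]

# Length of the longest run of digits across all names (0 if there are none).
width = max((len(d) for name in names for d in re.findall(r"\d+", name)), default=0)

def padded_key(name):
    # Hypothetical helper: zero-pad every run of digits to `width` characters.
    return re.sub(r"\d+", lambda m: m.group().zfill(width), name)

sorted_names = sorted(names, key=padded_key)
for name in sorted_names:
    print(name)
```

The original names are returned unchanged; only the comparison uses the padded form.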

CodePudding user response:

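IIUC, you want to sort by the product of the numbers in, e.g., 10x05, which you can do by passing a key function to sorted(). Note that this assumes names of the form 'A_10x05', with exactly one underscore and two numbers: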

def eval_result(s):
    prefix, op = s.split('_')
    num1, num2 = map(int, op.split('x'))
    return num1 * num2
sorted(s, key=eval_result)

Output

['A_10x05', 'A_10x50', 'A_10x100']

CodePudding user response:

I believe what you want is just to sort each part of the input strings separately - text parts alphabetically, numeric parts by numeric value, with no multiplications involved. If this is the case you will need a helper function:

from re import findall

s = ['A_10x5', 'Item_A_10x05x200_Base_01', 'A_10x100', 'B']

def fun(s):
    f = findall(r'\d+|[A-Za-z_]+', s)
    return list(map(lambda x:int(x) if x.isdigit() else x, f))

sorted(s, key = fun)
['A_10x5', 'A_10x100', 'B', 'Item_A_10x05x200_Base_01']
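
One caveat with a key like this: if one string starts with a number and another with text (say '7_up' and 'A_10x5'), Python ends up comparing an int with a str and raises TypeError. A possible variant, sketched here with a made-up helper name natural_key, uses re.split with a capturing group so that every key alternates text, number, text, ... starting with text (possibly an empty string):

```python
import re

def natural_key(text):
    # With a capturing group, re.split keeps the digit runs in the result:
    # even indexes hold text (possibly ''), odd indexes hold digit runs.
    parts = re.split(r"(\d+)", text)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

items = ['A_10x5', 'Item_A_10x05x200_Base_01', 'A_10x100', 'B', '7_up']
result = sorted(items, key=natural_key)
print(result)
# ['7_up', 'A_10x5', 'A_10x100', 'B', 'Item_A_10x05x200_Base_01']
```

Because strings are always compared with strings and ints with ints, mixed inputs no longer fail.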

CodePudding user response:

Provided that each string in the list contains exactly three dimensions, you can sort by their product:

import re
from functools import cache

s = ['Asset_Castle_Wall_25x400x100_Bottom_01', 'Asset_Castle_Wall_25x400x50_Top_02',
'Asset_Castle_Wall_25x400x10_Bottom_01',  'Asset_Castle_Wall_25x400x300_Top_01']

@cache
def get_size(s):
    if len(tokens := s.split('x')) != 3:
        return 0
    first = re.findall(r'(\d+)', tokens[0])[-1]
    last = re.findall(r'(\d+)', tokens[-1])[0]
    return int(first) * int(tokens[1]) * int(last)

print(sorted(s, key=get_size))

Output:

['Asset_Castle_Wall_25x400x10_Bottom_01', 'Asset_Castle_Wall_25x400x50_Top_02', 'Asset_Castle_Wall_25x400x100_Bottom_01', 'Asset_Castle_Wall_25x400x300_Top_01']
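
Sorting by the product means two differently shaped items with the same volume compare equal. If the intent is instead to order by each dimension in turn, one alternative is to use the three numbers as a tuple key. This is a sketch that assumes the dimensions always appear as <number>x<number>x<number>; DIMS and dims_key are names invented for this example.

```python
import re

# Assumed layout: the three dimensions appear as <W>x<H>x<D> somewhere in the name.
DIMS = re.compile(r"(\d+)x(\d+)x(\d+)")

def dims_key(name):
    match = DIMS.search(name)
    if match is None:
        # Names without three dimensions sort first.
        return (0, 0, 0)
    return tuple(int(g) for g in match.groups())

s = ['Asset_Castle_Wall_25x400x100_Bottom_01', 'Asset_Castle_Wall_25x400x50_Top_02',
     'Asset_Castle_Wall_25x400x10_Bottom_01', 'Asset_Castle_Wall_25x400x300_Top_01']

result = sorted(s, key=dims_key)
print(result)
```

Tuples compare element by element, so the first dimension decides, then the second, then the third.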