How to sort a list of strings delimited by '.' with also numbers in the middle?

Time:11-08

I have a list of strings that contain commands whose fields are separated by dots (.), like this:

DeviceA.CommandA.1.Hello,
DeviceA.CommandA.2.Hello,
DeviceA.CommandA.11.Hello,
DeviceA.CommandA.3.Hello,
DeviceA.CommandB.1.Hello,
DeviceA.CommandB.1.Bye,
DeviceB.CommandB.What,
DeviceA.SubdeviceA.CommandB.1.Hello,
DeviceA.SubdeviceA.CommandB.2.Hello,
DeviceA.SubdeviceB.CommandA.1.What

I want to sort them in natural order:

  1. The order must prioritize by field index (e.g. commands that start with DeviceA always go before commands that start with DeviceB, etc.)
  2. Text fields are ordered alphabetically
  3. When a field is a number, it is sorted numerically in ascending order

Therefore, the sorted output should be:

DeviceA.CommandA.1.Hello,
DeviceA.CommandA.2.Hello,
DeviceA.CommandA.3.Hello,
DeviceA.CommandA.11.Hello,
DeviceA.CommandB.1.Bye,
DeviceA.CommandB.1.Hello,
DeviceA.SubdeviceA.CommandB.1.Hello,
DeviceA.SubdeviceA.CommandB.2.Hello,
DeviceA.SubdeviceB.CommandA.1.What,
DeviceB.CommandB.What

Also note that the number of dot-separated fields is dynamic: a command can have any number of fields.

So far I tried this without luck (the numbers are ordered alphabetically, so for example 11 goes before 5):

list = [
    "DeviceA.CommandA.1.Hello",
    "DeviceA.CommandA.2.Hello",
    "DeviceA.CommandA.11.Hello",
    "DeviceA.CommandA.3.Hello",
    "DeviceA.CommandB.1.Hello",
    "DeviceA.CommandB.1.Bye",
    "DeviceB.CommandB.What",
    "DeviceA.SubdeviceA.CommandB.1.Hello",
    "DeviceA.SubdeviceA.CommandB.2.Hello",
    "DeviceA.SubdeviceB.CommandA.1.What"
]

sorted_list = sorted(list, key=lambda x: x.split('.')) 
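
The likely cause is that split('.') yields lists of strings, and Python compares strings character by character, so "11" sorts before "2" because "1" < "2". A minimal sketch of the effect, using made-up sample values rather than the full list:

```python
# Numeric fields held as strings are compared character by character.
fields_as_strings = sorted(["2", "11", "3", "1"])

# Converting to int for the comparison gives the numeric order instead.
fields_as_numbers = sorted(["2", "11", "3", "1"], key=int)

print(fields_as_strings)  # ['1', '11', '2', '3']
print(fields_as_numbers)  # ['1', '2', '3', '11']
```

Using key=int directly only works when every field is numeric, which is not the case for the command strings here.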

EDIT: Corrected typo error.

CodePudding user response:

Something like this should get you going.

from pprint import pprint

data_list = [
    "DeviceA.CommandA.1.Hello",
    "DeviceA.CommandA.2.Hello",
    "DeviceA.CommandA.3.Hello",
    "DeviceA.CommandB.1.Hello",
    "DeviceA.CommandB.1.Bye",
    "DeviceB.CommandB.What",
    "DeviceA.SubdeviceA.CommandB.1.Hello",
    "DeviceA.SubdeviceA.CommandB.15.Hello",  # added test case to ensure numbers are sorted numerically
    "DeviceA.SubdeviceA.CommandB.2.Hello",
    "DeviceA.SubdeviceB.CommandA.1.What",
]


def get_sort_key(s):
    # Turning the pieces to integers would fail some comparisons (1 vs "What")
    # so instead pad them on the left to a suitably long string
    return [
        bit.rjust(30, "0") if bit.isdigit() else bit
        for bit in s.split(".")
    ]


# Note the key function must be passed as a kwarg.
sorted_list = sorted(data_list, key=get_sort_key)

pprint(sorted_list)

The output is

['DeviceA.CommandA.1.Hello',
 'DeviceA.CommandA.2.Hello',
 'DeviceA.CommandA.3.Hello',
 'DeviceA.CommandB.1.Bye',
 'DeviceA.CommandB.1.Hello',
 'DeviceA.SubdeviceA.CommandB.1.Hello',
 'DeviceA.SubdeviceA.CommandB.2.Hello',
 'DeviceA.SubdeviceA.CommandB.15.Hello',
 'DeviceA.SubdeviceB.CommandA.1.What',
 'DeviceB.CommandB.What']
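
If a fixed padding width feels fragile (a number longer than 30 digits would break it), one possible alternative is to tag each field with its type so that an int is never compared directly with a str. This is only a sketch: get_typed_sort_key and the commands list are names made up for the illustration, and it assumes numeric fields consist purely of decimal digits.

```python
def get_typed_sort_key(s):
    # Each field becomes a (tag, value) pair. Numeric fields get tag 0 and
    # are compared as ints; text fields get tag 1 and are compared as
    # strings. Tags are compared first, so mixed types never meet.
    return [
        (0, int(bit)) if bit.isdecimal() else (1, bit)
        for bit in s.split(".")
    ]


# Hypothetical sample data in the same shape as the question's list.
commands = [
    "DeviceA.CommandA.11.Hello",
    "DeviceA.CommandA.2.Hello",
    "DeviceB.CommandB.What",
    "DeviceA.CommandB.1.Hello",
    "DeviceA.CommandB.1.Bye",
]

typed_sorted = sorted(commands, key=get_typed_sort_key)
print(typed_sorted)
```

As with the padding approach, a numeric field sorts before a text field when both appear at the same position.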

CodePudding user response:

Specifying a key in sorted seems to achieve what you want:

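import re

def my_key(s):
    # Split at the first run of digits: the text before it is compared as a
    # string, the digits themselves as an integer.
    n = re.search(r"\d+", s)
    return (s[:n.span()[0]], int(n[0])) if n else (s,)

# l is the list of strings from the question
print(sorted(l, key=my_key))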

Output:

['DeviceA.CommandA.1.Hello', 'DeviceA.CommandA.2.Hello', 'DeviceA.CommandA.3.Hello', 'DeviceA.CommandA.11.Hello', 'DeviceA.CommandB.1.Hello', 'DeviceA.CommandB.1.Bye', 'DeviceA.SubdeviceA.CommandB.1.Hello', 'DeviceA.SubdeviceA.CommandB.2.Hello', 'DeviceA.SubdeviceB.CommandA.1.What', 'DeviceB.CommandB.What']
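
Note that in this output DeviceA.CommandB.1.Hello still comes before DeviceA.CommandB.1.Bye: the key ignores everything after the first number, so such ties keep their input order. A possible adjustment, sketched under the assumption that comparing the remaining text as a plain string is good enough (my_key_with_tail and the sample list are hypothetical names for the illustration):

```python
import re


def my_key_with_tail(s):
    # Same idea as my_key, but the text after the first number is added as
    # a third element so that ties such as "...1.Hello" vs "...1.Bye" are
    # broken alphabetically.
    n = re.search(r"\d+", s)
    if n is None:
        return (s,)
    return (s[:n.start()], int(n[0]), s[n.end():])


# Hypothetical sample data in the same shape as the question's list.
sample = [
    "DeviceA.CommandB.1.Hello",
    "DeviceA.CommandB.1.Bye",
    "DeviceA.CommandA.11.Hello",
    "DeviceA.CommandA.3.Hello",
]

tail_sorted = sorted(sample, key=my_key_with_tail)
print(tail_sorted)
```

Only the first number is compared numerically here; any later numbers in the tail are still compared as text.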