How to sort lists that contain letters and numbers?

Time:11-24

I have tried lots of different ways to sort the list, but it never sorts it.

list = ['american dad S1-EP1', 'american dad S1-EP10', 'american dad S1-EP11', 'american dad S1-EP12', 'american dad S1-EP13', 'american dad S1-EP14', 'american dad S1-EP15', 'american dad S1-EP16', 'american dad S1-EP17', 'american dad S1-EP18', 'american dad S1-EP19', 'american dad S1-EP2', 'american dad S1-EP20', 'american dad S1-EP21', 'american dad S1-EP22', 'american dad S1-EP23', 'american dad S1-EP3', 'american dad S1-EP4', 'american dad S1-EP5', 'american dad S1-EP6', 'american dad S1-EP7', 'american dad S1-EP8', 'american dad S1-EP9']

I want them all to be in episode order, e.g. ep1, ep2, ep3, ep4, ep5.

CodePudding user response:

Found an answer by using:

list.sort(key=lambda x: int("".join([i for i in x if i.isdigit()])))
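
One caveat worth keeping in mind: this key joins every digit in the string, so the season and episode numbers are fused into a single integer. A minimal sketch of where that could misorder, using made-up multi-season titles that are not in the question (the names `episodes` and `digits_key` are only for illustration):

```python
# Hypothetical titles spanning two seasons (not from the original question).
episodes = ['show S1-EP10', 'show S2-EP1', 'show S1-EP2']

def digits_key(title):
    # Same idea as the lambda above: join every digit, then convert to int.
    return int("".join(ch for ch in title if ch.isdigit()))

by_digits = sorted(episodes, key=digits_key)
# The keys are 110, 21 and 12, so 'show S2-EP1' lands before 'show S1-EP10'.
print(by_digits)
```

For a list that stays within one single-digit season, as in the question, the fused number still orders the episodes correctly.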

CodePudding user response:

I suggest using the re module to extract the name, season, and episode. The key_function below sorts the list by name, then season, then episode:

import re

pat = re.compile(r"(.*) S(\d+)-EP(\d+)")


def key_function(value):
    name, season, episode = pat.search(value).groups()
    return name, int(season), int(episode)


# lst is the list of episode titles from the question
print(sorted(lst, key=key_function))

Prints:

[
    "american dad S1-EP1",
    "american dad S1-EP2",
    "american dad S1-EP3",
    "american dad S1-EP4",
    "american dad S1-EP5",
    "american dad S1-EP6",
    "american dad S1-EP7",
    "american dad S1-EP8",
    "american dad S1-EP9",
    "american dad S1-EP10",
    "american dad S1-EP11",
    "american dad S1-EP12",
    "american dad S1-EP13",
    "american dad S1-EP14",
    "american dad S1-EP15",
    "american dad S1-EP16",
    "american dad S1-EP17",
    "american dad S1-EP18",
    "american dad S1-EP19",
    "american dad S1-EP20",
    "american dad S1-EP21",
    "american dad S1-EP22",
    "american dad S1-EP23",
]
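
If some titles might not follow the "S<season>-EP<episode>" shape, `pat.search(value)` returns None and `.groups()` raises AttributeError. A hedged sketch of one way to tolerate that; the names `PAT`, `safe_key`, and `titles`, the trailing `$` anchor, and the choice to push unparseable titles to the end are all assumptions, not part of the answer above:

```python
import re

# Assumed shape: "<name> S<season>-EP<episode>" at the end of the string.
PAT = re.compile(r"(.*) S(\d+)-EP(\d+)$")

def safe_key(value):
    m = PAT.search(value)
    if m is None:
        # Titles that do not match sort after all parsed ones, by their text.
        return (1, value, 0, 0)
    name, season, episode = m.groups()
    return (0, name, int(season), int(episode))

# Hypothetical sample with a second season and one non-matching title.
titles = ['american dad S1-EP10', 'american dad special',
          'american dad S2-EP1', 'american dad S1-EP2']
result = sorted(titles, key=safe_key)
print(result)
```

Every key is a tuple of the same types (int, str, int, int), so Python never has to compare an integer with a string.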

CodePudding user response:

Try using the sorted function with a key:

list1 = ['american dad S1-EP1', 'american dad S1-EP10', 'american dad S1-EP11', 'american dad S1-EP12', 'american dad S1-EP13', 'american dad S1-EP14', 'american dad S1-EP15', 'american dad S1-EP16', 'american dad S1-EP17', 'american dad S1-EP18', 'american dad S1-EP19',
        'american dad S1-EP2', 'american dad S1-EP20', 'american dad S1-EP21', 'american dad S1-EP22', 'american dad S1-EP23', 'american dad S1-EP3', 'american dad S1-EP4', 'american dad S1-EP5', 'american dad S1-EP6', 'american dad S1-EP7', 'american dad S1-EP8', 'american dad S1-EP9']

def get_last_digits(s):
    last_digits = s[s.index("P") + 1:]
    return int(last_digits)

list1.sort(key=get_last_digits)

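Note: This only works if all episodes are in the same season, and if the title has no uppercase "P" before the "EP" marker (s.index("P") finds the first one).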
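
If the list does span several seasons, one possible variant keeps the plain string methods but returns a (season, episode) tuple. This is only a sketch: `season_episode_key` and `sample` are hypothetical names, and it assumes every title ends with " S<season>-EP<episode>".

```python
def season_episode_key(s):
    # Take everything after the last " S", e.g. "1-EP10".
    _, _, tail = s.rpartition(" S")
    # Split that tail into the season and episode parts.
    season, _, episode = tail.partition("-EP")
    return int(season), int(episode)

# Hypothetical sample with two seasons (not from the question).
sample = ['american dad S2-EP1', 'american dad S1-EP10', 'american dad S1-EP2']
sample.sort(key=season_episode_key)
print(sample)
```

Tuples compare element by element, so the season decides first and the episode only breaks ties.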

CodePudding user response:

  1. Create a regular expression pattern with two capturing groups - one for the season number, one for the episode number.
  2. Define a custom key for the sorting function, which returns a tuple of integers. The episodes will be sorted in ascending order according to these integers.

Code:

import re

episodes = [
    'american dad S1-EP1',
    'american dad S1-EP10',
    'american dad S1-EP11',
    'american dad S1-EP12',
    'american dad S1-EP13',
    'american dad S1-EP14',
    'american dad S1-EP15',
    'american dad S1-EP16',
    'american dad S1-EP17',
    'american dad S1-EP18',
    'american dad S1-EP19',
    'american dad S1-EP2',
    'american dad S1-EP20',
    'american dad S1-EP21',
    'american dad S1-EP22',
    'american dad S1-EP23',
    'american dad S1-EP3',
    'american dad S1-EP4',
    'american dad S1-EP5',
    'american dad S1-EP6',
    'american dad S1-EP7',
    'american dad S1-EP8',
    'american dad S1-EP9'
]

pattern = "S(\\d+)-EP(\\d+)"

def key(episode):
    regex_match = re.search(pattern, episode)
    return tuple(map(int, regex_match.groups()))

print(sorted(episodes, key=key))

Output:

['american dad S1-EP1', 'american dad S1-EP2', 'american dad S1-EP3', 'american dad S1-EP4', 'american dad S1-EP5', 'american dad S1-EP6', 'american dad S1-EP7', 'american dad S1-EP8', 'american dad S1-EP9', 'american dad S1-EP10', 'american dad S1-EP11', 'american dad S1-EP12', 'american dad S1-EP13', 'american dad S1-EP14', 'american dad S1-EP15', 'american dad S1-EP16', 'american dad S1-EP17', 'american dad S1-EP18', 'american dad S1-EP19', 'american dad S1-EP20', 'american dad S1-EP21', 'american dad S1-EP22', 'american dad S1-EP23']

CodePudding user response:

The big question here would be whether you need to sort decimals or not. Assuming that you only care about integers (e.g. that 12.6 would come before 12.56), then you can convert the list of strings to a list of lists, where each item in the list is either a string or an integer, then sort that:

import re

RE_NUM = re.compile(r'(\d+)|(\D+)')

def sort_mixed(strings):
    # sort list of strings with integers embedded in them
    split_strings = []
    for string in strings:
        split_string = [(int(i or 0), i or s) for i, s in RE_NUM.findall(string)]
        split_strings.append(split_string)
    return [''.join(s for _, s in v) for v in sorted(split_strings)]

# example usage
sort_mixed(['15.51', '12.9', '15.6.6', '15.6'])
# ['12.9', '15.6', '15.6.6', '15.51']

Note: Unlike the other answers in this thread, the above works for any mix of integers and text: strings with no integers, strings that are only integers, or strings with any number of embedded integers.
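
The same idea can also be written as a key function, so that sorted() returns the original strings without rebuilding them. A minimal sketch, assuming the common "natural sort" recipe based on re.split; `natural_key` and `names` are illustrative names, not from the answer above:

```python
import re

def natural_key(text):
    # re.split with a capturing group keeps the digit runs in the result:
    # 'file10.txt' -> ['file', '10', '.txt'].
    parts = re.split(r"(\d+)", text)
    # Odd positions are always digit runs, even positions are always text
    # (possibly empty), so integers only ever get compared with integers.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

names = ['file10.txt', 'file2.txt', 'file1.txt']
print(sorted(names, key=natural_key))

versions = sorted(['15.51', '12.9', '15.6.6', '15.6'], key=natural_key)
print(versions)
```

With this key the dotted example from above should come out in the same order as sort_mixed produces.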

CodePudding user response:

You can customize the sort key with a lambda. (By the way, avoid naming a variable list in Python: it is not a reserved word, but the name shadows the built-in list type.)

For more details about lambda expressions, see the Python documentation.

Example:

l = ['american dad S1-EP1', 'american dad S1-EP10', 'american dad S1-EP11', 'american dad S1-EP12', 'american dad S1-EP13', 'american dad S1-EP14', 'american dad S1-EP15', 'american dad S1-EP16', 'american dad S1-EP17', 'american dad S1-EP18', 'american dad S1-EP19', 'american dad S1-EP2', 'american dad S1-EP20', 'american dad S1-EP21', 'american dad S1-EP22', 'american dad S1-EP23', 'american dad S1-EP3', 'american dad S1-EP4', 'american dad S1-EP5', 'american dad S1-EP6', 'american dad S1-EP7', 'american dad S1-EP8', 'american dad S1-EP9']
sorted_l = sorted(l, key=lambda x: int(x.split("-EP")[1]))
print(sorted_l)

Alternatively, Python can sort one list based on the values of another list. Create a second list that contains only the episode numbers, then sort the pairs:

Example:

l = ['american dad S1-EP1', 'american dad S1-EP10', 'american dad S1-EP11', 'american dad S1-EP12', 'american dad S1-EP13', 'american dad S1-EP14', 'american dad S1-EP15', 'american dad S1-EP16', 'american dad S1-EP17', 'american dad S1-EP18', 'american dad S1-EP19', 'american dad S1-EP2', 'american dad S1-EP20', 'american dad S1-EP21', 'american dad S1-EP22', 'american dad S1-EP23', 'american dad S1-EP3', 'american dad S1-EP4', 'american dad S1-EP5', 'american dad S1-EP6', 'american dad S1-EP7', 'american dad S1-EP8', 'american dad S1-EP9']
ep_list = [int(x.split("-EP")[1]) for x in l]
sorted_l = [x for _, x in sorted(zip(ep_list, l))]
print(sorted_l)

Output:

['american dad S1-EP1', 'american dad S1-EP2', 'american dad S1-EP3', 'american dad S1-EP4', 'american dad S1-EP5', 'american dad S1-EP6', 'american dad S1-EP7', 'american dad S1-EP8', 'american dad S1-EP9', 'american dad S1-EP10', 'american dad S1-EP11', 'american dad S1-EP12', 'american dad S1-EP13', 'american dad S1-EP14', 'american dad S1-EP15', 'american dad S1-EP16', 'american dad S1-EP17', 'american dad S1-EP18', 'american dad S1-EP19', 'american dad S1-EP20', 'american dad S1-EP21', 'american dad S1-EP22', 'american dad S1-EP23']