Get the highest number of a list using regex

I have a dict like this:

my_dict = {
    "['000A']":
        ['1653418_a0001b001.jpg',
         '2132018_a0002b002.jpg',
         '4789562_a0001b003.jpg',
         '8469844_a0009b004.jpg',
         '4815099_a0004b000.jpg',
         '9085654_a0001b001.jpg',
         '9742212_a0007b002.jpg',
         '1325874_a0002b009.jpg',
         '1474856_a0090f014.jpg']
    ,
    "['000B']":
        ['1653418_a0001b001.jpg',
         '2132018_a0002b002.jpg',
         '4789562_a0001b003.jpg',
         '8469844_a0009b004.jpg',
         '4815099_a0004b000.jpg',
         '9085654_a0001b001.jpg',
         '9742212_a0007b002.jpg',
         '1325874_a0002b009.jpg',
         '123456_a0090f020.jpg']
}

I want to find the highest number following the "b" for each key of the dict:

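import re

for key, value in my_dict.items():
    for string in value:
        # findall returns every run of digits, including the leading ID
        number = re.findall(r'\d+', string)
        # convert the matches to integers
        number = map(int, number)
        print("Max_value:", max(number))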

It doesn't work because it also finds the number at the beginning of the string. I was then thinking of using .endswith(("....."))

But I still don't know how to formulate it to match my need: a pattern that matches 'b' followed by 4 digits, or 4 digits followed by '.jpg', or even a string that ends with 'b', 4 digits and '.jpg'. I would also like the code to find which file has the highest bXXXX number for each key and then return:

{"['000A']": '1474856_a0090f014.jpg', "['000B']": '123456_a0090f020.jpg'}
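A small sketch of why the attempt above picks the wrong number, using one filename from the dict. The anchored pattern `[bf](\d+)\.jpg$` is an assumption about the naming scheme (a "b" or "f", then digits, then the extension), not something the data guarantees.

```python
import re

name = '1653418_a0001b001.jpg'

# findall returns every run of digits, including the leading ID.
all_numbers = [int(n) for n in re.findall(r'\d+', name)]
print(all_numbers)       # [1653418, 1, 1]
print(max(all_numbers))  # 1653418, the leading ID rather than the suffix

# Anchoring on the letter and the extension isolates the trailing number.
suffix = re.search(r'[bf](\d+)\.jpg$', name)
suffix_value = int(suffix.group(1))
print(suffix_value)      # 1
```

The maximum over all digit runs is dominated by the leading ID, so the comparison has to be restricted to the captured suffix.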

CodePudding user response：

I suppose either "b" or "f" can appear before the number.

import re

output_dict = {}
for key, value in my_dict.items():
    # keep the filename whose digits after the final "b" or "f" are largest
    output_dict[key] = max(value, key=lambda x: int(re.match(r".*[bf](\d+)\.jpg", x).group(1)))

CodePudding user response：

There are 3 digits after the b or f. If other lowercase characters can appear in that position, you can match a single lowercase character with [a-z]. If the number of digits can vary, you can match 1 or more using \d+, or 3 or more using \d{3,}

Then you can match .jpg at the end of the string using \.jpg$

If there is a match, take the capture group 1 value, convert it to an int, and use that as the sort key.

After sorting in descending order, get the first item from the list (assuming there are no empty lists).
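Before applying it to the whole dict, the two quantifier variants described above can be compared on single filenames. This is only an illustrative sketch: the helper `trailing_number` and the short filename 'x_a1b7.jpg' are hypothetical and do not appear in the question.

```python
import re

# One or more digits versus at least three digits before ".jpg".
one_or_more = re.compile(r"[a-z](\d+)\.jpg$")
three_or_more = re.compile(r"[a-z](\d{3,})\.jpg$")

def trailing_number(pattern, name):
    """Return the captured number as an int, or None if there is no match."""
    m = pattern.search(name)
    return int(m.group(1)) if m else None

print(trailing_number(one_or_more, '1474856_a0090f014.jpg'))    # 14
print(trailing_number(three_or_more, '1474856_a0090f014.jpg'))  # 14
print(trailing_number(one_or_more, 'x_a1b7.jpg'))               # 7
print(trailing_number(three_or_more, 'x_a1b7.jpg'))             # None
```

With the filenames from the question both patterns behave the same; they only differ when a name has fewer than three trailing digits.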

import re

my_dict = {
    "['000A']":
        ['1653418_a0001b001.jpg',
         '4789562_a0001b003.jpg',
         '8469844_a0009b004.jpg',
         '4815099_a0004b000.jpg',
         '1474856_a0090f014.jpg',
         '9085654_a0001b001.jpg',
         '9742212_a0007b002.jpg',
         '1325874_a0002b009.jpg']
    ,
    "['000B']":
        ['1653418_a0001b001.jpg',
         '2132018_a0002b002.jpg',
         '4789562_a0001b003.jpg',
         '8469844_a0009b004.jpg',
         '4815099_a0004b000.jpg',
         '9085654_a0001b001.jpg',
         '9742212_a0007b002.jpg',
         '123456_a0090f020.jpg',
         '1325874_a0002b009.jpg']
}

dct_highest_number = {}

for key, value in my_dict.items():
    dct_highest_number[key] = sorted(
        value,
        key=lambda x: [int(m.group(1)) for m in [re.search(r"[a-z](\d+)\.jpg$", x)] if m],
        reverse=True
    )[0]

print(dct_highest_number)

Output

{"['000A']": '1474856_a0090f014.jpg', "['000B']": '123456_a0090f020.jpg'}
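As a possible variation, not part of the answer above, the same result can be obtained with max() instead of sorting the whole list. The names `suffix_number` and `highest_per_key`, the shortened sample data, and the extra empty key "['000C']" are assumptions made for illustration. Treating non-matching names as -1 is also an arbitrary choice.

```python
import re

PATTERN = re.compile(r"[a-z](\d+)\.jpg$")

def suffix_number(name):
    """Return the trailing number, or -1 if the name does not match."""
    m = PATTERN.search(name)
    return int(m.group(1)) if m else -1

def highest_per_key(dct):
    # max() scans each list once; default=None covers empty lists.
    return {key: max(names, key=suffix_number, default=None)
            for key, names in dct.items()}

sample = {
    "['000A']": ['1653418_a0001b001.jpg', '1474856_a0090f014.jpg', '1325874_a0002b009.jpg'],
    "['000B']": ['123456_a0090f020.jpg', '8469844_a0009b004.jpg'],
    "['000C']": [],  # hypothetical empty list
}

result = highest_per_key(sample)
print(result)
```

This avoids the IndexError that indexing [0] on an empty sorted list would raise, at the cost of returning None for such keys.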