Latest value of a number in a month

Time:08-24

There are files on disk, named like this:

<file>-<year>_<month>_<release>*

Example filenames:

/tmp/release-notes-v22_05_01.pdf
/tmp/release-notes-v22_05_02.pdf
/tmp/release-notes-v22_06_06.pdf
/tmp/release-config-v22_06_03.pdf

From these filenames, I have set variables for the date and the release number, for example:

2022-05-01 00:00:00 01
2022-05-01 00:00:00 02
2022-06-01 00:00:00 06
2022-06-01 00:00:00 03

Q: How can I write Python code that prints only the latest release of each month? In this case, the output would be:

2022-05-01 00:00:00 02
2022-06-01 00:00:00 06

CodePudding user response:

You can use defaultdict to easily group the results up. Since you didn't show your code to figure out the dates/versions from filenames, here's a version of that too.

import re
from collections import defaultdict

date_and_version_re = re.compile(r"v(\d+_\d+)_(\d+)")

filenames = [
    "release-notes-v22_05_01.pdf",
    "release-notes-v22_05_02.pdf",
    "release-notes-v22_06_06.pdf",
    "release-config-v22_06_03.pdf",
]

grouped = defaultdict(dict)  # {date: {version: filename}}

for filename in filenames:
    match = date_and_version_re.search(filename)
    date, version = match.groups()
    grouped[date][version] = filename

for date, versions in sorted(grouped.items()):
    # Compare numerically so that e.g. "10" ranks above "9"
    highest_version = max(versions, key=int)
    print(date, highest_version, versions[highest_version])

This prints out

22_05 02 release-notes-v22_05_02.pdf
22_06 06 release-notes-v22_06_06.pdf
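
If the exact output format from the question is wanted, the same grouping idea can be combined with `datetime`. The following is only a sketch: the names `FILENAME_RE` and `latest_release_per_month` are made up for illustration, and it assumes a two-digit year that belongs to the 2000s.

```python
import re
from datetime import datetime

# Hypothetical pattern: "v<yy>_<mm>_<release>" somewhere in the filename
FILENAME_RE = re.compile(r"v(\d{2})_(\d{2})_(\d+)")


def latest_release_per_month(filenames):
    """Return {first day of month: highest release string} (illustrative helper)."""
    latest = {}
    for filename in filenames:
        match = FILENAME_RE.search(filename)
        if match is None:
            continue  # skip names that do not follow the pattern
        year, month, release = match.groups()
        # Assumption: two-digit years are 20xx
        month_start = datetime(2000 + int(year), int(month), 1)
        # Compare numerically so that "10" ranks above "9"
        if month_start not in latest or int(release) > int(latest[month_start]):
            latest[month_start] = release
    return latest


filenames = [
    "release-notes-v22_05_01.pdf",
    "release-notes-v22_05_02.pdf",
    "release-notes-v22_06_06.pdf",
    "release-config-v22_06_03.pdf",
]

latest = latest_release_per_month(filenames)
for month_start, release in sorted(latest.items()):
    print(month_start, release)
```

With the four example filenames this should print the two lines asked for in the question, one per month.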

CodePudding user response:

Since all the files have the same name format, you can sort the list by month and version and use the fact that a dictionary keeps only the last value inserted for a given key:

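import os

# Every name ends in "<yy>_<mm>_<release>.pdf", two digits each
files = sorted(os.listdir('/tmp'), key=lambda x: x[-12:-4])  # sort by "yy_mm_release"
d = {f[-12:-7]: f for f in files}  # key is "yy_mm"; higher releases overwrite lower ones
print(list(d.values())) # ['release-notes-v22_05_02.pdf', 'release-notes-v22_06_06.pdf']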
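
Fixed slice offsets break as soon as a file has a different extension or a longer release number. As a rough sketch of the same "last insert wins" idea without slicing, the key can come from a regex instead. The names `pattern` and `sort_key` are hypothetical, and the filenames are listed in memory (deliberately out of order) rather than read from disk.

```python
import re

# Hypothetical pattern: "v<year>_<month>_<release>" somewhere in the name
pattern = re.compile(r"v(\d+)_(\d+)_(\d+)")


def sort_key(name):
    # (year, month, release) as integers, so the ordering is numeric
    return tuple(int(part) for part in pattern.search(name).groups())


filenames = [
    "release-notes-v22_06_06.pdf",
    "release-config-v22_06_03.pdf",
    "release-notes-v22_05_02.pdf",
    "release-notes-v22_05_01.pdf",
]

# Later entries overwrite earlier ones that share a (year, month) key,
# so after sorting only the highest release of each month remains.
latest = {sort_key(name)[:2]: name for name in sorted(filenames, key=sort_key)}
print(list(latest.values()))
```

This relies on dictionaries preserving insertion order, which is guaranteed from Python 3.7 onward.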

CodePudding user response:

If you want a more functional solution, you can use something like this:

# Tools for working with iterators
import itertools
import re

filenames = [
    "release-notes-v22_05_01.pdf",
    "release-notes-v22_05_02.pdf",
    "release-notes-v22_06_06.pdf",
    "release-config-v22_06_03.pdf",
]

# Regex copied from AKX but using named groups for readability
date_and_version_re = re.compile(r"v(?P<year>\d+)_(?P<month>\d+)_(?P<release>\d+)")
# You can edit this function to find the max by any other category; for example, drop the month to find the best release for every year
group_key = lambda result: (result.group("year"), result.group("month"))

# This list comprehension is easiest to read from bottom to top
best_each_month = [
    # Format the result with str.format, using the named groups
    "20{year}-{month}-01 00:00:00 {release}".format(
        # Find the match with the highest release number. i[0] is the group key, i[1] iterates over the matches in that group
        **max(i[1], key=lambda k: int(k.group("release"))).groupdict()
    # Group successive elements that share a key
    ) for i in itertools.groupby(
        # Sort the list so that successive elements are adjacent
        sorted((date_and_version_re.search(i) for i in filenames), key=group_key),
        # The group key is a function; consecutive elements for which it returns the same value end up in one group
        group_key
    )
]
print(best_each_month)
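
For readers who find the nested comprehension hard to follow, here is the same sort-then-`groupby` idea unrolled into a loop. This is only a sketch: `pattern`, `month_key` and `best_per_month` are illustrative names, and the two-digit year is assumed to be in the 2000s, as in the format string above.

```python
import itertools
import re

# Same named groups as above, with explicit "one or more digits" quantifiers
pattern = re.compile(r"v(?P<year>\d+)_(?P<month>\d+)_(?P<release>\d+)")


def month_key(match):
    # Group by (year, month), compared numerically
    return (int(match.group("year")), int(match.group("month")))


def best_per_month(filenames):
    # Keep only the names that match the pattern
    matches = [m for m in map(pattern.search, filenames) if m]
    # groupby only groups adjacent elements, so sort by the same key first
    matches.sort(key=month_key)
    lines = []
    for _, group in itertools.groupby(matches, key=month_key):
        best = max(group, key=lambda m: int(m.group("release")))
        lines.append("20{year}-{month}-01 00:00:00 {release}".format(**best.groupdict()))
    return lines


filenames = [
    "release-notes-v22_05_01.pdf",
    "release-notes-v22_05_02.pdf",
    "release-notes-v22_06_06.pdf",
    "release-config-v22_06_03.pdf",
]

result = best_per_month(filenames)
print(result)
```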