Regex for capturing software versions

Time:12-20

I am dealing with the versions of certain software packages in a for loop and I have used the following regex to capture just the numbers and exclude the text part of a version.

regex = r'[0-9][,-_\.\d]*(,\d+)?/i'

The above regex works fine on regex101.com for the following versions:

binutils-112.16.91
bison-2.1
bogl-0.1.18-1.4
bogl-0.1.18_1.4
bogl-0.1-18_1.4
5.2
mod_ruby-1.2.4
2.0.0-1.00-r5_i586
bogl-0.1-18_1,4.4

The expected output for each of the above versions is:

112.16.91
2.1
0.1.18-1.4
0.1.18_1.4
0.1-18_1.4
5.2
1.2.4
2.0.0-1.00-r5_i586
0.1-18_1,4.4

But it returns an empty match in Python. Could someone explain why this might be happening? Thanks!

CodePudding user response:

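The trailing /i is not Python's notation for a case-insensitive pattern. Inside a Python pattern string it is just the two literal characters / and i, which your input never contains. The Python equivalent would be the flag re.I, but since your pattern is only meant to match digits and punctuation, you don't need that flag at all.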
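
To see the effect in isolation, here is a minimal sketch. It uses the pattern from the question, assuming the quantifier in the optional group is \d+; the variable names are only illustrative.

```python
import re

# Pattern from the question, with and without the trailing "/i".
# Assumption: the optional group is (,\d+)?.
with_slash_i = r'[0-9][,-_\.\d]*(,\d+)?/i'
without_slash_i = r'[0-9][,-_\.\d]*(,\d+)?'

version = 'binutils-112.16.91'

# In Python, "/i" is two literal characters that the text must contain,
# so the search finds nothing.
no_match = re.search(with_slash_i, version)

# Without "/i" the pattern matches the numeric part.
match = re.search(without_slash_i, version)
matched_text = match.group() if match else None

print(no_match)      # None
print(matched_text)  # 112.16.91
```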

Apart from that, your pattern would give partial matches instead of a full match for some inputs, as you can see in the demo link. For example, it stops before the r in 2.0.0-1.00-r5_i586.

For the given examples, you can start the match with a word boundary and a digit, followed by optional repetitions of all allowed characters.

If you want to have multiple matches, instead of looping manually you can use re.findall to return all the matches in a list.

\b\d[\w,.-]*
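
A minimal sketch of how this pattern could be used with re.findall. Joining the question's examples into one space-separated string is an assumption made here for illustration, as are the variable names.

```python
import re

# Assumption: the versions arrive as one whitespace-separated string.
text = ('binutils-112.16.91 bison-2.1 bogl-0.1.18-1.4 bogl-0.1.18_1.4 '
        'bogl-0.1-18_1.4 5.2 mod_ruby-1.2.4 2.0.0-1.00-r5_i586 '
        'bogl-0.1-18_1,4.4')

# Word boundary, a digit, then any run of word characters,
# commas, dots or hyphens.
pattern = r'\b\d[\w,.-]*'

versions = re.findall(pattern, text)
for version in versions:
    print(version)
```

Each printed line should correspond to one entry of the expected output listed in the question.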

A bit more specific: match digits with at least one dot between them, then optionally repeat one of _ . , - followed by 1+ word characters:

\b\d+(?:\.\d+)+(?:[_.,-]\w+)*

Regex demo
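
The difference between the two patterns can be sketched as follows, assuming the quantifiers in the more specific pattern are +. The helper first_match and the input 'release-7' are hypothetical and only serve the illustration.

```python
import re

loose = re.compile(r'\b\d[\w,.-]*')
strict = re.compile(r'\b\d+(?:\.\d+)+(?:[_.,-]\w+)*')

def first_match(regex, text):
    """Return the first match as a string, or None if there is none."""
    m = regex.search(text)
    return m.group() if m else None

# The stricter pattern still covers the question's examples.
print(first_match(strict, 'bogl-0.1-18_1,4.4'))   # 0.1-18_1,4.4
print(first_match(strict, '2.0.0-1.00-r5_i586'))  # 2.0.0-1.00-r5_i586

# 'release-7' is a hypothetical input with no dotted number.
print(first_match(loose, 'release-7'))   # 7
print(first_match(strict, 'release-7'))  # None
```

Which of the two fits better depends on whether a bare number without a dot should count as a version.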

CodePudding user response:

This is kinda sketchy: if a package name itself contains a digit, the output will be wrong. Otherwise, it works on your examples.

original = ['binutils-112.16.91',
 'bison-2.1',
 'bogl-0.1.18-1.4',
 'bogl-0.1.18_1.4',
 'bogl-0.1-18_1.4',
 '5.2',
 'mod_ruby-1.2.4',
 '2.0.0-1.00-r5_i586',
 'bogl-0.1-18_1,4.4']


individual_versions = []
for package in original:
    # Scan for the first digit and keep everything from there to the end.
    for char_index, char in enumerate(package):
        if char.isdigit():
            individual_versions.append(package[char_index:])
            break  # stop at the first digit
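
The caveat above can be shown with a short sketch. The package name 'lib2to3-1.0' and the function from_first_digit are hypothetical; the function only wraps the same first-digit scan so it can be compared with the word-boundary regex from the other answer.

```python
import re

def from_first_digit(package):
    """Return the substring starting at the first digit, or None."""
    for index, char in enumerate(package):
        if char.isdigit():
            return package[index:]
    return None

# Hypothetical package name that contains digits before the version.
package = 'lib2to3-1.0'

loop_result = from_first_digit(package)
regex_match = re.search(r'\b\d[\w,.-]*', package)
regex_result = regex_match.group() if regex_match else None

print(loop_result)   # 2to3-1.0
print(regex_result)  # 1.0
```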