PwSh RegEx capture Version information - omit (surrounding) x(32|64|86) & 32|64(-)Bit

Time:05-23

I'm pulling my hair out trying to RegEx-tract the bare version information, e.g. "1.2.3.4", from some filenames.

Let's assume I have the following filenames:

VendorSetup-x64-1.23.4.exe
VendorSetup-1-2-3-4.exe
Vendor Setup 1.23.456Update.exe
SoftwareName-1.2.34.5-x64.msi
SoftwareName-1.2.3.4-64bit.msi
SoftwareName-64-Bit-1.2.3.4.msi
VendorName_SoftwareName_64_1.2.3_Setup.exe

(I also know there are still some filenames out there that contain "x32" or "x86", so I've added those to the title.)

First, I replaced the underscores and dashes with dots. I'd rather avoid that in general, but I haven't found a cleverer approach, and to be honest it only works well if the string contains no other digit information, as in the 2nd filename, for example.

I then tried to extract the version information with a regex like

-replace '^(?:\D+)?(\d+((\.\d+){1,4})?)(?:.*)?', '$1'

This cannot skip "x64", "64Bit", "64-Bit" or any variation of those.

Additionally, I played around with regexes like

-replace '^(?:[xX]*\d{2})?(?:\D+)?(\d+((\.\d+){1,4})?)(?:.*)?$', '$1'

to try to omit a leading "x64" or "64", but with no success (most probably because of the replacement of dashes with dots).

Before it gets even worse, I'd like to ask whether anybody could help me or point me in the right direction.

Thanks in advance!

CodePudding user response:

This one works with all of the given samples:

(?<=[\s_-])(?:\d+(?:\.\d+){1,3}|\d+(?:-\d+){1,3})

You just have to replace dashes with dots in the captured value.

Demo:

# Create an array of sample filenames
$names = @'
VendorSetup-2022-05-x64-1.23.4.exe
VendorSetup-x64-1.23.4-2022-05.exe
VendorSetup-1-2-3-4.exe
VendorSetup_2022-05_1-2-3-4.exe
Vendor Setup 1.23.456Update.exe
SoftwareName-1.2.34.5-x64.msi
SoftwareName-1.2.3.4-64bit.msi
SoftwareName-64-Bit-1.2.3.4.msi
VendorName_SoftwareName_64_1.2.3_Setup.exe
'@ -split '\r?\n'

$versionPattern = '(?<=[\s_-])(?:\d+(?:\.\d+){1,3}|\d+(?:-\d+){1,3})'

foreach( $name in $names ) {
    
    $version = $null
    if( $matched = [regex]::Matches( $name, $versionPattern ) ) {

        $matchedValues = @($matched.Value)

        # Prefer the last match that contains '.'
        if( $lastValueWithDot = $matchedValues.Where({ $_.Contains('.') }, 'Last') ) {
            $version = $lastValueWithDot[ 0 ]  # Need index because .Where() always outputs an array
        }
        else {
            # Otherwise take the last match
            $version = $matchedValues[ -1 ]
        }
    }

    # Normalize the version number separators
    $version = $version -replace '-', '.'

    # Output custom object for nice table formatting
    [PSCustomObject]@{
        Name    = $name
        Version = $version
    }
}

Output:

Name                                       Version
----                                       -------
VendorSetup-2022-05-x64-1.23.4.exe         1.23.4
VendorSetup-x64-1.23.4-2022-05.exe         1.23.4
VendorSetup-1-2-3-4.exe                    1.2.3.4
VendorSetup_2022-05_1-2-3-4.exe            1.2.3.4
Vendor Setup 1.23.456Update.exe            1.23.456
SoftwareName-1.2.34.5-x64.msi              1.2.34.5
SoftwareName-1.2.3.4-64bit.msi             1.2.3.4
SoftwareName-64-Bit-1.2.3.4.msi            1.2.3.4
VendorName_SoftwareName_64_1.2.3_Setup.exe 1.2.3

Explanation:

The RegEx has three important parts:

  • (?<=[\s_-]) … a positive lookbehind assertion makes sure that the version is preceded by a space, underscore or dash. This prevents the substring 64-1 in the first sample from matching as a version.
  • (?: … introduces a non-capturing group of two alternatives:
    • \d+(?:\.\d+){1,3} … 2 to 4 version numbers separated by .
    • \d+(?:-\d+){1,3} … 2 to 4 version numbers separated by -
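
To see what the lookbehind buys you, here is a small sketch that runs the same alternatives with and without it against the first sample from the question. The variable names are made up for illustration only.

```powershell
# First sample filename from the question
$name = 'VendorSetup-x64-1.23.4.exe'

# The two alternatives on their own, and the same thing behind the lookbehind
$core           = '(?:\d+(?:\.\d+){1,3}|\d+(?:-\d+){1,3})'
$withLookbehind = '(?<=[\s_-])' + $core

# Without the lookbehind, the first match starts inside "x64"
$without = [regex]::Match( $name, $core ).Value

# With the lookbehind, a match may only start after a space, underscore or dash
$with = [regex]::Match( $name, $withLookbehind ).Value

"Without lookbehind: $without"
"With lookbehind:    $with"
```

Without the assertion the digits of "x64" are picked up as the start of a dash-separated version; with it, the match can only begin right after one of the three separators.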

Note that the pattern also matches 2022-05 in the sample filenames that I added for the demo. Instead of further complicating the pattern, we can use [regex]::Matches() to capture all matches and resolve ambiguities through PowerShell code:

  • Prefer the last matched value that contains a dot (.).
  • If none of the matched values contains a dot, just take the last value.
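
If you need this in more than one place, the two rules could be wrapped in a small helper. This is only a sketch: the function name Get-VersionFromName is hypothetical, and the two extra filenames with "x86" and "32-bit" are made up to cover the variants mentioned in the question.

```powershell
# Hypothetical helper: returns the version string found in a filename, or $null
function Get-VersionFromName {
    param( [Parameter(Mandatory)] [string] $Name )

    $pattern = '(?<=[\s_-])(?:\d+(?:\.\d+){1,3}|\d+(?:-\d+){1,3})'

    # Collect all candidate values; @() keeps this an array even for 0 or 1 matches
    $values = @( [regex]::Matches( $Name, $pattern ) | ForEach-Object Value )
    if( $values.Count -eq 0 ) { return $null }

    # Rule 1: prefer the last candidate that contains a dot
    # Rule 2: otherwise take the last candidate
    $dotted = @( $values | Where-Object { $_.Contains('.') } )
    $chosen = if( $dotted.Count -gt 0 ) { $dotted[ -1 ] } else { $values[ -1 ] }

    # Normalize the separators
    $chosen -replace '-', '.'
}

# Made-up filenames covering "x86" and "32-bit"
Get-VersionFromName 'Setup-x86-2.0.1.exe'
Get-VersionFromName 'Tool_32-bit_3.4.msi'
```

Since the result always has 2 to 4 dot-separated numbers, it can be cast to [version] if you want to compare or sort versions numerically, e.g. [version](Get-VersionFromName 'SoftwareName-1.2.34.5-x64.msi').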

See detailed explanation at RegEx101.
