I'm pulling my hair out trying to RegEx-tract the bare version information (e.g. "1.2.3.4") from some filenames.
Let's assume I have the following filenames:
VendorSetup-x64-1.23.4.exe
VendorSetup-1-2-3-4.exe
Vendor Setup 1.23.456Update.exe
SoftwareName-1.2.34.5-x64.msi
SoftwareName-1.2.3.4-64bit.msi
SoftwareName-64-Bit-1.2.3.4.msi
VendorName_SoftwareName_64_1.2.3_Setup.exe
(I know there are still some filenames out there that contain "x32" or "x86", so I've added those to the title.)
First of all, I replaced the "_"s and "-"s with "."s, which I'd like to avoid in general, but I haven't found a cleverer approach. To be honest, it only works well if there's no other digit information in the string, as in the 2nd filename.
I then tried to extract the Version information using Regex like
-replace '^(?:\D+)?(\d+((\.\d+){1,4})?)(?:.*)?', '$1'
This lacks the ability to skip "x64", "64Bit", "64-Bit" or any variation of those.
Additionally, I played around with RegExes like
-replace '^(?:[xX]*\d{2})?(?:\D+)?(\d+((\.\d+){1,4})?)(?:.*)?$', '$1'
to try to omit a leading "x64" or "64", but with no success (most probably because of the replacement of "-"s with "."s).
Before it gets even worse, I'd like to ask: could anybody help me or point me in the right direction?
Thanks in advance!
Answer:
This one works with all of the given samples:
(?<=[\s_-])(?:\d+(?:\.\d+){1,3}|\d+(?:-\d+){1,3})
You just have to replace dashes with dots in the captured value.
Demo:
# Create an array of sample filenames
$names = @'
VendorSetup-2022-05-x64-1.23.4.exe
VendorSetup-x64-1.23.4-2022-05.exe
VendorSetup-1-2-3-4.exe
VendorSetup_2022-05_1-2-3-4.exe
Vendor Setup 1.23.456Update.exe
SoftwareName-1.2.34.5-x64.msi
SoftwareName-1.2.3.4-64bit.msi
SoftwareName-64-Bit-1.2.3.4.msi
VendorName_SoftwareName_64_1.2.3_Setup.exe
'@ -split '\r?\n'
$versionPattern = '(?<=[\s_-])(?:\d+(?:\.\d+){1,3}|\d+(?:-\d+){1,3})'
foreach( $name in $names ) {
    $version = $null
    if( $matched = [regex]::Matches( $name, $versionPattern ) ) {
        $matchedValues = @($matched.Value)
        # Prefer the last match that contains '.'
        if( $lastValueWithDot = $matchedValues.Where({ $_.Contains('.') }, 'Last') ) {
            $version = $lastValueWithDot[ 0 ] # Need index because .Where() always outputs an array
        }
        else {
            # Otherwise take the last match
            $version = $matchedValues[ -1 ]
        }
    }
    # Normalize the version number separators
    $version = $version -replace '-', '.'
    # Output custom object for nice table formatting
    [PSCustomObject]@{
        Name = $name
        Version = $version
    }
}
Output:
Name                                       Version
----                                       -------
VendorSetup-2022-05-x64-1.23.4.exe         1.23.4
VendorSetup-x64-1.23.4-2022-05.exe         1.23.4
VendorSetup-1-2-3-4.exe                    1.2.3.4
VendorSetup_2022-05_1-2-3-4.exe            1.2.3.4
Vendor Setup 1.23.456Update.exe            1.23.456
SoftwareName-1.2.34.5-x64.msi              1.2.34.5
SoftwareName-1.2.3.4-64bit.msi             1.2.3.4
SoftwareName-64-Bit-1.2.3.4.msi            1.2.3.4
VendorName_SoftwareName_64_1.2.3_Setup.exe 1.2.3
Explanation:
The RegEx has three important parts:
(?<=[\s_-])
… positive lookbehind assertion makes sure that the version is separated by a space, underscore or dash on the left side. This prevents the substring 64-1 from the first sample from matching as a version.
(?:
… introduces a non-capturing group of two alternatives:
\d+(?:\.\d+){1,3}
… 2 to 4 version numbers separated by .
\d+(?:-\d+){1,3}
… 2 to 4 version numbers separated by -
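To see what the lookbehind contributes, here is a minimal sketch that runs the pattern with and without it against the first filename from the question. The variable names are only illustrative.

```powershell
# The two alternatives on their own, without the lookbehind
$alternatives = '(?:\d+(?:\.\d+){1,3}|\d+(?:-\d+){1,3})'
# The full pattern, with the lookbehind in front
$versionPattern = '(?<=[\s_-])' + $alternatives

$name = 'VendorSetup-x64-1.23.4.exe'

# Without the lookbehind, the first match starts inside "x64"
$withoutLookbehind = [regex]::Match( $name, $alternatives ).Value    # 64-1

# With the lookbehind, a match must follow a space, underscore or dash
$withLookbehind = [regex]::Match( $name, $versionPattern ).Value     # 1.23.4

"Without lookbehind: $withoutLookbehind"
"With lookbehind:    $withLookbehind"
```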
Note that the pattern also matches 2022-05 in some of the sample filenames, which I modified. Instead of further complicating the pattern, we can use [regex]::Matches() to capture all matches and resolve ambiguities through PowerShell code:
- Prefer the last matched value that contains a "."
- If none of the matched values contains a ".", just take the last value.
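If you need this in more than one place, the two rules can be wrapped in a small helper. This is only a sketch: the function name Get-FileNameVersion is made up for illustration, and returning $null when nothing matches is an assumption on my part, not something the demo above does.

```powershell
# Hypothetical helper (name chosen for illustration only)
function Get-FileNameVersion {
    param(
        [Parameter(Mandatory)]
        [string] $Name
    )

    $versionPattern = '(?<=[\s_-])(?:\d+(?:\.\d+){1,3}|\d+(?:-\d+){1,3})'
    $found = [regex]::Matches( $Name, $versionPattern )

    # Assumption: no match at all yields $null
    if( $found.Count -eq 0 ) { return $null }

    $values = @($found | ForEach-Object { $_.Value })

    # Rule 1: prefer the last match that contains '.'
    $dotted = @($values | Where-Object { $_.Contains('.') })
    $raw = if( $dotted.Count -gt 0 ) { $dotted[ -1 ] } else { $values[ -1 ] }  # Rule 2: else the last match

    # Normalize the version number separators
    $raw -replace '-', '.'
}

Get-FileNameVersion 'VendorSetup_2022-05_1-2-3-4.exe'      # 1.2.3.4
Get-FileNameVersion 'VendorSetup-x64-1.23.4-2022-05.exe'   # 1.23.4
```

Since the pattern only yields 2 to 4 numeric parts, the result should also be castable to [version] if you want to compare versions rather than strings.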