PowerShell glob pattern matching

Time:12-15

I am searching C:\Program Files for a jar file named log4j-core-x.y.z.jar. I am trying to match on the last number, z, which can be a one- or two-digit number (0-99). I can't seem to find the right glob pattern to accomplish this.

Code:

PS C:\Users\Administrator> Get-ChildItem -Path 'C:\Program Files\' -Filter log4j-core-*.*.[1-9][0-9].jar -Recurse -ErrorAction SilentlyContinue -Force | %{$_.FullName}

This yields no results, but when I use only wildcards, as in -Filter log4j-core-*.*.*.jar, I get:

C:\Program Files\apache-log4j-2.16.0-bin\apache-log4j-2.16.0-bin\log4j-core-2.16.0-javadoc.jar
C:\Program Files\apache-log4j-2.16.0-bin\apache-log4j-2.16.0-bin\log4j-core-2.16.0-sources.jar
C:\Program Files\apache-log4j-2.16.0-bin\apache-log4j-2.16.0-bin\log4j-core-2.16.0-tests.jar
C:\Program Files\apache-log4j-2.16.0-bin\apache-log4j-2.16.0-bin\log4j-core-2.16.0.jar

The only file I want is C:\Program Files\apache-log4j-2.16.0-bin\apache-log4j-2.16.0-bin\log4j-core-2.16.0.jar, i.e., log4j-core-2.16.0.jar.

Answer:

-Filter doesn't support regex or character ranges such as [A-Z] or [0-9]. Thanks to mklement0 for pointing this out.

From the -Filter parameter description in the official Get-ChildItem documentation:

The filter string is passed to the .NET API to enumerate files. The API only supports * and ? wildcards.

Try this:

Get-ChildItem -Path 'C:\Program Files\' -Filter log4j-core-*.*.??.jar -Recurse -ErrorAction SilentlyContinue -Force |
Where-Object {
    $_.Name -match '\.\d{1,2}\.jar$'
    # => Ends with a . followed by 1 or 2 digits and the .jar extension
}
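To see what the regex in the Where-Object script block actually keeps, here is a minimal sketch that applies it to the four file names from the question's output, as plain strings with no file-system access. The extra name log4j-core-2.17.100.jar is hypothetical and only there to show that a three-digit last number is rejected.

```powershell
# File names taken from the question's -Filter log4j-core-*.*.*.jar output,
# plus one hypothetical name with a three-digit last number
$names = @(
    'log4j-core-2.16.0-javadoc.jar'
    'log4j-core-2.16.0-sources.jar'
    'log4j-core-2.16.0-tests.jar'
    'log4j-core-2.16.0.jar'
    'log4j-core-2.17.100.jar'
)

# -match is case-insensitive by default; the regex requires the name to end in
# ".<1 or 2 digits>.jar", which rules out the -javadoc/-sources/-tests variants
$matched = $names | Where-Object { $_ -match '\.\d{1,2}\.jar$' }
$matched   # only log4j-core-2.16.0.jar is output
```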

Answer:

Santiago Squarzon's helpful answer offers a regex-assisted solution that has the potential to perform much more sophisticated matching than required in the case at hand.

Let me complement it with a wildcard-based solution that builds on your own attempt:

  • The -Filter parameter does not support PowerShell's wildcard syntax; it only supports * and ? as wildcard metacharacters (as Santiago notes), not character-range/set constructs such as [0-9].

    • Instead, -Filter arguments are interpreted by the platform's file-system APIs, which on Windows additionally have legacy quirks - see this answer.

    • That said, with patterns that -Filter does support, it is preferable to -Include (see below), because filtering at the source makes it perform much better.

  • By contrast, the -Include parameter does use PowerShell's wildcards and additionally supports multiple patterns.

Unlike regexes, character-range/set expressions in PowerShell's wildcard language do not support quantifiers (repetition logic) and match exactly one character each (just like ? does for any single character; * is the only metacharacter that implicitly supports repetition: zero or more characters).

Therefore, [1-9][0-9] matches exactly 2 characters (digits), and matching a single digit as well ([0-9]) requires an additional pattern:

Get-ChildItem -Recurse 'C:\Program Files' -Include log4j-core-*.*.[0-9].jar, log4j-core-*.*.[1-9][0-9].jar -ErrorAction SilentlyContinue -Force | 
  ForEach-Object FullName
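Because -Include uses the same wildcard language as the -like operator, the one-character-per-range behavior can be checked without touching the file system. A minimal sketch follows; the version numbers 2.17.10 and 2.17.100 are hypothetical probes for the two- and three-digit cases.

```powershell
# The two patterns from the -Include argument above
$patterns = 'log4j-core-*.*.[0-9].jar', 'log4j-core-*.*.[1-9][0-9].jar'

# Hypothetical names chosen to probe one-, two-, and three-digit last numbers
$names = @(
    'log4j-core-2.16.0.jar'
    'log4j-core-2.17.10.jar'
    'log4j-core-2.17.100.jar'
    'log4j-core-2.16.0-tests.jar'
)

$results = foreach ($name in $names) {
    # A name qualifies if it matches either pattern;
    # each [...] range consumes exactly one character
    $isMatch = ($name -like $patterns[0]) -or ($name -like $patterns[1])
    [pscustomobject]@{ Name = $name; IsMatch = $isMatch }
}
$results | Format-Table -AutoSize
```

Only the first two names match: 2.17.100 fails because neither pattern can absorb three digits after the last dot, which is exactly why two patterns are needed to cover 0-99.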

Caveats:

  • Using -Include (or -Exclude) without -Recurse doesn't work as one would expect - see this answer.

  • As of PowerShell 7.2, combining -Recurse with -Include suffers from performance problems due to an inefficient implementation - see GitHub issue #8662.

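Given the -Include performance caveat, one alternative worth trying (a swapped-in technique, not taken from either answer) is to let -Filter do coarse pre-filtering at the source with the * wildcard it does support, and then apply the PowerShell wildcard patterns with -like in Where-Object. The sketch below builds a throwaway directory tree under the temp folder as a stand-in for C:\Program Files; the directory layout, the $root/$sub variables, and the extra version numbers are hypothetical, and whether this is actually faster than -Include on your machine is an assumption you should measure.

```powershell
# Hypothetical stand-in for 'C:\Program Files': a scratch directory under the temp folder
$root = Join-Path ([System.IO.Path]::GetTempPath()) ('log4j-demo-' + [guid]::NewGuid())
$sub  = Join-Path $root 'apache-log4j-2.16.0-bin'
$null = New-Item -ItemType Directory -Path $sub -Force
@(
    'log4j-core-2.16.0.jar'
    'log4j-core-2.16.0-tests.jar'
    'log4j-core-2.17.10.jar'
    'log4j-core-2.17.100.jar'
) | ForEach-Object { $null = New-Item -ItemType File -Path (Join-Path $sub $_) }

$patterns = 'log4j-core-*.*.[0-9].jar', 'log4j-core-*.*.[1-9][0-9].jar'

# Step 1: -Filter (only * and ?) narrows the candidates at the source.
# Step 2: -like applies PowerShell wildcards, including [0-9] ranges.
$found = Get-ChildItem -LiteralPath $root -Recurse -File -Filter 'log4j-core-*.jar' -Force |
    Where-Object { $_.Name -like $patterns[0] -or $_.Name -like $patterns[1] } |
    ForEach-Object FullName

$found

# Clean up the scratch directory
Remove-Item -LiteralPath $root -Recurse -Force
```

Against the real tree, replace $root with 'C:\Program Files' (dropping the setup and cleanup lines) and keep -ErrorAction SilentlyContinue as in the original commands.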