Regex works on regex101, but not in powershell... why?

Time:08-19

I have this test data:

^Test data



This is all just test data 


testing 123
ABC>space "ABC"

ABC>

And I've set up this regex on regex101.com: (^\^|ERROR).*((|\n|\r|\w|\W)?) (?=ABC>)

The expression is returning just what I want on the site:

[Screenshot of the match on regex101.com]

I wrote this PowerShell script to loop through files whose content is similar to the above and look for matches of the same regex:

$files = gci "\\server\path"
$content = @()


ForEach($file in $files){
    # Set script name
    $scriptname = "ABC TEST_081722"

    # Get the name of the task for the logfile's filename. 
    $taskname = "THIS IS A TEST!!!" 

    # Create log file with a datestamp MMDDYY
    $datestamp = (get-date).ToString('MMddyy')
    $logfilepath = "\\server\path\Logs\$($taskname)\$($file.basename).log"
    $log_dir = "\\server\path\Logs\$($taskname)\"

    # Get the content of the log file. We are only interested in getting lines which match a regex for our command line and our output line. 
    $content_raw = get-content $logfilepath -raw

    $content_raw -match "(^\^|ERROR).*((|\n|\r|\w|\W)?) (?=ABC>)"
    
    Write-host -f yellow $file.fullname
    $matches
    $matches.clear()

                                                                        
    start-sleep -s 2
}

The regex finds a match in two of my three test files, but not the first one which has the exact same string content as my example above. Why does it find a match in the 2nd and 3rd file but not the first?

The contents of the 2nd and 3rd files look like this:

ABC>W !,MSG

ERROR^BATCH~Batch in use
ABC>space "ABC"

So these two files do not have a line starting with a "^" symbol. The line starts with "ERROR", which I accounted for with the OR in my regex. I just don't understand how it finds the lines which start with "ERROR" just fine, but not the line in the first file which starts with a "^" caret.

Answer:

Try the following regex instead:

(?sm)(?:^\^|^ERROR).*?(?=\r?\nABC>)

Note: This is a streamlined, working version of your regex (without capture groups); a corrected form of your regex would be (?m)(^\^|^ERROR).*((|\n|\r|\w|\W)?) (?=ABC>), for the reasons explained below.
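As a rough illustration, the streamlined regex can be tried against an in-memory string before running it over real files. The sample text below is only modeled on the first log file in the question, and LF-only newlines are an assumption:

```powershell
# Sample text modeled on the first log file (assumption: LF-only newlines).
$sample = @(
    'ABC>W !,MSG', '', '^Test data', '', 'This is all just test data',
    '', 'testing 123', 'ABC>space "ABC"', '', 'ABC>'
) -join "`n"

# (?s) lets . span newlines; (?m) lets ^ match at the start of every line.
$pattern = '(?sm)(?:^\^|^ERROR).*?(?=\r?\nABC>)'

if ($sample -match $pattern) {
    # $Matches[0] is the full match: from the line starting with ^ up to,
    # but not including, the newline before the next ABC> prompt.
    $Matches[0]
}
```

With this sample, the match starts at "^Test data" and ends after "testing 123".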

Note that PowerShell's operators are case-insensitive by default (as is PowerShell in general). For case-sensitivity, use the c-prefixed operator variants, i.e., -cmatch in this case.
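A quick way to see the difference, using a hypothetical lower-case variant of the log line from the question:

```powershell
# Hypothetical lower-case variant of the ERROR line, for illustration only.
$line = 'error^BATCH~Batch in use'

$line -match  '^ERROR'   # True:  -match ignores case
$line -cmatch '^ERROR'   # False: -cmatch is case-sensitive
```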

See this regex101.com page, where you can experiment with text from your files interactively.


As for what you tried:

  • ^ only matches at the start of individual lines if the MultiLine regex option is in effect, which you can activate using inline syntax with (?m) - note that, unlike PowerShell, regex101.com has this option turned on by default (see the option letters such as gm to the right of the regex input field), which would explain why you didn't see the problem there.

    • Similarly, (?s) activates the SingleLine regex option, which makes . match newline characters (\n) too.
  • ^\^|ERROR applies the start-of-input/line ^ metacharacter only to the (escaped) ^ character, not also to ERROR on the other side of the alternation (|).

    • Your test data wasn't at the very start of your input file (as shown in the screenshot), causing the ^ to be ineffective in the absence of (?m).
    • Conversely, because substring ERROR was (accidentally) not anchored, it still matched (but would match anywhere on a line).
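The points above can be reproduced with two short strings. This is only a sketch: the strings are cut-down stand-ins for the log files, and LF-only newlines are assumed.

```powershell
# Cut-down stand-ins for the log files (assumption: LF-only newlines).
$caretFile = "ABC>W !,MSG`n^Test data`nABC>"
$errorFile = "ABC>W !,MSG`nERROR^BATCH~Batch in use`nABC>"

# Without (?m), ^ matches only at the very start of the whole string.
$caretFile -match '^\^'        # False
$caretFile -match '(?m)^\^'    # True

# In ^\^|ERROR the anchor applies only to the left branch,
# so ERROR matches anywhere, even in the middle of a line.
$errorFile -match '^\^|ERROR'                  # True
'no ERROR at line start' -match '^\^|ERROR'    # True

# Without (?s), . stops at the first newline; with it, . runs to the end.
[regex]::Match($caretFile, '(?m)^\^.*').Value.Length     # 10
[regex]::Match($caretFile, '(?sm)^\^.*').Value.Length    # 15
```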

Note:

  • As of this writing, regex101.com has no dedicated PowerShell support, and the closest approximation, .NET (C#), has defaults that are at odds with PowerShell's defaults.

  • For guidance on how to use regex101.com with PowerShell, including a link to a feature request to introduce PowerShell support in the future, see this answer.
