Question
I'm trying to match PowerShell hash comments (# ...) but not inline comments (<# ... #>) with the same regex. How can I achieve this?
Goal
Match
I'd like to match PowerShell comments that use the hash comment syntax, where simply everything after # is commented out. I use /#(.*$)/gm for it.
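A minimal TypeScript sketch of this pattern shows the problem: it also matches the inline-comment syntax listed under "Do not match". It assumes an ES2020+ runtime for String.prototype.matchAll; the helper findComments and the sample lines are illustrative only.

```typescript
// The question's original pattern: everything from the first "#" to the end of the line.
// The m flag makes "$" match at each line end; the g flag is required by matchAll.
const naive = /#(.*$)/gm;

// Hypothetical helper for this sketch: returns the full text of every match.
function findComments(source: string, pattern: RegExp): string[] {
  return Array.from(source.matchAll(pattern), (m) => m[0]);
}

const sample = [
  'Write-Host "Hello world" # comment here',
  'Write-Host "Hello world" <# inline comment here #>',
].join("\n");

// The second match comes from the inline comment, which should have been excluded.
console.log(findComments(sample, naive));
// → ['# comment here', '# inline comment here #>']
```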
Test cases, where the expected regex match is written inside brackets [..]:
Write-Host "Hello world" [# comment here]
[# A line with only comment]
Comment without whitespace[#before]
Comment with whitespace [#after ]
Do not match
However, I'd like an exception for the inline comment syntax. Inline comments in PowerShell look like lorem <# inline comment #> ipsum.
So none of the following should match:
Write-Host "Hello world" <# inline comment here #>
<# A line with only inline comment #>
Comment without whitespace<#no whitespace#>around
Inline comment <# in middle #> of line
Comment with whitespace #comment with >
Comment with whitespace #comment with <
Comment with whitespace #comment with <# test #>
What I tried
I tried to use [^<>] in something like #[^<>](.*[^<>]$), but it did not work for all of the cases above.
My progress on regex101 until I got stuck.
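A small TypeScript sketch of the attempted pattern (with the same g and m flags, assuming an ES2020+ runtime) shows two ways it goes wrong: the "#" of "<#" is followed by a space, which [^<>] accepts, and [^<>] also accepts a line break, so a match can cross into the next line. The helper attemptMatches and the inputs are illustrative only.

```typescript
// The pattern tried in the question.
const attempt = /#[^<>](.*[^<>]$)/gm;

// Returns the full text of every match of `attempt` in `text`.
function attemptMatches(text: string): string[] {
  return Array.from(text.matchAll(attempt), (m) => m[0]);
}

// False positive: the "#" inside "<#" is followed by a space, which [^<>] accepts.
console.log(attemptMatches("Inline comment <# in middle #> of line"));
// → ['# in middle #> of line']

// Line crossing: [^<>] also matches "\n", so an empty "#" comment swallows the next line.
console.log(attemptMatches("Write-Host a #\nWrite-Host b"));
// → ['#\nWrite-Host b']
```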
Why
I'm parsing PowerShell in a JavaScript/TypeScript runtime so that I can inline scripts and run them in batch (cmd) for a community-driven open-source project. I know there will be exceptions to this (like strings containing # characters), but I'm willing to trade some robustness for simple regex parsing.
Thank you!
Answer
I suggest checking for < before a # char and converting all negated character classes into negative lookarounds to avoid crossing over line boundaries:
#(?<!<#)(?![<>])(.*)$(?<![<>])
Or, to also check for #> after <#, use:
#(?<!<#(?=.*#>))(?![<>])(.*)$(?<![<>])
See the regex demo. Remove the (?<![<>]) negative lookbehind if you do not want the match to fail when the line ends with < or >.
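A minimal TypeScript sketch that runs the first pattern over all of the question's test cases. It assumes the g and m flags from the question (so $ matches at every line end) and an ES2020+ runtime for String.prototype.matchAll; the helper comments and the two arrays are illustrative only.

```typescript
// First pattern from the answer, with the question's g and m flags.
const hashComment = /#(?<!<#)(?![<>])(.*)$(?<![<>])/gm;

// Lines from the "Match" list (brackets removed).
const shouldMatch = [
  'Write-Host "Hello world" # comment here',
  "# A line with only comment",
  "Comment without whitespace#before",
  "Comment with whitespace #after ",
];

// Lines from the "Do not match" list.
const shouldNotMatch = [
  'Write-Host "Hello world" <# inline comment here #>',
  "<# A line with only inline comment #>",
  "Comment without whitespace<#no whitespace#>around",
  "Inline comment <# in middle #> of line",
  "Comment with whitespace #comment with >",
  "Comment with whitespace #comment with <",
  "Comment with whitespace #comment with <# test #>",
];

// Returns the full text of each match in `text`.
function comments(text: string): string[] {
  return Array.from(text.matchAll(hashComment), (m) => m[0]);
}

// Run over all lines at once; only the four expected comments are found.
const found = comments([...shouldMatch, ...shouldNotMatch].join("\n"));
console.log(found);
// → ['# comment here', '# A line with only comment', '#before', '#after ']
```

Joining the lines with "\n" also checks that no match spills across a line break, since . does not match "\n".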
Details:
# - a # char
(?<!<#) - no <# allowed immediately to the left of the current location (this check only runs after a # has matched, so the regex engine tests only the positions right after a #, not every position in the string). In the second pattern, the (?<!<#(?=.*#>)) lookbehind with a nested lookahead makes sure the matched # is not the second char of a <#...#> substring
(?![<>]) - immediately on the right, there must be no < or >
(.*) - Group 1: any zero or more chars other than line break chars, as many as possible
$ - end of string (end of line when the m flag is used, as in /gm)
(?<![<>]) - at the end of the string (or line), there must be no < or > char.
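The two patterns differ only for a line that contains <# with no #> after it on the same line. A short TypeScript sketch (ES2020+ runtime assumed; the helper matchesOf and both sample lines are illustrative only) compares them:

```typescript
// First pattern: rejects any "#" that directly follows "<".
const basic = /#(?<!<#)(?![<>])(.*)$(?<![<>])/gm;
// Second pattern: rejects a "#" after "<" only if "#>" appears later on the same line.
const closedOnly = /#(?<!<#(?=.*#>))(?![<>])(.*)$(?<![<>])/gm;

// Returns the full text of every match of `pattern` in `text`.
function matchesOf(pattern: RegExp, text: string): string[] {
  return Array.from(text.matchAll(pattern), (m) => m[0]);
}

// Illustrative line: "<#" with no "#>" after it on the same line.
const unclosed = "Write-Host a <#not closed";
// Illustrative line: an inline comment closed on the same line.
const closed = "Inline comment <# in middle #> of line";

console.log(matchesOf(basic, unclosed));      // no matches
console.log(matchesOf(closedOnly, unclosed)); // one match: "#not closed"
console.log(matchesOf(basic, closed));        // no matches
console.log(matchesOf(closedOnly, closed));   // no matches
```

Both patterns work one line at a time, while PowerShell <# ... #> comments may span several lines, so a <# without #> on the same line can still open a real block comment that the second pattern would treat as a # comment.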