I need to convert an input string containing multiple words into a string array in PowerShell. Words can be separated by multiple spaces and/or line breaks. Each word can be enclosed in single quotes or double quotes. Some words may start with a hash sign (#); in that case any quoting appears after the hash sign.
Here is a code sample of a possible input and the expected result:
$inputString = @"
test1
#custom1
#"custom2" #'custom3'
#"custom ""four""" #'custom ''five'''
test2 "test3" 'test4'
"@
$result = @(
'test1'
'#custom1'
'#"custom2"'
"#'custom3'"
'#"custom ""four"""'
"#'custom ''five'''"
'test2'
'"test3"'
"'test4'"
)
Is there a way to do this with a clever regular expression? Or does someone have a parser snippet or function to start with?
CodePudding user response:
Assuming you fully control or implicitly trust the input string, you can use the following approach, which relies on Invoke-Expression, a cmdlet that should normally be avoided because it executes its argument as code:
Assumptions made:
- # only appears at the start of embedded strings.
- No embedded string contains newlines itself.
$inputString = @"
test1
#custom1
#"custom2" #'custom3'
#"custom ""four""" #'custom ''five'''
test2 "test3" 'test4'
"@
$embeddedStrings = Invoke-Expression @"
Write-Output $($inputString -replace '\r?\n', ' ' -replace '#', '`#')
"@
Caveat: The outer quoting around the individual strings is lost in the process, and the embedded, escaped quotes are unescaped; outputting $embeddedStrings yields:
test1
#custom1
#custom2
#custom3
#custom "four"
#custom 'five'
test2
test3
test4
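Where the outer quoting has to survive, a regex-based tokenizer is one possible alternative that also avoids Invoke-Expression. The following is only a sketch, not part of the approach above: the function name Split-QuotedWords and the pattern are assumptions made for illustration. It assumes that quotes only open at the start of a token (after an optional #) and that a doubled quote is the only escape; unbalanced quotes are not validated.

```powershell
function Split-QuotedWords {
    param([Parameter(Mandatory)][string] $Text)

    $dq = '"(?:[^"]|"")*"'   # double-quoted token; "" is an escaped quote
    $sq = "'(?:[^']|'')*'"   # single-quoted token; '' is an escaped quote
    # Optional leading #, then a quoted token or a run of non-whitespace.
    $pattern = "#?(?:$dq|$sq|\S+)"

    # Return every match verbatim, so quotes and # are kept as written.
    [regex]::Matches($Text, $pattern) | ForEach-Object { $_.Value }
}

$inputString = @"
test1
#custom1
#"custom2" #'custom3'
#"custom ""four""" #'custom ''five'''
test2 "test3" 'test4'
"@

$words = @(Split-QuotedWords $inputString)
$words
```

Each token is returned exactly as it appears in the input, including the # prefix, the surrounding quotes, and the doubled inner quotes.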
The Invoke-Expression approach relies on the fact that your embedded strings use PowerShell's quoting and quote-escaping rules; the only problems are the leading # characters, which PowerShell would otherwise treat as the start of a comment and which are therefore escaped as `# above.
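The effect of the backtick can be checked in isolation. A minimal sketch, with variable names chosen only for illustration:

```powershell
# An unescaped # at the start of a token begins a comment,
# so Write-Output only receives 'test1' here.
$unescaped = @(Write-Output test1 #custom1
)

# With the backtick, # is a literal character of the second argument.
$escaped = @(Write-Output test1 `#custom1)

$unescaped.Count   # 1
$escaped.Count     # 2
$escaped[1]        # #custom1
```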
By replacing the embedded newlines (\r?\n) with spaces, the result can be passed as a list of positional arguments to Write-Output, inside a string that is then evaluated with Invoke-Expression.
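Because Invoke-Expression runs whatever text it is given, it can be useful to print the generated command before evaluating it. A small sketch under the same assumptions as above; $commandText is a name introduced here for illustration only:

```powershell
$inputString = @"
test1
#custom1
#"custom2" #'custom3'
#"custom ""four""" #'custom ''five'''
test2 "test3" 'test4'
"@

# Build the command text the same way, but print it instead of running it.
$commandText = @"
Write-Output $($inputString -replace '\r?\n', ' ' -replace '#', '`#')
"@

$commandText
# Prints a single line:
# Write-Output test1 `#custom1 `#"custom2" `#'custom3' `#"custom ""four""" `#'custom ''five''' test2 "test3" 'test4'
```

Only once the printed line looks like a harmless Write-Output call would it be passed to Invoke-Expression.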