I am trying to remove text from a file between { and } inclusive but only if it contains the string "RG " - the third and fourth group in the following. (Note that if it is the last in the list it will not have the trailing comma.)
"morphs" : [
{
"uid" : "AQ_Eun-Ju-Body-2022-31-7--15_02_30.592.vmi",
"name" : "AQ_Eun-Ju-Body",
"value" : "1"
},
{
"uid" : "AQ_Eun-Ju-Head-2022-31-7--15_02_30.592.vmi",
"name" : "AQ_Eun-Ju-Head",
"value" : "1"
},
{
"uid" : "RG side2side.vmi",
"name" : "RG side2side",
"value" : "-0.3332869"
},
{
"uid" : "RG UpDown2.vmi",
"name" : "RG UpDown2",
"value" : "-0.3332869"
}
]
I can get it to work with -replace '.*{\n.\*RG .\*\n.\*\n.\*\n.\*}.\*\n',''
however if the group does not have three lines it fails because of the explicit linefeeds. I can create a replace for each number of lines but that seems clunky. I tried 'RG.\*?},\*\n'
which gives me the last part, but I'm struggling with the first part.
This is what I have so far:
Get-ChildItem $VAMfixDir -recurse -include *.json,*.vap,*.vaj | Where-Object { $timestamp -lt $_.CreationTime } |
Foreach-Object {
$originalContent = $_ | Get-Content -Raw
# *Potentially* perform replacements, depending on whether the search patterns are found.
$potentiallyModifiedContent = $originalContent -Replace ".*{\n.*RG .*\n.*\n.*\n.*}.*\n|.*{\n.*RG .*\n.*\n.*}.*\n",""
Set-Content -NoNewLine -Encoding Ascii -LiteralPath $_.FullName -Value $potentiallyModifiedContent
}
EDIT: The file in the example above that I'm trying to edit IS a json file, but I'm trying to create a POWERSHELL script to remove groups of lines from it using REGEX. I have shown the regex that works but it has its limitations, as stated. I was hoping for a more elegant solution than a massive OR'd -Replace statement.
CodePudding user response:
As noted, it's generally preferable to use a dedicated parser and serializer for JSON data, namely ConvertFrom-Json and ConvertTo-Json.
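For comparison, here is a minimal sketch of the parser-based route. It assumes the "morphs" array sits directly inside a top-level JSON object and that filtering on the name property is what is wanted; the sample data is shortened and hypothetical, and the output formatting will differ from the input file's.

```powershell
# Hypothetical, shortened sample that mirrors the structure in the question.
$json = @'
{
  "morphs" : [
    { "uid" : "AQ_Eun-Ju-Body.vmi", "name" : "AQ_Eun-Ju-Body", "value" : "1" },
    { "uid" : "RG side2side.vmi", "name" : "RG side2side", "value" : "-0.3332869" },
    { "uid" : "RG UpDown2.vmi", "name" : "RG UpDown2", "value" : "-0.3332869" }
  ]
}
'@

$data = $json | ConvertFrom-Json

# Keep only the entries whose name does not start with "RG ".
# @(...) keeps the result an array even if only one entry remains.
$data.morphs = @($data.morphs | Where-Object { $_.name -notlike 'RG *' })

# -Depth guards against truncation of nested data (the default depth is 2).
$result = $data | ConvertTo-Json -Depth 10
$result
```

Because ConvertTo-Json re-serializes the whole document, whitespace and indentation are not preserved, which is the trade-off the next paragraph refers to.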
However, regex-based transformations may be an option if you're looking to preserve the exact formatting of the input file and/or the desired transformations are syntactically limited in a way that allows them to be based on regexes reliably, which does appear to be the case here.
$potentiallyModifiedContent =
$originalContent -Replace '(?:,\s*)?\{[^}]+\bRG\b[^}]+\}'
For a detailed explanation of the regex and the ability to experiment with it, see this regex101.com page.
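To try this locally, the following sketch applies the pattern, with a + quantifier after each [^}], to a shortened, hypothetical sample in which one RG block has two lines and the other has three.

```powershell
# Hypothetical sample: the two RG blocks deliberately differ in line count.
$originalContent = @'
"morphs" : [
{
"uid" : "AQ_Eun-Ju-Head.vmi",
"name" : "AQ_Eun-Ju-Head",
"value" : "1"
},
{
"uid" : "RG side2side.vmi",
"name" : "RG side2side"
},
{
"uid" : "RG UpDown2.vmi",
"name" : "RG UpDown2",
"value" : "-0.3332869"
}
]
'@

# (?:,\s*)?  optionally consumes the comma (and whitespace) before the block
# \{[^}]+    opening brace, then characters that are not a closing brace
# \bRG\b     the word RG somewhere inside the block
# [^}]+\}    the rest of the block up to its own closing brace
$potentiallyModifiedContent =
  $originalContent -replace '(?:,\s*)?\{[^}]+\bRG\b[^}]+\}'

$potentiallyModifiedContent
```

One caveat of this approach, as far as I can tell: the optional comma is only consumed before a block, so if an RG block is the first element and other blocks follow it, a stray leading comma would be left behind. That case does not occur in the sample from the question.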
As for what you tried:
.*{\n.\*RG .\*\n.\*\n.\*\n.\*}.\*\n
- While not always necessary (it isn't in this case), it's best to routinely escape { and } characters meant to be taken literally, given that these characters are metacharacters used for quantifiers, with the proper syntax between them (e.g., {2} matches the previous subexpression exactly 2 times).
- Conversely, if you do want * to be treated as a metacharacter (a quantifier matching the preceding subexpression zero or more times), do not escape it (as \*).
- In general, you can use the SingleLine regex option to make . match newlines too, so that .* would match across lines. The simplest way to activate this option is to place (?s) at the start of the regex.
- {\n.\*RG - if corrected to \{\n.*RG or even to the non-greedy \{\n.*?RG - is too permissive, as it will start matching at the first {, even if that block does not contain RG, and keep matching across the end of that and potentially later ones until RG is found in a block.
Ultimately, it's best to use [^{] and [^}], as shown above, to match the characters after the opening { and before the closing }, which implicitly matches across lines too.
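The difference between the two approaches can be seen in a small sketch; the two-block sample text is hypothetical, and [regex]::Match is used only to inspect where each match begins.

```powershell
# Hypothetical sample: the first block has no RG, the second one does.
$text = @'
{
"name" : "AQ_Eun-Ju-Body"
},
{
"name" : "RG side2side"
}
'@

# Too permissive: with (?s), .*? may cross the first block's closing brace,
# so the match starts at the very first { and runs into the second block.
$permissive = [regex]::Match($text, '(?s)\{.*?RG')

# Confined: [^}]+ cannot cross a closing brace, so the match can only start
# at the opening brace of a block that itself contains RG.
$confined = [regex]::Match($text, '\{[^}]+\bRG\b')

$permissive.Index                      # 0: starts at the first {
$permissive.Value -match 'AQ_Eun-Ju'   # True: the non-RG block was swallowed
$confined.Value -match 'AQ_Eun-Ju'     # False: only the RG block is matched
```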
See also:
- Since PowerShell's regex functionality builds on .NET's, see the Regular Expression Language - Quick Reference.