Home > Blockchain >  How can I replace certain characters (and not others) between delimiters and across newlines, using
How can I replace certain characters (and not others) between delimiters and across newlines, using

Time:02-17

Here is a sample:

: [
          {
            "yearGroupId": 13,
            "educationPhaseEnum": 2,
            "name": "Year Group 12",
            "label": "YG 12"
          },
          {
            "yearGroupId": 14,
            "educationPhaseEnum": 2,
            "name": "Year Group 13",
            "label": "YG 13"
          }
        ]

I want to remove the line breaks, and all quotes. I only want to do this between the strings ': [' and ' ]'. So the desired output would look like this:

[      {        yearGroupId: 13,        educationPhaseEnum: 2,        name: Year Group 12,        label: YG 12      },      {        yearGroupId: 14,        educationPhaseEnum: 2,        name: Year Group 13,        label: YG 13      }    ]

I've tried Powershell -NoProfile "(Get-Content -Raw .\allacts.txt) -replace '(?<=\u003a\u0020\u005b).*[\n\r\u0022].*(?=\u0020\u0020\u0020\u0020\u005d)', '' | Out-File -FilePath allacts.txt -Force -Encoding ASCII"

and about a hundred other things... but can't get my head around how it's meant to work. What do I have to do to get Powershell to replace these characters within these bounds? In other places in the file I need the line breaks.

Thanks.

Edit: Yep, this is JSON data. The issue is that there are duplicate keys (I can't change that). Converting it to a CSV results Powershell ignoring duplicate keys and picking one of them to go into the output CSV. Directly importing the JSON into Excel (where I need it to go) results in Excel rejecting it as it can't handle duplicate keys.

So, I decided to just glom everything into one value and use Power Query to sort it out at the other end (using the commas as delimiters).

CodePudding user response:

Your task requires a dynamic -replace operation operation, which Windows PowerShell (powershell.exe) - unlike PowerShell (Core) 7 (pwsh) - cannot provide directly:

  • You need to identify the block of interest in your input file...

  • ...and then perform the desired transformations on that block only.

This answer discusses dynamic replacements in more detail, but applied to your case this means (the assumption is that you're calling from outside PowerShell, such as from cmd.exe / a batch file):

powershell.exe -NoProfile -c "[regex]::Replace((Get-Content -Raw .\allacts.txt), '(?s): \[. ?\r?\n        \]', { param($match) $match.Value -replace '[\r\n\"]' }) | Out-File -FilePath allacts.txt -Force -Encoding ASCII"
  • For an explanation of the block-matching regex passed as the 2nd argument to [regex]::Replace() and the ability to experiment with it, see this regex101.com page.

  • The regex used inside the 3rd argument, the dynamic replacement script block ({ ... }), is [\r\n"], which matches all CR (\r), LF (\n), and " characters, and since it is used with -replace without a replacement operand, effectively removes them.

  • Related