Add copyright character into json file from PowerShell script


I have a script that updates a configuration file with the current year, but for some reason the copyright symbol is not being written correctly. The PowerShell script is saved as UTF-8 with BOM and the JSON file as UTF-8.

The workflow is that I read from a JSON file, update the copyright date, and then save to a JSON file again.

The JSON file info.json:

{
    "CopyrightInfo":  "Copyright © CompanyName 1992"
}

Reproducible excerpt of the PowerShell script:

$path = "./info.json"
$a = Get-Content $path | ConvertFrom-Json
$a.'CopyrightInfo' = "Copyright $([char]::ConvertFromUtf32(0x000000A9)) CompanyName $((Get-Date).Year)"
$a | ConvertTo-Json | Set-Content $path

I've tried a bunch of ways; the above is my latest attempt. It looks fine when printed in PowerShell or opened in Notepad, but in any other editor (Visual Studio Code, SourceTree, the Azure DevOps file viewer, etc.) it always shows up as:

"CopyrightInfo":  "Copyright � CompanyName 2022"

If anyone can explain what I'm doing wrong, that would be great, and it would be even better if they could also show a way to make it work properly.

I'm using PowerShell version 5.1.19041.1682

EDIT: Updated the issue with reproducible code excerpts and the PowerShell version used.

CodePudding user response:

Can't reproduce the issue:

$Data = @{ CopyrightInfo = "Copyright $([char]::ConvertFromUtf32(0x000000A9)) CompanyName $((Get-Date).Year)" }
$Json = ConvertTo-Json $Data
$Json | Set-Content .\Test.json
$Json = Get-Content -Raw .\Test.json
$Data = ConvertFrom-Json $Json
$Data
CopyrightInfo
-------------
Copyright © CompanyName 2022

To show the result correctly in PowerShell and with any external program, see: Displaying Unicode in PowerShell

$OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding = New-Object System.Text.UTF8Encoding
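
Whether the © actually survives on disk is a question of the bytes that get written, not of how the console renders them; a round trip inside PowerShell can look fine even when the file is not UTF-8, because Get-Content and Set-Content share the same default encoding. A quick way to inspect the bytes (a minimal sketch, assuming the Test.json created above, Windows PowerShell 5.1, and a Windows-1252 ANSI code page):

# Sketch: dump the file's raw bytes as hex.
# With Set-Content's ANSI default, © is stored as the single byte A9
# (invalid as UTF-8, hence the "�" in UTF-8 editors); a UTF-8 file would
# contain the two-byte sequence C2 A9 instead.
$bytes = [System.IO.File]::ReadAllBytes((Convert-Path .\Test.json))
($bytes | ForEach-Object { $_.ToString('X2') }) -join ' '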

CodePudding user response:

Given that you're running Windows PowerShell and that you want to both read the input and create the output as UTF-8-encoded:

  • If it's acceptable to create a UTF-8 file with BOM (which is what Set-Content -Encoding utf8 in Windows PowerShell invariably creates):

    # Note the use of -Encoding utf8 in both statements.
    # (In PowerShell (Core) 7+, neither would be needed,
    # and Set-Content would create a BOM-*less* UTF-8 file;
    # you'd need -Encoding utf8BOM to create one *with* a BOM).
    
    $a = Get-Content -Encoding utf8 $path | ConvertFrom-Json
    # ...
    $a | ConvertTo-Json | Set-Content -Encoding utf8 $path
    
  • Creating a UTF-8 file without BOM requires more work in Windows PowerShell (whereas this encoding is now the consistent default in PowerShell (Core) 7+), taking advantage of the - curious - fact that New-Item, when given a -Value argument, (invariably) creates files with that encoding:

    # (In PowerShell (Core) 7+, -Encoding utf8 wouldn't be needed,
    # and Set-Content would create a BOM-*less* UTF-8 file by default.)
    
    $a = Get-Content -Encoding utf8 $path | ConvertFrom-Json
    # ...
    New-Item -Force -Path $path -Value (($a | ConvertTo-Json) + "`r`n")
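
With either approach, you can verify the result by checking whether the file you just wrote starts with the 3-byte UTF-8 BOM (EF BB BF). A minimal sketch (reuses $path from the snippets above):

    # Sketch: report whether the file at $path begins with the UTF-8 BOM.
    $bytes = [System.IO.File]::ReadAllBytes((Convert-Path $path))
    $hasBom = ($bytes.Length -ge 3 -and
               $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF)
    "UTF-8 BOM present: $hasBom"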
    

Note:

  • On reading: PowerShell recognizes Unicode BOMs automatically, but what encoding is assumed in the absence of a BOM depends on the PowerShell edition, both when reading source code and when reading files via cmdlets, such as via Get-Content:

    • Windows PowerShell assumes the system's legacy ANSI code page (aka language for non-Unicode programs).

    • PowerShell (Core) assumes UTF-8.

  • On writing: Once a file is read, PowerShell does not preserve information about an input file's original character encoding - the file content is stored in .NET strings (which are composed of in-memory UTF-16LE code units), even when the data is simply passed through the pipeline. As such, it is a file-writing cmdlet's own default encoding that is used if no -Encoding argument is specified, irrespective of where the data came from; specifically:

    • Windows PowerShell's Set-Content defaults to the system legacy ANSI encoding; unfortunately, other cmdlets have different defaults; notably, Out-File and its virtual alias, >, default to UTF-16LE ("Unicode") - see the bottom section of this answer for details.

    • PowerShell (Core) now fortunately defaults to BOM-less UTF-8, across all cmdlets.
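
A quick way to see those differing Windows PowerShell defaults for yourself (a minimal sketch; the sample string and the file names sc.txt / of.txt are only illustrative, and a Windows-1252 ANSI code page is assumed):

    $s = "Copyright $([char]::ConvertFromUtf32(0x000000A9)) Test"
    $s | Set-Content .\sc.txt   # Set-Content default: ANSI code page
    $s | Out-File    .\of.txt   # Out-File default: UTF-16LE ("Unicode"), with BOM
    # In sc.txt the © lands as the single ANSI byte A9 ...
    [System.IO.File]::ReadAllBytes((Convert-Path .\sc.txt))[10].ToString('X2')                             # A9
    # ... while of.txt starts with the UTF-16LE BOM.
    [System.IO.File]::ReadAllBytes((Convert-Path .\of.txt))[0..1] | ForEach-Object { $_.ToString('X2') }   # FF, FE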
