How to convert a file that seems binary-encoded to UTF-8 with PowerShell?

Time:11-09

Goal: I am trying to convert a file without an extension into readable JSON, encoded in UTF-8, using PowerShell.

Context: Any .pbix file can be "unzipped" to extract the files it contains: right-click it and select "Extract to...". This extracts a bunch of files and one folder, Report. In the Report folder there is a file called Layout. This file seems to be JSON, but it doesn't have a .json extension. My ultimate goal is to work with this JSON, so I am trying to do it with a PowerShell script. But the file is hard to use, since I don't really know how to handle this type of file.

My attempts: First I tried copying the file and changing the extension. The JSON file looks okay, but according to Notepad the file doesn't seem to have an extension. Then, when I try to convert the content of the file to JSON, it doesn't work.

Copy-Item -Path "\Layout" -Destination "\Layout.json"
Get-Content -Raw "\Layout.json" | Out-String | ConvertFrom-Json

Here's the error message:

  ... \Layout.JSON" | Out-String | ConvertFrom-Json
                                   ~~~~~~~~~~~~~~~~
      CategoryInfo          : NotSpecified: (:) [ConvertFrom-Json], ArgumentException
      FullyQualifiedErrorId : System.ArgumentException,Microsoft.PowerShell.Commands.ConvertFromJsonCommand

Then I tried converting it to UTF-8 manually with Notepad, and it worked! So it's an encoding problem. Next, I tried reading the content of the file without an extension and THEN writing that content into a fresh JSON file encoded in UTF-8:

$MyPath = "\pbi_json.json"
$MyRawString = Get-Content -Raw "\Layout"
Set-Content -Path $MyPath -Value $MyRawString -Encoding UTF8

But it doesn't work well: the destination file is polluted with NUL characters (code 0, the first character of the ASCII table). There is a NUL character between every pair of characters. And when I check the encoding in Notepad, it shows UTF-8 with BOM instead of plain UTF-8.

So is there a way to bypass the encoding conversion, or to solve this conversion problem? Keep in mind that the main goal is to create a JSON object from the file without an extension, so I can accept any solution that helps; the only condition is that it shouldn't require an external library.

Answer:

NUL characters embedded between every other character of the output file are a symptom of the input file being encoded in UTF-16 (probably without a BOM, so Get-Content couldn't detect it).
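To see where the NULs come from, the sketch below (an illustration, not part of the original answer) encodes a short JSON fragment as UTF-16LE and prints its bytes: every ASCII character is followed by a 0x00 byte, which a reader using a single-byte or UTF-8 decoding turns into a NUL character. `Test-LikelyUtf16LE` is a hypothetical helper, and its check (all odd-indexed bytes in a sample are zero) is only a rough heuristic that assumes mostly-ASCII text and no BOM.

```powershell
# Encode a small JSON fragment as UTF-16LE ("Unicode" in .NET naming).
# GetBytes never emits a BOM, so these are just the character bytes.
$bytes = [System.Text.Encoding]::Unicode.GetBytes('{"a":1}')

# Each ASCII character becomes its byte followed by 0x00.
($bytes | ForEach-Object { '{0:X2}' -f $_ }) -join ' '
# 7B 00 22 00 61 00 22 00 3A 00 31 00 7D 00

# Hypothetical heuristic: if every odd-indexed byte in the sample is zero,
# the data is probably BOM-less UTF-16LE text made of ASCII characters.
# A file starting with the BOM FF FE fails this check, but Get-Content
# detects such a BOM on its own anyway.
function Test-LikelyUtf16LE([byte[]]$Data, [int]$SampleSize = 4096) {
    # Only inspect the first $SampleSize bytes; scanning a large file in a loop is slow.
    $limit = [Math]::Min($Data.Length, $SampleSize)
    if ($limit -lt 2) { return $false }
    for ($i = 1; $i -lt $limit; $i += 2) {
        if ($Data[$i] -ne 0) { return $false }
    }
    return $true
}
```

On a real file, something like `Test-LikelyUtf16LE ([System.IO.File]::ReadAllBytes('C:\path\to\Report\Layout'))` could be used, where the path is a placeholder; pass an absolute path, since .NET resolves relative paths against the process working directory rather than PowerShell's current location.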

You can force Get-Content to read the file as UTF-16 (little-endian) by passing -Encoding Unicode:

$json = Get-Content -Raw "\Layout" -Encoding Unicode | ConvertFrom-Json
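The question also notes that `Set-Content -Encoding UTF8` produced UTF-8 with a BOM: in Windows PowerShell 5.1 that value always writes a BOM, whereas PowerShell 6+ writes no BOM for `UTF8` and also accepts `utf8NoBOM`. If a UTF-8 copy of the file is still wanted, the following is a minimal sketch, assuming the source really is UTF-16LE as suspected above, that writes a BOM-less UTF-8 copy through built-in .NET APIs (no external library) and returns the parsed object. `Convert-Utf16LEFileToUtf8` is a hypothetical name.

```powershell
function Convert-Utf16LEFileToUtf8 {
    param(
        [Parameter(Mandatory)] [string] $SourcePath,
        [Parameter(Mandatory)] [string] $DestinationPath
    )
    # Resolve against PowerShell's current location; .NET would otherwise
    # use the process working directory for relative paths.
    $src = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($SourcePath)
    $dst = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($DestinationPath)

    # Decode explicitly as UTF-16LE; a leading FF FE BOM, if present, is consumed.
    $text = [System.IO.File]::ReadAllText($src, [System.Text.Encoding]::Unicode)

    # UTF8Encoding($false) means UTF-8 without a byte order mark.
    $utf8NoBom = New-Object System.Text.UTF8Encoding $false
    [System.IO.File]::WriteAllText($dst, $text, $utf8NoBom)

    # Return the parsed JSON so the caller can use it directly.
    return $text | ConvertFrom-Json
}
```

Usage, with placeholder paths: `$layout = Convert-Utf16LEFileToUtf8 -SourcePath 'C:\path\to\Report\Layout' -DestinationPath 'C:\path\to\pbi_json.json'`. If only the object is needed, the one-liner above is enough and no intermediate file is required.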