How to set Select-String encoding to UTF-16?

Time:11-08

I have a PowerShell script that runs an executable whose output is UTF-16-encoded, and I'm piping that output into Select-String like this:

& "my.exe" | Select-String -Pattern "skipping non-regular file" -NotMatch -Encoding "utf-16"

But PowerShell reports that the encoding is not supported.

Is there a workaround for this? Is UTF-16 really not supported?

CodePudding user response:

Here's an example I came up with. I'm not sure how to make it work.

Program.cs:

using System;
using System.Text;

namespace myApp
{
    class Program
    {
        static void Main(string[] args)
        {
            Byte[] byteOrderMark;
            byteOrderMark = Encoding.Unicode.GetPreamble();
            //Console.OutputEncoding = new UnicodeEncoding(false, true); 
            Console.OutputEncoding = System.Text.Encoding.Unicode;
            Console.WriteLine("Hello World!");
        }
    }
}

# Append the .NET Framework directory (which contains csc.exe) to the PATH, then compile.
$env:path += ';C:\Windows\Microsoft.NET\Framework64\v4.0.30319'
csc Program.cs

# no output for either
.\Program | select-string Hello
.\Program | select-string Hello -encoding unicode


# no 'FF FE' BOM
.\program | Format-Hex


           00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

00000000   48 00 65 00 6C 00 6C 00 6F 00 20 00 57 00 6F 00  H.e.l.l.o. .W.o.
00000010   72 00 6C 00 64 00 21 00                          r.l.d.!.
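
A possible explanation for the missing matches, sketched under stated assumptions: PowerShell decodes external-program output using [Console]::OutputEncoding, which is typically a single-byte OEM code page on Windows, so the zero byte of each UTF-16LE code unit becomes a NUL character between the letters. The snippet replays the bytes from the dump above; ASCII is used here only as a stand-in for the console's code page, which is an assumption made so the example behaves the same on any machine.

```powershell
# Bytes copied from the Format-Hex dump above ("Hello World!" as UTF-16LE).
[byte[]] $bytes = 0x48,0x00,0x65,0x00,0x6C,0x00,0x6C,0x00,
                  0x6F,0x00,0x20,0x00,0x57,0x00,0x6F,0x00,
                  0x72,0x00,0x6C,0x00,0x64,0x00,0x21,0x00

# Decoded as UTF-16LE ("Unicode"), the text is intact and the pattern matches.
$asUtf16 = [System.Text.Encoding]::Unicode.GetString($bytes)
$asUtf16 | Select-String Hello

# Decoded with a single-byte encoding (ASCII as a stand-in for the console's
# code page), every second character is NUL, so 'Hello' is never contiguous.
$asSingleByte = [System.Text.Encoding]::ASCII.GetString($bytes)
$asSingleByte | Select-String Hello   # produces no output
```

If this reading is right, the problem lies in how the bytes are decoded before Select-String ever sees them, not in Select-String itself.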

CodePudding user response:

  • Select-String's -Encoding parameter does not apply to string input from the pipeline, as is the case with output from an external program.

    • Instead, it only applies to file input, namely to the content of files passed either via the pipeline as the output from a Get-ChildItem / Get-Item call or via the -Path / -LiteralPath parameters. -Encoding generally applies only to the content of files, across all standard cmdlets.

    • As an aside: As Jeroen Mostert notes, the error message you saw stems from the fact that utf-16 isn't a valid -Encoding argument in Windows PowerShell; unfortunately, you must use the misnomer Unicode to refer to UTF-16LE.

      • Use Get-Help Select-String -Parameter Encoding to see the supported names or consult the docs online.
      • However, note that the encoding names utf-16 and utf-16le do work in PowerShell (Core) 7, where -Encoding additionally accepts any name or code-page number from among all the available .NET encodings, as reported by [System.Text.Encoding]::GetEncodings().
  • To make the pipeline approach work, you must (temporarily) set [Console]::OutputEncoding to UTF-16LE ("Unicode") to get PowerShell to correctly decode the UTF-16LE output from your external program, as shown next.

$prev = [Console]::OutputEncoding # Save current value.

# Tell PowerShell to interpret external-program output as 
# UTF-16LE ("Unicode") encoded.
[Console]::OutputEncoding = [System.Text.Encoding]::Unicode

& "my.exe" |
  Select-String -Pattern "skipping non-regular file" -NotMatch

[Console]::OutputEncoding = $prev # Restore previous value.
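
One caveat with the snippet above is that the restore line is skipped if the pipeline is aborted by a terminating error. A minimal sketch of a more defensive variant wraps the call in try / finally. The function name Invoke-WithOutputEncoding and its parameters are hypothetical, made up for this illustration; this is not a built-in cmdlet, nor the Invoke-WithEncoding helper mentioned below.

```powershell
# Hypothetical helper: runs a script block while [Console]::OutputEncoding
# is temporarily set to the given encoding, then restores the old value.
function Invoke-WithOutputEncoding {
    param(
        [Parameter(Mandatory)] [System.Text.Encoding] $Encoding,
        [Parameter(Mandatory)] [scriptblock] $ScriptBlock
    )

    $prev = [Console]::OutputEncoding   # Save current value.
    try {
        # Tell PowerShell how to decode external-program output.
        [Console]::OutputEncoding = $Encoding
        & $ScriptBlock
    }
    finally {
        # Runs even if the script block throws a terminating error.
        [Console]::OutputEncoding = $prev
    }
}

# Usage with the external program from the question (commented out,
# because my.exe exists only on the asker's machine):
#
# Invoke-WithOutputEncoding -Encoding ([System.Text.Encoding]::Unicode) -ScriptBlock {
#     & "my.exe"
# } | Select-String -Pattern "skipping non-regular file" -NotMatch
```

The design choice here is only about cleanup: the decoding itself works the same way as in the snippet above.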

See also:

  • For more information on how PowerShell handles character encoding when communicating with external programs, including helper functions Invoke-WithEncoding and Debug-NativeInOutput, see this answer.