Why does PowerShell generated base64 string have dots in it when decoding with something else than P

Time:10-20

I have the following code:

$x = "This text needs to be encoded"
$z = [System.Text.Encoding]::Unicode.GetBytes($x)
$y = [System.Convert]::ToBase64String($z)
Write-Host("$y")

And the following gets printed to the console:

VABoAGkAcwAgAHQAZQB4AHQAIABuAGUAZQBkAHMAIAB0AG8AIABiAGUAIABlAG4AYwBvAGQAZQBkAA==

If I decode this Base64 string with PowerShell like this:

$v = [System.Text.Encoding]::Unicode.GetString([System.Convert]::FromBase64String($y))
Write-Host("$v")

it is decoded correctly:

This text needs to be encoded

However, if I paste the same Base64 string into, say, CyberChef and decode it with the "From Base64" recipe, the decoded string is filled with dots:

T.h.i.s. .t.e.x.t. .n.e.e.d.s. .t.o. .b.e. .e.n.c.o.d.e.d.

My question is, why does this happen?

Answer:

Santiago Squarzon has provided the crucial pointer:

  • CyberChef's recipe most likely expects the bytes encoded in the Base64 string to be the UTF-8 encoding of the original string.

  • By contrast, the - poorly named - [System.Text.Encoding]::Unicode encoding is the UTF-16LE encoding, where characters are represented by (at least) two bytes (with the least significant byte coming first).

    • Characters whose Unicode code point is less than or equal to 0xFF (255), which includes the entire ASCII range that all characters in your input string fall into, therefore have a NUL byte (value 0x0) as the second byte of their two-byte representation. For example, the letter T encoded as UTF-16LE is the two-byte sequence 0x54 0x0, where 0x54 by itself represents the letter T in ASCII, and therefore also in UTF-8, which is a superset of ASCII that uses multi-byte sequences only for non-ASCII characters.
    • Therefore, the two-byte sequence 0x54 0x0 is interpreted as two characters in the context of UTF-8: letter T (0x54) and NUL (0x0). NUL has no visual representation per se (it is a non-printable character), but a common convention is to visualize it as ., which is what you saw.
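
To make the byte-level difference concrete, here is a minimal sketch that dumps both encodings of a short sample string and then deliberately misreads the UTF-16LE bytes as UTF-8. The variable names and the sample text 'This' are illustrative only; replacing NUL with a dot merely imitates how tools such as CyberChef tend to display non-printable characters.

```powershell
# Illustrative sketch; variable names are made up for this example.
$text = 'This'

# UTF-16LE ("Unicode" in .NET): two bytes per character here, low byte first.
$utf16Bytes = [System.Text.Encoding]::Unicode.GetBytes($text)
# UTF-8: a single byte for each ASCII-range character.
$utf8Bytes  = [System.Text.Encoding]::UTF8.GetBytes($text)

# Hex dumps of both byte arrays.
$utf16Hex = [System.BitConverter]::ToString($utf16Bytes)  # 54-00-68-00-69-00-73-00
$utf8Hex  = [System.BitConverter]::ToString($utf8Bytes)   # 54-68-69-73

# Misinterpreting the UTF-16LE bytes as UTF-8 yields a NUL after every letter.
$misread = [System.Text.Encoding]::UTF8.GetString($utf16Bytes)
# Show each NUL as a dot, imitating the display convention described above.
$visible = $misread.Replace("`0", '.')

$utf16Hex
$utf8Hex
$visible   # T.h.i.s.
```

Every second byte in the UTF-16LE dump is 00, and those are the bytes that show up as dots.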

Therefore, create your Base64-encoded string as follows:

$orig = "This text needs to be encoded"
$base64 = 
  [System.Convert]::ToBase64String(
    [System.Text.Encoding]::UTF8.GetBytes($orig)
  )

Note: Even though [System.Text.Encoding]::UTF8 is - up to at least .NET 6 - a UTF-8 encoding with BOM, a BOM is (fortunately) not prepended to the output by the .GetBytes() method. As an aside: Making this encoding BOM-less altogether has been under consideration for .NET 7+.
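
If you want to check the BOM behavior yourself, the following sketch compares the preamble that the encoding object advertises with what .GetBytes() actually returns. The variable names are illustrative, and the preamble shown is what current .NET versions report for this encoding instance.

```powershell
# Illustrative sketch; variable names are made up for this example.
$encoding = [System.Text.Encoding]::UTF8

# The preamble (BOM) that this encoding instance advertises.
$bomHex = [System.BitConverter]::ToString($encoding.GetPreamble())    # EF-BB-BF

# GetBytes() returns only the encoded characters, without the preamble.
$bytesHex = [System.BitConverter]::ToString($encoding.GetBytes('T'))  # 54

$bomHex
$bytesHex
```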

$base64 then contains: VGhpcyB0ZXh0IG5lZWRzIHRvIGJlIGVuY29kZWQ=
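
As a quick sanity check, this sketch decodes that string again using UTF-8, which is what a generic Base64 decoder effectively does when it displays the result as text. The variable names are illustrative only.

```powershell
# Illustrative sketch; variable names are made up for this example.
$base64 = 'VGhpcyB0ZXh0IG5lZWRzIHRvIGJlIGVuY29kZWQ='

# Base64 -> raw bytes -> text, interpreting the bytes as UTF-8.
$bytes   = [System.Convert]::FromBase64String($base64)
$decoded = [System.Text.Encoding]::UTF8.GetString($bytes)

$decoded   # This text needs to be encoded
```

Because the bytes now contain no NULs, the text decodes without dots.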
