I have the following code:
$x = "This text needs to be encoded"
$z = [System.Text.Encoding]::Unicode.GetBytes($x)
$y = [System.Convert]::ToBase64String($z)
Write-Host("$y")
And the following gets printed to the console:
VABoAGkAcwAgAHQAZQB4AHQAIABuAGUAZQBkAHMAIAB0AG8AIABiAGUAIABlAG4AYwBvAGQAZQBkAA==
Now, if I decode this Base64 string with PowerShell like this:
$v = [System.Text.Encoding]::Unicode.GetString([System.Convert]::FromBase64String($y))
Write-Host("$v")
it is decoded correctly:
This text needs to be encoded
However, if I paste the Base64-encoded string above into, say, CyberChef and decode it with the "From Base64" recipe, the decoded string is interspersed with dots:
T.h.i.s. .t.e.x.t. .n.e.e.d.s. .t.o. .b.e. .e.n.c.o.d.e.d.
My question is, why does this happen?
CodePudding user response:
Santiago Squarzon has provided the crucial pointer:
What CyberChef's recipe most likely expects is for the bytes that the Base64 string encodes to be based on the UTF-8 encoding of the original string.
By contrast, the (poorly named) [System.Text.Encoding]::Unicode encoding is the UTF-16LE encoding, where characters are represented by (at least) two bytes (with the least significant byte coming first).
- Characters whose Unicode code point is less than or equal to 0xFF (255), which includes the entire ASCII range that all characters in your input string fall into, therefore have a NUL byte (value 0x0) as the second byte of their two-byte representation; e.g., the letter T encoded as UTF-16LE is composed of the two-byte sequence 0x54 0x0, where 0x54 by itself represents the letter T in ASCII encoding, and therefore also in UTF-8, which is a superset of ASCII that represents (only) non-ASCII characters as multi-byte sequences.
- Therefore, the two-byte sequence 0x54 0x0 is interpreted as two characters in the context of UTF-8: letter T (0x54) and NUL (0x0). NUL has no visual representation per se (it is a non-printable character), but a common convention is to visualize it as ".", which is what you saw.
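To make the byte-level difference concrete, here is a minimal sketch that prints the bytes of a short sample string in both encodings and then deliberately misreads the UTF-16LE bytes as UTF-8. The variable names and the sample string "This" are chosen for illustration only, and replacing NUL with a dot merely imitates what a tool such as CyberChef appears to do when displaying non-printable characters.

```powershell
$text = "This"

# UTF-16LE ("Unicode" in .NET): two bytes per character for this input.
$utf16Bytes = [System.Text.Encoding]::Unicode.GetBytes($text)
# UTF-8: one byte per ASCII character.
$utf8Bytes  = [System.Text.Encoding]::UTF8.GetBytes($text)

# Render both byte sequences as hex strings.
$utf16Hex = ($utf16Bytes | ForEach-Object { $_.ToString('X2') }) -join ' '
$utf8Hex  = ($utf8Bytes  | ForEach-Object { $_.ToString('X2') }) -join ' '
$utf16Hex   # 54 00 68 00 69 00 73 00
$utf8Hex    # 54 68 69 73

# Misread the UTF-16LE bytes as UTF-8, then show each NUL character as a dot.
$misread = [System.Text.Encoding]::UTF8.GetString($utf16Bytes)
$dotted  = $misread.Replace([char]0, [char]'.')
$dotted     # T.h.i.s.
```

Every second byte in the UTF-16LE sequence is 0x00, which is exactly where the dots appear.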
Therefore, create your Base64-encoded string as follows:
$orig = "This text needs to be encoded"
$base64 =
[System.Convert]::ToBase64String(
[System.Text.Encoding]::UTF8.GetBytes($orig)
)
Note: Even though [System.Text.Encoding]::UTF8 is (up to at least .NET 6) a UTF-8 encoding with BOM, a BOM is (fortunately) not prepended to the bytes returned by the .GetBytes() method. As an aside: Changing this encoding to be BOM-less altogether is being considered prior to .NET 7.
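The BOM behavior described in the note can be checked with a small sketch like the following. The variable names are illustrative; .GetPreamble() is the standard .NET method that returns the BOM an encoding would write, for example when saving a file.

```powershell
$enc = [System.Text.Encoding]::UTF8

# The encoding object does carry a BOM (its "preamble")...
$preambleHex = ($enc.GetPreamble() | ForEach-Object { $_.ToString('X2') }) -join ' '
$preambleHex   # EF BB BF

# ...but .GetBytes() returns only the encoded characters, without that BOM.
$bytesHex = ($enc.GetBytes('T') | ForEach-Object { $_.ToString('X2') }) -join ' '
$bytesHex      # 54
```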
$base64 then contains: VGhpcyB0ZXh0IG5lZWRzIHRvIGJlIGVuY29kZWQ=
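As a quick sanity check, the UTF-8-based string can be decoded again in PowerShell. This is only a sketch with illustrative variable names: it confirms that the decoded bytes contain no NUL bytes, which is why a decoder that assumes UTF-8 or ASCII should show the text without dots.

```powershell
$base64 = 'VGhpcyB0ZXh0IG5lZWRzIHRvIGJlIGVuY29kZWQ='

# Decode the Base64 string back into raw bytes.
$bytes = [System.Convert]::FromBase64String($base64)

# Interpret the bytes as UTF-8, matching the encoding used to create them.
$decoded = [System.Text.Encoding]::UTF8.GetString($bytes)
$decoded                  # This text needs to be encoded

# No NUL bytes are present, unlike in the UTF-16LE variant.
$hasNul = $bytes -contains 0
$hasNul                   # False
```

The decoding side must always use the same encoding as the encoding side: a Base64 string created with [System.Text.Encoding]::Unicode has to be decoded with [System.Text.Encoding]::Unicode as well.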