I found this excellent approach on shortening GUIDs here on stackowerflow: .NET Short Unique Identifier
I have some other strings that I wanted to treat the same way, but I found out that in most cases the Base64String is even longer than the original string.
My question is: why does [guid]::NewGuid().ToByteArray()
return a significant smaller byte array than [System.Text.Encoding]::UTF8.GetBytes([guid]::NewGuid().Guid)
?
For example, let's look at the following GUID:
$guid = [guid]::NewGuid()
$guid
Guid
----
34c2b21e-18c3-46e7-bc76-966ae6aa06bc
With $guid.GetBytes()
, the following is returned:
30
178
194
52
195
24
231
70
188
118
150
106
230
170
6
188
And [System.Convert]::ToBase64String($guid.ToByteArray())
generates HrLCNMMY50a8dpZq5qoGvA==
[System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes($guid.Guid))
, however, returns MzRjMmIyMWUtMThjMy00NmU3LWJjNzYtOTY2YWU2YWEwNmJj
, with [System.Text.Encoding]::UTF8.GetBytes($guid.Guid)
being:
51
52
99
50
98
50
49
101
45
49
56
99
51
45
52
54
101
55
45
98
99
55
54
45
57
54
54
97
101
54
97
97
48
54
98
99
CodePudding user response:
[guid]::NewGuid().ToByteArray()
A GUID is a 128-bit number. When converting it into a byte array, you divide 128 by 8 (bits per byte) and get an array of 16 bytes.
[System.Text.Encoding]::UTF8.GetBytes([guid]::NewGuid().Guid)
This converts the GUID to a hexadecimal string representation first. Then this string gets encoded as UTF-8.
A hex string uses two characters per byte (one hex digit for the lower and one for the upper 4 bits). So we need at least 32 characters (16 bytes of GUID multiplied by 2). When converted to UTF-8 each character relates to exactly one byte, because all hex digits as well as the dash are in the basic ASCII range which maps 1:1 to UTF-8. So including the dashes we end up with 32 4 = 36 bytes.
So this is what [System.Convert]::ToBase64String()
has to work with - 16 bytes of input in the first case and 36 bytes in the second case.
Each Base64 output digit represents up to 6 input bits.
16 bytes = 128 bits, divided by 6 = 22 (rounded up)
36 bytes = 288 bits, divided by 6 = 48
So that's why you end up with more than twice the number of Base64 characters when converting GUID to hex string first.
CodePudding user response:
The GUID struct is an object storing a 16 byte array that contains its value.
These are the 16 bytes you see when you perform its method .ToByteArray()
method.
The 'normal' string representation is a grouped series of these bytes in hexadecimal format. (4-2-2-2-6)
As for converting to Base64, this will always return a longer string because each Base64 digit represents exactly 6 bits of data.
Therefore, every three 8-bits bytes of the input (3×8 bits = 24 bits) can be represented by four 6-bit Base64 digits (4×6 = 24 bits).
The resulting string can even be extended with =
padding characters at the end of the string to always be a multiple of 4.
The result is a string of [math]::Ceiling(<original size> / 3) * 4
length.
Using [System.Text.Encoding]::UTF8.GetBytes([guid]::NewGuid().Guid)
is actually first performing the GUID's .ToString()
method and from that string it will return the ascii values of each character in there.
(hexadecimal representation = 2 characters per byte = 32 values the four dashes in it leaves a 36-byte array)