I put together a code snippet based on Delphi recommendations. It compiles with no warnings or implicit casts, but the result does not satisfy me.
// Min requires System.Math in the uses clause.
procedure Convert;
type
  TUTF8Buf = array [0 .. 5] of Byte;
var
  s: string;
  sutf8: UTF8String; // managed UTF-8 string
  utf8str: TUTF8Buf; // unmanaged fixed-size buffer
begin
  utf8str := Default(TUTF8Buf); // utf8str = (0, 0, 0, 0, 0, 0)
  s := UTF8ArrayToString(utf8str); // s = #0#0#0#0#0#0
  s := 'abc'; // s = 'abc'
  sutf8 := UTF8Encode(s); // sutf8 = 'abc'
  // Copy at most 5 bytes so the last byte stays a zero terminator.
  Move(sutf8[1], utf8str[0], Min(Length(sutf8), SizeOf(utf8str) - 1)); // utf8str = (97, 98, 99, 0, 0, 0)
  s := UTF8ArrayToString(utf8str); // s = 'abc'#0#0#0
  s := UTF8ToString(sutf8); // s = 'abc'
end;
The code works perfectly well with the managed UTF-8 string, but with the unmanaged buffer it always produces trailing zeroes. What is the proper modern way of handling such buffers?
CodePudding user response:
Use TStringHelper.TrimRight to remove the trailing null characters.
s := s.TrimRight([#0]);
The System.UTF8ToString function also has an overload that accepts an array of Byte. Try that one. It might automatically stop at the first null byte it encounters.
s := UTF8ToString(utf8str);
CodePudding user response:
Your UTF8ArrayToString() function is clearly converting the entire array as a whole; it does not stop when a $0 byte is encountered. If you are able to, you should alter UTF8ArrayToString() to add an optional parameter that specifies how many bytes in the array should be converted.
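A minimal sketch of such a length-aware conversion, assuming you cannot change the existing function and write a wrapper instead. The name UTF8BufToString and its Count parameter are hypothetical, not part of the RTL, and the rule that a negative Count means "stop at the first zero byte" is a choice made for this example only.

```delphi
program Utf8BufDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  TUTF8Buf = array [0 .. 5] of Byte;

// Hypothetical helper, not an RTL function.
// Count < 0:  convert up to the first zero byte (or the whole array if none).
// Count >= 0: convert at most Count bytes.
function UTF8BufToString(const Buf: array of Byte; Count: Integer = -1): string;
var
  Len: Integer;
  Tmp: UTF8String;
begin
  if Count < 0 then
  begin
    // Scan for the terminator without reading past the end of the array.
    Len := 0;
    while (Len < Length(Buf)) and (Buf[Len] <> 0) do
      Inc(Len);
  end
  else if Count < Length(Buf) then
    Len := Count
  else
    Len := Length(Buf);

  if Len = 0 then
    Exit('');

  SetString(Tmp, PAnsiChar(@Buf[0]), Len); // copy exactly Len bytes
  Result := UTF8ToString(Tmp);
end;

var
  Buf: TUTF8Buf;
begin
  Buf := Default(TUTF8Buf);
  Buf[0] := 97; // 'a'
  Buf[1] := 98; // 'b'
  Buf[2] := 99; // 'c'
  Writeln(Length(UTF8BufToString(Buf)));    // 3, no trailing #0 characters
  Writeln(Length(UTF8BufToString(Buf, 2))); // 2
end.
```

Scanning for the terminator before decoding means the result never contains the padding zeroes, so no TrimRight pass is needed afterwards.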
That said, the simplest way to deal with UTF-8 is to just use UTF8String by itself. The RTL knows how to implicitly convert between UnicodeString and UTF8String, so let it do the work for you. You don't need UTF8Encode() and UTF8Decode(), as they have been deprecated since 2009.
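As a small illustration of letting the RTL do the conversion, here is a sketch that uses explicit typecasts. The casts are my assumption about how to meet the "no warnings" requirement from the question: a plain assignment converts just the same, but the compiler may flag it as an implicit string cast.

```delphi
program Utf8CastDemo;

{$APPTYPE CONSOLE}

var
  s: string;
  sutf8: UTF8String;
begin
  s := 'caf' + Char($E9);  // 4 UTF-16 code units, ending in U+00E9
  sutf8 := UTF8String(s);  // UTF-16 -> UTF-8, done by the RTL
  Writeln(Length(s));      // 4
  Writeln(Length(sutf8));  // 5, because U+00E9 takes two bytes in UTF-8
  s := string(sutf8);      // UTF-8 -> UTF-16, done by the RTL
  Writeln(Length(s));      // 4
end.
```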
If you must work with byte arrays, then you should use TEncoding.UTF8.GetString() and TEncoding.UTF8.GetBytes() for any conversions.
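A short sketch of that round trip with a dynamic byte array (TBytes). Note that these methods work on TBytes rather than on a fixed-size static array, so a static buffer like the one in the question would have to be copied into a TBytes first.

```delphi
program Utf8BytesDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  Bytes: TBytes;
  s: string;
begin
  Bytes := TEncoding.UTF8.GetBytes('abc'); // (97, 98, 99), no terminating zero
  Writeln(Length(Bytes));                  // 3

  s := TEncoding.UTF8.GetString(Bytes);    // whole array
  Writeln(s);                              // abc

  s := TEncoding.UTF8.GetString(Bytes, 0, 2); // start index 0, 2 bytes only
  Writeln(s);                                 // ab
end.
```

Because the array carries its own length, there is no terminator to scan for, and the three-argument overload covers the case where only part of a buffer holds valid data.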