How to properly convert between a UTF-8 buffer and a common Delphi string?

Time:11-19

I put together the following snippet based on Delphi recommendations. It compiles with no warnings or implicit casts, but the result is not what I expected.

procedure Convert;
type
  TUTF8Buf = array [0 .. 5] of byte;
var
  s: string;
  sutf8: UTF8String; // managed UTF-8 string
  utf8str: TUTF8Buf; // unmanaged fixed-size buffer
begin
  utf8str := Default (TUTF8Buf); // utf8str = (0,0,0,0,0,0)
  s := UTF8ArrayToString(utf8str); // s = #0#0#0#0#0#0
  s := 'abc'; // s = 'abc'
  sutf8 := UTF8Encode(s); // sutf8 = 'abc'
  Move(sutf8[1], utf8str[0], Min(Length(sutf8), sizeof(utf8str) - 1)); // utf8str = (97, 98, 99, 0, 0, 0); Min needs System.Math
  s := UTF8ArrayToString(utf8str); // s = 'abc'#0#0#0
  s := UTF8ToString(sutf8); // s = 'abc'
end;

The code works fine with the managed UTF8String, but converting the fixed-size byte buffer always leaves trailing zero characters in the result. What is the proper modern way of handling such buffers?

CodePudding user response:

Use TStringHelper.TrimRight to remove the trailing null characters.

s := s.TrimRight([#0]);

The System.UTF8ToString function also has an overload that accepts an array of Byte. Try that one; it might stop automatically at the first null byte it encounters.

s := UTF8ToString(utf8str);

CodePudding user response:

Your UTF8ArrayToString() function is clearly converting the entire array as a whole; it does not stop when a $0 byte is encountered. If you are able to, you should alter UTF8ArrayToString() to add an optional parameter that specifies how many bytes in the array should be converted.
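
As a rough illustration of the optional-count idea, here is a minimal sketch. UTF8BufToString is a hypothetical helper name, not an RTL routine, and the sketch assumes a desktop compiler where PAnsiChar and UTF8String are available. It decodes at most Count bytes and also stops at the first zero byte.

```delphi
program CountedDecode;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  // Same shape as the buffer in the question after 'abc' was copied in.
  Buf: array [0 .. 5] of Byte = (97, 98, 99, 0, 0, 0);

// Hypothetical helper (not part of the RTL): decodes at most Count bytes
// of UTF-8 data, stopping early at the first zero byte.
function UTF8BufToString(const Buf: array of Byte;
  Count: Integer = MaxInt): string;
var
  U: UTF8String;
  Len: Integer;
begin
  if Count > Length(Buf) then
    Count := Length(Buf);
  // Find the number of bytes before the terminator, capped at Count.
  Len := 0;
  while (Len < Count) and (Buf[Len] <> 0) do
    Inc(Len);
  if Len = 0 then
    Exit('');
  // Copy the raw bytes into a UTF8String, then let the RTL convert it.
  SetString(U, PAnsiChar(@Buf[0]), Len);
  Result := string(U);
end;

begin
  Writeln(UTF8BufToString(Buf));    // whole buffer, stops at the first #0
  Writeln(UTF8BufToString(Buf, 2)); // only the first two bytes
end.
```

Capping the count as well as scanning for the terminator keeps the helper safe for buffers that are completely full and therefore have no zero byte at all.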

That said, the simplest way to deal with UTF-8 is to just use UTF8String by itself. The RTL knows how to convert between UnicodeString and UTF8String, so let it do the work for you. You don't need UTF8Encode() and UTF8Decode(), as they have been deprecated since 2009.
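
A minimal sketch of letting the RTL do the conversion, assuming a Unicode-enabled desktop compiler (Delphi 2009 or later). Plain assignment also converts, but it typically triggers implicit string cast warnings, so explicit type casts are used here to keep the build warning-free.

```delphi
program CastDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  S, RoundTrip: string;
  U: UTF8String;
begin
  // #$00E9 is U+00E9, written numerically so the result does not depend
  // on the encoding of the source file.
  S := 'caf'#$00E9;

  // Casting between string types with different code pages performs a
  // real transcoding, not a reinterpretation of the bytes.
  U := UTF8String(S);
  RoundTrip := string(U);

  Writeln('UTF-16 code units: ', Length(S));
  Writeln('UTF-8 bytes:       ', Length(U));
  Writeln('Round trip equal:  ', S = RoundTrip);
end.
```

Length(U) counts bytes while Length(S) counts UTF-16 code units, which is why buffer sizes should always be computed from the UTF8String rather than from the original string.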

If you must work with byte arrays, then you should use TEncoding.UTF8.GetString() and TEncoding.UTF8.GetBytes() for any conversions.
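
The following is a sketch of how the TEncoding route might look with the fixed-size buffer from the question. StringToBuf and BufToString are hypothetical helper names introduced only for illustration. The bytes are copied into a TBytes first, since that is the array type the TEncoding methods take.

```delphi
program EncodingDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Math;

type
  TUTF8Buf = array [0 .. 5] of Byte;

// Hypothetical helper: encodes S as UTF-8 into a zero-filled buffer,
// always leaving at least one terminating zero byte.
// Note: truncation is byte-based and may cut a multi-byte sequence.
procedure StringToBuf(const S: string; var Buf: TUTF8Buf);
var
  Bytes: TBytes;
  Len: Integer;
begin
  Buf := Default(TUTF8Buf);
  Bytes := TEncoding.UTF8.GetBytes(S);
  Len := Min(Length(Bytes), SizeOf(Buf) - 1);
  if Len > 0 then
    Move(Bytes[0], Buf[0], Len);
end;

// Hypothetical helper: decodes only the bytes before the first zero byte.
function BufToString(const Buf: TUTF8Buf): string;
var
  Bytes: TBytes;
  Len: Integer;
begin
  Len := 0;
  while (Len < Length(Buf)) and (Buf[Len] <> 0) do
    Inc(Len);
  if Len = 0 then
    Exit('');
  SetLength(Bytes, Len);
  Move(Buf[0], Bytes[0], Len);
  Result := TEncoding.UTF8.GetString(Bytes);
end;

var
  Buf: TUTF8Buf;
  S: string;
begin
  StringToBuf('abc', Buf);
  S := BufToString(Buf);
  Writeln(S, ' (', Length(S), ' chars)'); // no trailing #0 characters
end.
```

Measuring the payload length before decoding avoids the trailing zero characters at the source, so no TrimRight call is needed afterwards.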
