How to create an un-escaped string of hex bytes in C#

I am working with a C API from my C# codebase and am having issues with Cyrillic characters in a file path.

I am trying to call the wrapped C function which should load an object from the file. The C function signature looks like this:

GetModelFromFileCpp(ModelRefType * model, const char * file_path)

This function is wrapped within my C# library as so:

[DllImport("CPPLibrary.dll", EntryPoint="GetModelFromFileCpp", CallingConvention=CallingConvention.Cdecl)]
public static extern ResultType GetModelFromFileCs(out IntPtr model, string file_path);

I don't have access to the C library to see what is going on inside. I should add that the documentation for the C library mentions that this function is expecting that the file path is UTF-8 encoded.

The problem appears when I pass in a string to represent the absolute path of the file, and the string contains Cyrillic characters. I've also seen this issue with Japanese characters. The string, for example, could be "C:\Users\UserDirWithäöChar\App Data\Local\Temp", where the Windows User name contains these characters.

The fact that the user's name contains these characters is important, because my code generates a temp file that is placed in \AppData\Local\Temp, and that location does not seem friendly toward copying the file elsewhere unless I'm debugging in Admin mode. So it seems I'm forced to use a path which contains these characters.

I created the following script to test out the string encoding.

string path = @"C:\Users\UserDirWithäöChar\App Data\Local\Temp";
byte[] b = Encoding.UTF8.GetBytes(path);
string r = String.Empty;
foreach(byte bite in b)
{
   r += @"\x" + String.Format("{0:X2}", bite);
}
var result = r.ToCharArray();
Console.WriteLine(result);

Result: \x43\x3A\x5C\x55\x73\x65\x72\x73\x5C\x55\x73\x65\x72\x44\x69\x72\x57\x69\x74\x68\xC3\xA4\xC3\xB6\x43\x68\x61\x72\x5C\x41\x70\x70\x20\x44\x61\x74\x61\x5C\x4C\x6F\x63\x61\x6C\x5C\x54\x65\x6D\x70

I found that when I copy this output directly and paste it into the wrapper function during a debugger session (using the Immediate window in Visual Studio), the C library function gives me the correct result.

If I pass in the variable storing this value, however, I see that a serialization error has occurred and that the file cannot be read (or found). It appears that when this string is set into a string variable, the characters are being transformed from these hexadecimal representations to their actual characters, which include the escaped backslash: "\\x43\\x3A\\x5C\\x55\\x73\\x65\\x72\\x73\\x5C\\x55\\x73\\x65\\x72\\x44\\x69\\x72\\x57\\x69\\x74\\x68\\xC3\\xA4\\xC3\\xB6\\x43\\x68\\x61\\x72\\x5C\\x41\\x70\\x70\\x20\\x44\\x61\\x74\\x61\\x5C\\x4C\\x6F\\x63\\x61\\x6C\\x5C\\x54\\x65\\x6D\\x70".

An interesting thing I found is that the size of the first string is 4x less than the size of the last string (the one with escaped backslashes), but both of these evaluate to type String.

What is the difference between these two, and is it possible to set a variable with the exact value of the first string (no extra backslashes)?

EDIT: Here is my full implementation

public static Model CreateModelFromFile(string path)
{
   byte[] bytes = Encoding.UTF8.GetBytes(path);

   string encodedPath = String.Empty;

   foreach(byte b in bytes)
   {
       encodedPath += @"\x" + String.Format("{0:X2}", b);
   }

   IntPtr model_ref;

   if (ModelAPI.CreateFromFileWithStatus(out model_ref, encodedPath))
   {
       return new Model(model_ref);
   }

   return null;
}

Also, I have tried adding @ to the beginning of the string literal during debugging, but this didn't work.

The string is passed in by a process several levels above this; it is retrieved from the file system using System.IO.Path.GetTempFileName().

CodePudding user response：

I haven't found an answer for why these two strings are different, but I found that passing in a byte array instead of a string works.

GetModelFromFileCs(out model, Encoding.UTF8.GetBytes(file_path));

To do this I modified the wrapper to accept a byte[] argument for file_path instead of a string. This is inconsistent with how the rest of the wrapper is implemented, which is why I didn't initially think to approach the problem this way.

CodePudding user response：

What you are doing to pass UTF8 simply isn't how UTF8 is supposed to be represented. That's just the way you write it in C# code as a literal string. What you show in your debugger is just a hex representation of the actual bytes.

In actual UTF8 strings, each character occupies a single byte, or multiple bytes for characters above U+007F.
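To make the difference concrete, here is a minimal sketch comparing the real UTF-8 bytes of "äö" with the escaped text that the loop in the question builds. The class and method names (Utf8Demo, ToEscapedText) are made up for illustration and are not part of any library. The escaped text is an ordinary string containing a backslash, an 'x' and two hex digits per byte, which is why it is four times longer than the byte sequence it describes.

```csharp
using System;
using System.Text;

public static class Utf8Demo
{
    // Hypothetical helper mirroring the loop from the question: it emits the
    // literal characters '\', 'x' and two hex digits for every UTF-8 byte.
    public static string ToEscapedText(string s)
    {
        var sb = new StringBuilder();
        foreach (byte b in Encoding.UTF8.GetBytes(s))
        {
            sb.Append(@"\x").Append(b.ToString("X2"));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string name = "\u00E4\u00F6";               // "äö", written with escapes to avoid source-encoding issues
        byte[] utf8 = Encoding.UTF8.GetBytes(name);
        string escaped = ToEscapedText(name);

        Console.WriteLine(utf8.Length);             // 4  (two bytes per character)
        Console.WriteLine(escaped);                 // \xC3\xA4\xC3\xB6
        Console.WriteLine(escaped.Length);          // 16 (four characters per byte)
        Console.WriteLine("\x43" == "C");           // True: in a regular literal the compiler resolves the escape
    }
}
```

The native function wants the 4 bytes, not the 16-character description of them, so building the escaped text at run time cannot work.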

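Unfortunately, the default automatic string marshalling only covers ANSI or UTF16, not UTF8. (Newer runtimes, .NET Framework 4.7 and .NET Core onward, also provide UnmanagedType.LPUTF8Str, if that is available to you.)

Otherwise, you need to pass this as a null-terminated array of bytes: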

[DllImport("CPPLibrary.dll", CallingConvention = CallingConvention.Cdecl)]
public static extern ResultType GetModelFromFileCpp(out IntPtr model, byte[] file_path);

public static Model CreateModelFromFile(string path)
{
    byte[] pathAsBytes = Encoding.UTF8.GetBytes(path + '\0');

    if (ModelAPI.CreateFromFileWithStatus(out var model_ref, pathAsBytes))
    {
        return new Model(model_ref);
    }

    return null;
}
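
As a quick sanity check of the encoding step alone, the following sketch prints the bytes that would be handed to the native function for a short non-ASCII string. It does not call the C library; Utf8PathDemo and ToNullTerminatedUtf8 are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text;

public static class Utf8PathDemo
{
    // Hypothetical helper: encodes the path as UTF-8 and appends the
    // trailing zero byte that a C function reading a const char* expects.
    public static byte[] ToNullTerminatedUtf8(string path)
    {
        return Encoding.UTF8.GetBytes(path + '\0');
    }

    public static void Main()
    {
        byte[] bytes = ToNullTerminatedUtf8("\u00E4\u00F6"); // "äö"
        Console.WriteLine(BitConverter.ToString(bytes));      // C3-A4-C3-B6-00
    }
}
```

The C3 A4 C3 B6 bytes match the ones in the hex dump from the question, and the final 00 is the terminator.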