Marshalling UTF-8 encoded Chinese characters from C# to C

Time:12-06

I'm marshaling some Chinese characters whose UTF-8 bytes, in decimal, are

228,184,145,230,161,148

However, when I receive the string in C, I end up with the chars

-77,-13,-67,-37

I can work around this by using an sbyte[] instead of a string in C#, but now I'm trying to marshal a string[], so I can't use that method. Does anyone have an idea why this is happening?

EDIT: more detailed code:

C#

[DllImport("mydll.dll",CallingConvention=CallingConvention.Cdecl)]
static extern IntPtr inputFiles(IntPtr pAlzObj, string[] filePaths, int fileNum);

string[] allfiles = Directory.GetFiles("myfolder", "*.jpg", SearchOption.AllDirectories);
string[] allFilesutf8 = allfiles.Select(i => Encoding.UTF8.GetString(Encoding.Default.GetBytes(i))).ToArray();
IntPtr pRet = inputFiles(pObj, allFilesutf8, allFilesutf8.Length);

C

extern __declspec(dllexport) char* inputFiles(Alz* pObj, char** filePaths, int fileNum);

char* inputFiles(Alz* pObj, char** filePaths, int fileNum)
{
    if (pObj != NULL) {
        try{
            std::vector<const char*> imgPaths;
            for (int i = 0; i < fileNum; i++)
            {
                char* s = *(filePaths + i);
                // Here I print the string; its bytes (decimal representation) already differ from what was sent.
                imgPaths.push_back(s);
            }

            string ret = pObj->myfunc(imgPaths);
            const char* retTemp = ret.c_str();
            char* retChar = _strdup(retTemp);
            return retChar;
        }
        catch (const std::runtime_error& e) {
            cout << "some runtime error " << e.what() << endl;
        }
    }
    return NULL; // reached when pObj is NULL or a runtime_error was caught
}
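
The comment in the loop mentions printing the received bytes. A minimal sketch of such a helper is shown below; `dumpBytes` is a hypothetical name, not something from the original code. It casts each byte through `unsigned char` so that values of 128 and above are not shown as negative numbers.

```cpp
#include <sstream>
#include <string>

// Render each byte of a NUL-terminated string as an unsigned decimal value,
// separated by single spaces.
std::string dumpBytes(const char* s) {
    std::ostringstream out;
    for (const char* p = s; *p != '\0'; ++p) {
        if (p != s) out << ' ';
        // Cast through unsigned char first so bytes >= 0x80 stay in 128..255.
        out << static_cast<unsigned int>(static_cast<unsigned char>(*p));
    }
    return out.str();
}
```

Calling `dumpBytes(s)` inside the loop and writing the result to `cout` would let the received bytes be compared directly against the values expected from the C# side.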

Also, I found that if I change the Windows system-wide encoding (in the language settings) to use Unicode UTF-8, it works fine. I'm not sure why, though.

When marshaling to unsigned char* (or unsigned char**, as it's an array), I end up with different output, which is literally just 256 plus the numbers shown as char: 179,243,189,219. This leads me to believe that something is happening during marshaling rather than a conversion mistake on the C side of things.
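
The reasoning above can be checked on the C side. Adding 256 only reinterprets the same four bytes, whereas the expected UTF-8 data is six bytes long, so the content itself changed before it arrived. This would be consistent with the string being converted to the system ANSI code page, which is the default for `string` parameters in P/Invoke on Windows, though that is an inference and not something confirmed in the post. The sketch below is a structural UTF-8 check; `isValidUtf8` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <string>

// Return true if `s` is well-formed UTF-8: valid lead bytes, the right number
// of continuation bytes, no overlong forms, no surrogates, at most U+10FFFF.
bool isValidUtf8(const std::string& s) {
    // Smallest code point that legitimately needs a sequence of each length.
    static const unsigned long minCp[] = {0, 0, 0x80, 0x800, 0x10000};
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t len;
        unsigned long cp;
        if (b0 < 0x80)                { len = 1; cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
        else return false;              // stray continuation or invalid lead byte
        if (i + len > n) return false;  // sequence is cut off
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < minCp[len]) return false;               // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate range
        if (cp > 0x10FFFF) return false;                 // beyond Unicode
        i += len;
    }
    return true;
}
```

With this check, the expected bytes 228,184,145,230,161,148 pass, while the received bytes 179,243,189,219 fail on the first byte, because 179 (0xB3) can only be a continuation byte in UTF-8.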

CodePudding user response:

That is because C strings are stored using the plain char type. With MSVC, char is signed by default, so byte values above 127 are interpreted as negative numbers.

I guess the character traits are handled inside the <xstring> header on Windows (as far as I know), specifically in:

_STD_BEGIN
template <class _Elem, class _Int_type>
struct _Char_traits { // properties of a string or stream element
    using char_type  = _Elem;
    using int_type   = _Int_type;
    using pos_type   = streampos;
    using off_type   = streamoff;
    using state_type = _Mbstatet;
#if _HAS_CXX20
    using comparison_category = strong_ordering;
#endif // _HAS_CXX20

CodePudding user response:

I have an idea: you solved the problem by using an sbyte[] instead of a string in C#, and now you are trying to marshal a string[], so just use a List<sbyte[]> in place of the string array. I am not experienced with C, but I guess there are other string libraries; you could use one of them. See this link, which shows the string types that can be marshalled from C#: https://learn.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.unmanagedtype?view=net-7.0
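
If the paths are sent as raw bytes, as suggested above, the C side needs some way to tell where each one ends. One possible layout, which is an assumption and not something from the original post, is a single byte buffer in which the C# side writes each UTF-8 encoded path followed by a 0 byte. A minimal sketch of splitting such a buffer, using the hypothetical name `splitPackedPaths`:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split a buffer of NUL-terminated UTF-8 strings laid out back to back.
// `length` is the total number of bytes in `buffer`, terminators included.
std::vector<std::string> splitPackedPaths(const char* buffer, std::size_t length) {
    std::vector<std::string> paths;
    std::size_t start = 0;
    for (std::size_t i = 0; i < length; ++i) {
        if (buffer[i] == '\0') {
            // Copy the bytes between the previous terminator and this one.
            paths.emplace_back(buffer + start, i - start);
            start = i + 1;
        }
    }
    return paths;  // bytes after the last terminator, if any, are ignored
}
```

Because the bytes are copied unchanged, no code page conversion takes place on the way in. The resulting `std::string` values own their data, so a `std::vector<const char*>` like `imgPaths` could be built from their `c_str()` pointers as long as the vector of strings outlives it.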
