Can't get file size when path contains UTF-8 character in C

Time:12-08

long int getFileSize(const wchar_t* path)
{   
    long int res = -1;
    
    char file_name[MAX_PATH];
    wcstombs(file_name, path, MAX_PATH);

    FILE* fp = fopen(file_name, "r");

    if (fp != NULL) {
        fseek(fp, 0L, SEEK_END);
        res = ftell(fp);
        fclose(fp);
    }
          
    return res;
}

This is how I get the file size, but if the file path contains a Turkish character such as ğ, İ, or ı, the result is -1.

From my research, I learned that calling setlocale would fix the problem, but it causes other errors in my project.

FILE* fp = _wfopen(path, L"w,ccs=UTF-8");  // I tried this, but it does not work

CodePudding user response:

The char APIs on Windows all use the "current code page" for string encoding. On US systems, this is usually code page 1252. These code pages cover only a small part of Unicode, so most Unicode strings cannot be converted to the current encoding. For instance, Windows-1252 has no encoding for the Turkish characters ğ, İ, and ı. When you pass those characters to wcstombs, it stops at the first character it can't convert and returns (size_t)-1 to signal failure, but your code isn't checking whether the conversion succeeded. The contents of file_name are then not a usable path, so fopen fails and your function returns -1. To guarantee success, all char APIs must be limited to characters that the current code page can encode. Since each computer might use a different code page, you're effectively limited to characters that every code page can encode, which is basically ASCII.
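As a rough illustration of the missing check, here is a minimal sketch in standard C. The helper name to_narrow_checked is hypothetical (it is not from the question), and whether a given character converts depends on the active locale or code page, so treat this as a way to detect the failure rather than a fix for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not from the original post): converts a wide string
 * to the current locale's multibyte encoding and reports failure instead of
 * leaving dst in an unusable state.
 * Returns 1 on success, 0 if a character cannot be converted or the result
 * (including the terminating null) does not fit in cap bytes. */
int to_narrow_checked(char *dst, size_t cap, const wchar_t *src)
{
    assert(dst != NULL && src != NULL && cap > 0);

    size_t n = wcstombs(dst, src, cap);
    if (n == (size_t)-1) {   /* a character has no multibyte representation */
        dst[0] = '\0';
        return 0;
    }
    if (n == cap) {          /* buffer filled; wcstombs wrote no terminator */
        dst[0] = '\0';
        return 0;
    }
    return 1;
}
```

In getFileSize, a 0 return from such a helper would be the point to bail out (or to switch to a wide-character API) before calling fopen.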

Since you clearly want to interact with APIs using Turkish characters, you have two options. You can change the current code page to one that contains those characters (Windows-1254 for Turkish, or code page 65001 for UTF-8) and make sure every single char string in your app is encoded with that code page. Or you can use the wchar_t APIs, which always use UTF-16 on Windows. Standard C has no function that opens a file from a wchar_t path, to my knowledge, so you'll have to use Microsoft's C runtime extensions (such as _wfopen) or native Windows APIs.

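You probably want to use FindFirstFileExW (the WIN32_FIND_DATAW it fills in includes the file size), or open the file with CreateFileW and call GetFileInformationByHandleEx on the handle.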
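For reference, a minimal sketch of the native-API route. It uses GetFileAttributesExW rather than the functions named above, only because it needs no handle; the function name file_size_w is hypothetical, and the non-Windows branch is just a placeholder so the sketch compiles elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

#ifdef _WIN32
#include <windows.h>

/* Hypothetical helper: returns the size in bytes of the file at a UTF-16
 * path, or -1 if the path cannot be queried or names a directory. */
long long file_size_w(const wchar_t *path)
{
    assert(path != NULL);

    WIN32_FILE_ATTRIBUTE_DATA info;
    if (!GetFileAttributesExW(path, GetFileExInfoStandard, &info))
        return -1;
    if (info.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
        return -1;

    /* The size is reported as two 32-bit halves; combine them. */
    ULARGE_INTEGER size;
    size.LowPart  = info.nFileSizeLow;
    size.HighPart = info.nFileSizeHigh;
    return (long long)size.QuadPart;
}
#else
/* Placeholder for non-Windows builds: this sketch does nothing useful here. */
long long file_size_w(const wchar_t *path)
{
    assert(path != NULL);
    return -1;
}
#endif
```

Because the path stays in wchar_t form the whole way, no code page conversion is involved, and the 64-bit return type also avoids the 2 GB limit of long on Windows.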

CodePudding user response:

Thanks, everyone. I solved it like this:

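#include <stdio.h>   /* fseek, ftell, fclose; _wfopen is a Microsoft extension */

long int getFileSize(const wchar_t* path)
{
    long int res = -1;

    /* Pass the wide path straight to _wfopen. Copying it into a MAX_PATH
       buffer first is unnecessary and overflows on longer paths. Binary
       mode ("rb") keeps text-mode translation from skewing the offset. */
    FILE *fp = _wfopen(path, L"rb");
    if (fp != NULL) {
        /* Only trust ftell if the seek to the end succeeded. */
        if (fseek(fp, 0L, SEEK_END) == 0)
            res = ftell(fp);
        fclose(fp);
    }

    return res;
}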