I'm currently learning C and lately, I have been focusing on the topic of character encoding. Note that I'm a Windows programmer. While I currently test my code only on Windows, I want to eventually port it to Linux and macOS, so I'm trying to learn the best practices right now.
In the example below, I store a file path in a wchar_t variable to be opened later on with _wfopen. I need to use _wfopen because my file path may contain characters that are not in my default code page. Afterwards, the file path and a text literal are stored inside a char variable named message for further use. My understanding is that you can store a wide string into a multibyte string with the %ls modifier.
char message[8094] = "";
wchar_t file_path[4096] = L"C:\\test\\test.html";
sprintf(message, "Accessing: %ls\n", file_path);
While the code works, GCC/MinGW outputs the following warning and notes:
warning: '%ls' directive writing up to 49146 bytes into a region of size 8083 [-Wformat-overflow=]|
note: assuming directive output of 16382 bytes|
note: 'sprintf' output between 13 and 49159 bytes into a destination of size 8094|
My issue is that I simply do not understand how sprintf could output up to 49159 bytes into the message variable. I output the "Accessing: " string literal, the file_path variable, the \n char and the \0 char. What else is there to output?
Sure, I could declare message as a wchar_t variable and use wsprintf instead of sprintf, but my understanding is that wchar_t does not make for nicely portable code. As such, I'm trying to avoid using it unless it's required by a specific API.
So, what am I missing?
CodePudding user response:
The warning doesn't take into account the actual contents of file_path; it is calculated on the assumption that file_path could hold any possible content. There would be an overflow if, for example, file_path held 4095 wide characters that each convert to several bytes (such as CJK characters or emoji) followed by a null terminator.
Using %ls in the narrow printf family converts the source to multibyte characters, which can take several bytes for each wide character.
To avoid this warning you could:
- disable it with -Wno-format-overflow
- use snprintf instead of sprintf
The latter is always a good idea IMHO; it is good to have a second line of defence against mistakes introduced during later code maintenance (e.g. someone comes along and changes the code to take the path from user input instead of a hardcoded value).
Afterword: be very careful when using wide characters with the printf family in MinGW, which implements the printf family by calling MSVCRT, which does not follow the C Standard.
To get closer to standard behaviour, use a build of MinGW-w64 that attempts to implement the stdio library functions itself instead of deferring to MSVCRT (e.g. the MSYS2 build).