How to do direct assignment of string with a special character in C

Time:02-22

I find it quite useful that I can do this in C:

char *text;
text = "5 us";

However, when I try to do this

char *text;
text = "5 µs";

I get an extra byte, because the source file is encoded in UTF-8 in Eclipse (using CubeIDE). In byte form, the string looks like this:

0x35 0x20 0xC2 0xB5 0x73 0x00

I need the 0xC2 removed, and I don't want to write a function just to strip that byte. I know I can configure Eclipse to treat my source code as US-ASCII, but then I can no longer save the file because of this assignment:

text = "5 µs";

Eclipse won't save my file unless I remove the µ in my source code.

Is there maybe something like the following?

text = {'5', ' ', 181, 's', '\0'};

I just don't want to jump through the hoops of creating a global string the pedestrian way. I want to keep the elegance of a direct assignment.

Sorry if I don't use the proper C terms but I think you get the gist.

Answer:

µ does not exist in the ASCII character set.

There are many single-byte encodings extending ASCII with µ mapped to 0xB5, for example ISO-8859-1, ISO-8859-3, ISO-8859-8, Windows-1252, and so on.

It is not clear which one you'd want, but most likely either ISO-8859-1, aka latin1, or Windows-1252. Have a look at the Wikipedia pages for the encodings if you are unsure.

If the only problem is that the strings are not displayed correctly, the best solution is to configure the output device to use UTF-8.

If that is not possible, you can tell the compiler which execution character set to use, i.e. the character set into which string literals are translated.

GCC's default execution character set is UTF-8, but it can be changed with the -fexec-charset= flag (passing one of the encodings above as its argument). However, this affects every string and character literal compiled with that flag, and it will garble output on devices that expect a different encoding, such as UTF-8.

Also note that the encoding of the source file is not relevant, nor does the result change if you use a universal character name. The characters in the string literal are always translated to the execution character set.
