char that is outside ASCII range disappearing from concatenated std::string in C++

Time:11-20

My professor has assigned an encryption algorithm for our homework in C++. Instead of outputting in binary, he'd like the encrypted text (plain text that has been run through the cipher) to be output as a string on stdout.

The encryption algorithm will typically output values greater than 127, which are outside the ASCII range (0 to 127). These are usually displayed as symbols like � or square boxes.

When I concatenate these symbols onto the output (the ciphertext), some of them disappear, depending on the neighboring symbols.

Here's an example:

    unsigned char one = 244; // (244 is the 16-bit "output" from the algo)
    unsigned char two = 137; // (same as above)
    std::string con = "";
    con += (one + '\0');
    con += (two + '\0');
    std::cout << con << std::endl;

The output will be a single �, where one of the characters is dropped.

If, however, I use unsigned char one = 244; and unsigned char two = 244;, the console output is ��, so the second char doesn't vanish. I'm not sure why some of these combinations work and others don't. Is there a safer way to concatenate characters that are outside the normal ASCII range?

I have also tried some things I found on this site, like:

    con += (one + '0');
    // but this outputs the wrong text: if it were con += (65 + '0') the
    // output is 'q' instead of 'A', but all the symbols show up
    // with this
    con += (two + '0');

I also tried the following, but it gives the same result as the first attempt (missing symbols).

    con += one;
    con += two;

Thank you!

Answer:

Nothing is lost; all your characters are there:

    #include <cstdio>
    #include <cstring>
    #include <iostream>
    #include <string>

    int main()
    {
        unsigned char one = 244; // (244 is the 16-bit "output" from the algo)
        unsigned char two = 137; // (same as above)
        std::string con = "";
        con += (one + '\0');
        con += (two + '\0');
        // Print each byte's numeric value instead of letting the terminal render it.
        for (unsigned int i = 0; i < std::strlen(con.c_str()); i++)
        {
            std::printf("char %u = %d\n", i, (unsigned char) con.c_str()[i]);
        }
        return 0;
    }

And the result:

    $ g++ -g -O0 main.cpp -o main
    $ ./main
    char 0 = 244
    char 1 = 137

It is sometimes easier to inspect a string as a raw C-style string and print its contents in whatever format suits you best.

Answer:

First of all, note that '\0' and '0' are two distinct characters, with ASCII codes 0 and 48 respectively.

  • The statement con += (one + '\0'); is equivalent to con += (244 + 0);, so it appends byte 244 unchanged.
  • But the statement con += (one + '0'); is equivalent to con += (244 + 48);. The sum is computed as an int, 292, but the max value of unsigned char is 255, so the value doesn't fit in one byte. When it is converted to char for appending, it wraps around and you end up with 36 (292 - 256), which is the '$' character. For con += (two + '0'); the sum is 137 + 48 = 185, which still fits in a byte, so nothing wraps; you simply get byte 185 instead of 137.

I would suggest writing something like the following, which is the C++ way of doing it:


    #include <cstddef>
    #include <iostream>
    #include <string>

    int main()
    {
        unsigned char one = 244; // (244 is the 16-bit "output" from the algo)
        unsigned char two = 137; // (same as above)

        std::string con = "";
        con += one;
        con += '\0';
        con += two;
        con += '\0';

        std::cout << "con: <" << con << ">\n" << '\n';

        for ( std::size_t idx = 0; idx < con.length( ); ++idx )
        {
            std::cout << "index " << idx << ": <"
                      << +static_cast<unsigned char>( con[ idx ] ) << ">" << '\n';
            /* Notice the + operator
               beside static_cast */
        }
    }

In the Windows command prompt, this gives:

    con: <⌠ ë >

    index 0: <244>
    index 1: <0>
    index 2: <137>
    index 3: <0>

As you can see, each '\0' is stored in the string as a real NUL byte; the console just renders it as a blank, so it looks like a space separating the actual data characters.

Also, notice how the unary + operator causes a variable of type char, signed char, or unsigned char to be printed as an integer. Read more about it here: How to output a character as an integer through cout?
