Home > front end >  Returning a string from a function in C when handling ASCII control codes
Returning a string from a function in C when handling ASCII control codes

Time:08-05

I'm having a problem returning a string from a function. I'm writing some code in C using Visual Studio 2022 that handles characters, but if the characters are control codes, I want to return the official names such as "NUL", "ESC" etc., rather than printing out the control characters themselves, which can produce spurious output, and in the case of a line feed mess up the printed output.

A much simplified version of the main function is as follows:

#include <stdio.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned char BYTE;

void showASCIIContrChars(const char, char[]);

int main(void) {
  int numEls = 500;
  char conChars[4] = { '\0', '\0','\0', '\0' };
  BYTE* bufA = calloc(numEls, 1); 

  for (int i = 0; i < 500; i  ) {
    bufA[i] = i;
    showASCIIContrChars(bufA[i], conChars);
    printf("= %s\n", i, conChars);
  }
}

The function called from main is as follows:

void showASCIIContrChars(const char c, char str[]) {
  static bool first = true;
  static char** contrs = NULL;
  if (first) {
    char *contrs[35] = { "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
                         "BS ", "HT ", "LF ", "VT ", "FF ", "CR ", "SO ", "SI ",
                         "DEL", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
                         "CAN", "EM ", "SUB", "ESC", "FS ", "GS ", "RS ", "US ",
                         "SPA", "DEL", "INV"};
    first = false;
  }

  if (c < 0 || c > 127) {
    strcpy_s(str, 4, contrs[34]);
  } else if (c < 33) {
    strcpy_s(str, 4, contrs[c]); // <===
  } else if (c == 127) {
    strcpy_s(str, 4, contrs[33]);
  } else {
    str[0] = c;
    str[1] = ' ';
    str[2] = ' ';
  }
}

The actual code does rather more, but this is a simplified part of it.

What I want showASCIIContrChars() to do is to return a string with 2 or 3 characters for the control codes 0 to 31, and 127, and for convenience return "SPA" for a space. "I've also included "INV" for invalid for codes outside the range 0 to 127. For codes between 33 and 126, which are the normal printable characters other than a space (code 32), I just want to return that character. In all cases any trailing spaces in the returned string are padded if needed, and terminated by a NUL, '\0' character.

The code compiles without errors, but on attempting to run it, an exception is thrown at the line indicated by "<==" with an error message: .... 0x000000C: Access violation reading location 0x followed by a long strings of 0s. Unfortunately copy and paste doesn't work.

Incidentally I used typedef for an unsigned char as BYTE, but this should not have any effect. In many of my codes I use this so that all bytes have non-negative values, as in many cases the bytes do not represent characters, but the do here.

There must be a simple explanation for this error and fix it. I look forward to some help on this.

CodePudding user response:

You are misunderstanding static and variable scope, and also over-thinking it. What you've done is define a static char** that is NULL, and never changes.

When you enter your "first" section, you're defining a new variable with the same name that only exists in that scope. Enabling all compiler warnings should inform you that the variable is unused.

As a result, even though you thought you were creating the table on first call, it was always NULL and so when you tried to access it you got a crash.

To fix this and simplify your function you just initialize the static table on definition. The compiler will only initialize it once, because it's static. You can also avoid the need for testing negative values by making the first parameter unsigned.

void showASCIIContrChars(unsigned char c, char str[4])
{
    static const char* contrs[33] = {
        "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
        "BS ", "HT ", "LF ", "VT ", "FF ", "CR ", "SO ", "SI ",
        "DEL", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
        "CAN", "EM ", "SUB", "ESC", "FS ", "GS ", "RS ", "US ",
        "SPA"
    };

    if (c < 33) {
        strcpy(str, contrs[c]);
    } else if (c == 127) {
        strcpy(str, "DEL");
    } else if (c > 127) {
        strcpy(str, "INV");
    } else {
        str[0] = c;
        str[1] = ' ';
        str[2] = ' ';
        str[3] = '\0';
    }
}

Note I've also indicated the size of the str as a parameter. Although this is not necessarily enforced by the compiler, it at least indicates to the caller that they need a buffer this size. And compilers can issue warnings when you supply a buffer that is not large enough.

To make the table a bit clearer, only the consecutive values are stored there, and string literals are used for the other cases (DEL and INV).

And to be a good citizen, we also NUL-terminate the string for all cases, meaning the caller does not need to initialize the buffer.


An alternative approach is to generate the entire table once, which means you don't need any special logic when querying it:

struct ASCIITable
{
    char name[256][4];
};

const struct ASCIITable* buildASCIITable()
{
    static struct ASCIITable table = {
        "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
        "BS ", "HT ", "LF ", "VT ", "FF ", "CR ", "SO ", "SI ",
        "DEL", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
        "CAN", "EM ", "SUB", "ESC", "FS ", "GS ", "RS ", "US ",
        "SPA"
    };

    if (*table.name[33] == '\0')
    {
        for (int i = 33; i < 127; i  )
        {
            table.name[i][0] = i;
            table.name[i][1] = ' ';
            table.name[i][2] = ' ';
            table.name[i][3] = '\0';
        }
        strcpy(table.name[127], "DEL");
        for (int i = 128; i < 256; i  )
        {
            strcpy(table.name[i], "INV");
        }
    }

    return &table;
}

Then your function becomes:

void showASCIIContrChars(unsigned char c, char str[4])
{
    static const struct ASCIITable* table = NULL;
    if (!table) table = buildASCIITable();
    strcpy(str, table->name[(int)c & 0xff]);
}

Or you can query the value without doing any copies:

const char* getASCIIName(unsigned char c)
{
    static const struct ASCIITable* table = NULL;
    if (!table) table = buildASCIITable();
    return table->name[(int)c & 0xff];
}

int main(void)
{
    for (int i = 0; i <= 128; i  )
    {
        printf("= %s\n", i, getASCIIName(i));
    }
}

If you really want your original function, it can now call the other so you don't have multiple tables kicking around.

void showASCIIContrChars(unsigned char c, char str[4])
{
   strcpy(str, getASCIIName(c));
}

CodePudding user response:

  1. constrs[] was being redeclared in an attempt to lazy initialize it. static variables are initialized at compile time so there was no point anyways.
  2. main() seemed overly complicated. Also, ASCII is 7 bit (2^7) so not sure why you were driving it to 500.
  3. It's implementation defined if char is signed or unsigned so changing signature to make it unsigned. Also, made it non-cost as arguments are passed by value.
  4. You should ensure string is zero terminated (instead of relying on caller doing it for you).
  5. It is confusing that tag on the two unrelated symbols onto to end of that array so I treat them as special cases.
  6. main() should return an int.
#include <stdio.h>
#include <string.h>

void showASCIIContrChars(unsigned char c, char str[]) {
    const char *contrs[] = {
        "NUL" , "SOH" , "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
        "BS ", "HT ", "LF ", "VT ", "FF ", "CR ", "SO ", "SI ",
        "DEL", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
        "CAN", "EM ", "SUB", "ESC", "FS ", "GS ", "RS ", "US ",
        "SPA"
    };
    if(c < sizeof(contrs) / sizeof(*contrs)) {
        strcpy(str, contrs[c]);
    } else if(c < 127) {
        str[0] = c;
        str[1] = ' ';
        str[2] = ' ';
        str[3] = '\0';
    } else if(c == 127) {
        strcpy(str, "DEL");
    } else {
        strcpy(str, "INV");
    }
}

int main(void) {
    for (int i = 0; i < 128; i  ) {
        char s[4];
        showASCIIContrChars(i, s);
        printf("= %s\n", i, s);
    }
    return 0;
}

CodePudding user response:

I would avoid the whole issue of copying strings into buffers.

This only returns pointers to immutable strings for control characters, and a NULL pointer for any non-control characters.

Then the caller can produce different output formatting for control characters and non-control characters.

#include <stdio.h>

const char *getASCIIControlChar(const char c)
{
  const char *const control_chars[] = {
    "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
    "BS ", "HT ", "LF ", "VT ", "FF ", "CR ", "SO ", "SI ",
    "DEL", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
    "CAN", "EM ", "SUB", "ESC", "FS ", "GS ", "RS ", "US ",
    "SPA"
  };
  if ((0 <= c) && (c <= 32)) {
    return control_chars[(unsigned int) c];
  } else if (c == 127) {
    return "DEL";
  } else {
    return NULL;
  }
}

int main(void)
{
  for (int i=0; i<128; i  ) {
    const char        c = (char) i;
    const char *const s = getASCIIControlChar(c);
    printf("= %s\n", i, s?s:"INV");
  }
  return 0;
}
...
125 INV
126 INV
127 DEL

Moving distinguishing between control characters and non-control characters to the caller even allows output formatting involving the actual non-control characters:

  for (int i=0; i<128; i  ) {
    const char        c = (char) i;
    const char *const s = getASCIIControlChar(c);
    if (s) {
      printf("= %s\n", i, s);
    } else {
      printf("= '%c'\n", i, c);
    }
  }
...
125 '}'
126 '~'
127 DEL
  •  Tags:  
  • c
  • Related