How do I write a C++ function that takes a reference to either a 'char *' or a 'const

Time:10-14

I am writing a function that extracts Unicode characters from a string one at a time. The argument is a reference to a pointer to a char, which the function advances to the next character before returning a value. Here is the entire function:

uint16_t get_char_and_inc(const char *&c) {
  uint16_t val = *c++;
  if ((val & 0xC0) == 0xC0)
    while ((*c & 0xC0) == 0x80)
      val = (val << 8) | *c++;
  return val;
}

As many have pointed out, this UTF-8 decoder is not technically correct: it is limited to 16-bit codes and it does not remove the encoding bits. It is sufficient for my limited graphics library for microcontrollers, though :)

The complexity of this function is irrelevant to the question, so assume it simply is this:

uint16_t get_utf8_char_and_inc(const char *&c) {
  return *c++;
}

The problem I am having is that I would like it to work for both char * and const char*, i.e.:

void main() {
  const char cc[] = "ab";
  get_char_and_inc(cc);
  printf(cc);
  
  char c[] = "ab";
  get_char_and_inc(c); // This does not compile
  printf(c);
}

Expected output:

b
b

However, the second call gives me the error:

invalid initialization of non-const reference of type 'const char*&' from an rvalue of type 'const char*'

There are several questions on Stack Overflow about this particular error message. Usually they concern passing a const char* as a char *, which is illegal. But in this case, I am going from a char * to a const char*. I feel like this should be legal, as I am simply adding a guarantee not to modify the data in the function.

Reading through other answers, it appears the compiler makes a copy of the pointer, turning it into a temporary r-value. I understand why this may be necessary in non-trivial conversions, but it seems like it should not be necessary here at all. In fact, if I drop the "&" from the function signature, it compiles just fine, but then of course the pointer is passed by value and the program prints "ab" instead of "b".

Currently, to make this work, I have to have the function twice, one taking const char *&c and another taking char *&c. This seems inefficient to me as the code is exactly the same. Is there any way to avoid the duplication?

CodePudding user response:

char* and const char* are not the same type, and a non-const reference cannot bind across different types; it has to be an exact match. That is why you can't pass a char* pointer, a char[] array, a const char[] array, etc. to a const char*& parameter. They simply do not match the expected type.
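
One way to see why the language insists on an exact match here: if a const char*& could bind directly to a char* variable, the callee could store the address of read-only data in it, and the caller would then hold a writable pointer to const data. The sketch below illustrates this. The names retarget and first_after_retarget are hypothetical, introduced only for this example, and the rejected call is left commented out.

```cpp
// Hypothetical helper, for illustration only: it points the caller's
// pointer at read-only data, which is perfectly legal through a const char*&.
inline void retarget(const char *&p) {
    static const char fixed[] = "read-only";
    p = fixed;
}

// Returns the first character seen through the retargeted pointer.
inline char first_after_retarget() {
    const char *cp = nullptr;
    retarget(cp);        // fine: cp is exactly a const char*

    // char *p = nullptr;
    // retarget(p);      // rejected by the compiler: if this bound directly,
    //                   // p would become a writable alias for `fixed`

    return *cp;
}
```

Binding to a temporary const char* copy would avoid that hole, but then the increment would be applied to the temporary and lost, which is why only a const reference (const char* const&) may bind to it.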

In this case, to make get_char_and_inc() a single function that can handle multiple reference types, make it a function template, e.g.:

template<typename T>
uint16_t get_char_and_inc(T* &c) {
  return *c++;
}

int main()
{
  const char *cc = "ab";
  printf("%p\n", cc);
  get_char_and_inc(cc); // deduces T = const char
  printf("%p\n", cc); // shows cc has been incremented
  
  char c[] = "ab";
  char *p = c;
  printf("%p\n", p);
  get_char_and_inc(p); // deduces T = char
  printf("%p\n", p); // shows p has been incremented

  return 0;
}
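
Because T is unconstrained, the template above would also accept int*, double*, and so on. A possible refinement, sketched here with the decoding loop from the question, is to reject anything other than char or const char at compile time. The static_assert and the unsigned char casts are additions of this sketch, not part of the original code; the casts assume the intent is to treat each byte as a value in 0..255 even where plain char is signed.

```cpp
#include <cstdint>
#include <type_traits>

template <typename T>
uint16_t get_char_and_inc(T *&c) {
    // Assumption of this sketch: only char* and const char* should be accepted.
    static_assert(std::is_same<typename std::remove_cv<T>::type, char>::value,
                  "get_char_and_inc expects char* or const char*");

    // Read the lead byte as unsigned so that bytes >= 0x80 do not sign-extend.
    uint16_t val = static_cast<unsigned char>(*c++);
    if ((val & 0xC0) == 0xC0)
        // Append continuation bytes (10xxxxxx) without stripping encoding bits,
        // as in the question's simplified decoder.
        while ((*c & 0xC0) == 0x80)
            val = (val << 8) | static_cast<unsigned char>(*c++);
    return val;
}
```

Each distinct T still produces its own instantiation, so on a size-constrained target it may be worth checking that the compiler folds or inlines the two copies.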


CodePudding user response:

If you're worried about the program size you can add a static inline overload like this:

uint16_t get_char_and_inc(const char *&c);

static inline uint16_t get_char_and_inc(char *&c) {
    const char *cc = c;
    uint16_t r = get_char_and_inc(cc);
    c = const_cast<char*>(cc);
    return r;
}

Any optimizing compiler worth the title will collapse it down to nothing.

CodePudding user response:

You could go functional and return a tuple, e.g. (demonstrating std::get and structured binding):

#include <iostream>
#include <tuple>
#include <string.h>  // strdup (POSIX)
#include <stdlib.h>  // free

std::tuple<int, char const*> get_char_and_inc(char const* c) {
  int x = static_cast<int>(*c);
  c++;
  return {x, c};
}

int main() {
  char const* cc = "ab";
  auto v1 = get_char_and_inc(cc);
  std::cout << std::get<0>(v1) << ", " <<
               std::get<1>(v1) << "\n";

  char* c = strdup("ab");
  auto [val2, next_c2] = get_char_and_inc(c);
  std::cout << val2 << ", " <<
               next_c2 << "\n";
  free (c);
  return 0;
}

See demo: https://godbolt.org/z/9EWf5zWaj - from there you can see that with -Os the object code is pretty compact (the only real bloat is for std::cout)

CodePudding user response:

The problem is that you are passing the pointer to the string by reference. You can do it this way, but as you found out, you then can't mix const char* and char*. You can create a const char* (call it pCursor) and pass that in instead. I would recommend writing your function as shown below, so that you pass a reference to the value and return a const char* pointer to the next character. I would also recommend not incrementing the pointer directly and instead using an index value.

const char* get_char_and_inc(const char* pStr, uint16_t& value)
{
    int currentIndex = 0;

    value = pStr[currentIndex++];

    if ((value & 0xC0) == 0xC0)
    {
        while ((pStr[currentIndex] & 0xC0) == 0x80)
        {
            value = (value << 8) | pStr[currentIndex++];
        }
    }

    return &pStr[currentIndex];
}

Then your main becomes:

int main()
{
    const char cc[] = "ab";

    uint16_t value;

    const char* pCursor = get_char_and_inc(cc, value);

    printf(pCursor);

    char c[] = "ab";

    pCursor = get_char_and_inc(c, value);

    printf(pCursor);
}
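
With a cursor-returning signature like this, walking a whole string becomes a simple loop, and the same call works for both char and const char buffers. The following is a minimal sketch under a few assumptions: decode_all is a hypothetical helper, std::vector is used only to keep the example short (it may not suit a small microcontroller), and the unsigned char casts are an addition so that bytes of 0x80 and above do not sign-extend where char is signed.

```cpp
#include <cstdint>
#include <vector>

// Decoder in the style of the function above, returning the next position.
const char *get_char_and_inc(const char *pStr, uint16_t &value) {
    int i = 0;
    value = static_cast<unsigned char>(pStr[i++]);
    if ((value & 0xC0) == 0xC0)
        while ((pStr[i] & 0xC0) == 0x80)
            value = (value << 8) | static_cast<unsigned char>(pStr[i++]);
    return &pStr[i];
}

// Hypothetical helper: collects every code in a NUL-terminated string.
std::vector<uint16_t> decode_all(const char *s) {
    std::vector<uint16_t> out;
    while (*s != '\0') {
        uint16_t v;
        s = get_char_and_inc(s, v);  // advance the cursor to the next character
        out.push_back(v);
    }
    return out;
}
```

Since the parameter is a plain const char* passed by value, a char[] buffer converts to it implicitly and no overload or template is needed.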

If you don't want to change your get_char_and_inc function, then you can change your main to this:

int main()
{
    const char cc[] = "ab";

    const char* pCursor = cc;

    get_char_and_inc(pCursor);
    printf(pCursor);

    char c[] = "ab";

    pCursor = c;

    get_char_and_inc(pCursor); // Compiles now: pCursor is a const char*
    printf(pCursor);
}