Question:
Can I have two pointers of different types (uint32_t * and char *) pointing to the very same address?
Here is why I want to have this:
I want to convert UTF-8 to UTF-32 and vice versa in C.
Let's say I have a variable of type uint32_t that contains one UTF-32 encoded Unicode character, and I already know that it needs 4 bytes when encoded in UTF-8. Its binary representation is this:
00000000000aaabbbbbbccccccdddddd
a, b, c and d are 4 different ranges where each bit can be 0 or 1.
With clever bitwise &, | and << operations I can rearrange these bits so that at the end there is this new distribution:
00000aaa00bbbbbb00cccccc00dddddd
And then I can set some bits (using | again) to get this:
11110aaa10bbbbbb10cccccc10dddddd
When I split this into 4 consecutive char variables in an array, I have this:
11110aaa 10bbbbbb 10cccccc 10dddddd
which is exactly the UTF-8 encoding of the same unicode character.
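The bit rearrangement described above can be sketched as a small function. This is only an illustration under stated assumptions: the name utf32_to_utf8_4 is hypothetical, it handles only code points in the 4-byte range (U+10000 to U+10FFFF), and it writes into a separate byte array instead of transforming in place, so the result does not depend on the machine's byte order.

```c
#include <stdint.h>

/* Hypothetical helper: encode a code point from the 4-byte UTF-8 range.
   Returns 4 on success, 0 if cp is outside U+10000..U+10FFFF. */
static int utf32_to_utf8_4(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x10000u || cp > 0x10FFFFu)
        return 0;

    out[0] = (unsigned char)(0xF0u | (cp >> 18));           /* 11110aaa */
    out[1] = (unsigned char)(0x80u | ((cp >> 12) & 0x3Fu)); /* 10bbbbbb */
    out[2] = (unsigned char)(0x80u | ((cp >> 6) & 0x3Fu));  /* 10cccccc */
    out[3] = (unsigned char)(0x80u | (cp & 0x3Fu));         /* 10dddddd */
    return 4;
}
```

For example, U+1F600 has aaa = 000, bbbbbb = 011111, cccccc = 011000 and dddddd = 000000, which gives the bytes F0 9F 98 80.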
So the very same 4 bytes in memory should be one single uint32_t variable and at the same time an array of 4 char variables. I want to have this:
uint32_t *utf32;
char utf8[4];
utf32 is a pointer to a single uint32_t variable, which is 4 bytes long.
utf8 is an array of 4 char elements, each 1 byte long (its name can be used as a pointer to the first element).
And I want both pointers to point to the very same address, so that I can write a UTF-32 encoded character into the variable that utf32 points to, transform it in place, and then read the result from the array utf8. Is this possible? If so, how can I do it?
(I used this technique very often when I was coding in COBOL in the previous millennium, because in COBOL it's easy to overload the same region in the memory with many different definitions. But I don't know how to do it in C.)
I have found a lot of questions dealing with 2 pointers pointing to the same address, but in these questions the pointers always have the same type. Some other questions are about why you get an error if a pointer declared with one type points to an address that was declared with another type. But I didn't find anything about two pointers of different types sharing the same address.
CodePudding user response:
Can I have two pointers of different types (uint32_t * and char *) pointing to the very same address?
Yes, you can.
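#include <assert.h>
#include <stdint.h>

union U {
    uint32_t ui32;
    char c[4];
};

union U u;
u.ui32 = ...
uint32_t *pi = &u.ui32;   /* points at the uint32_t member */
char *cp = u.c;           /* points at the first char of the same storage */
/* compare as void * because uint32_t * and char * are not compatible types */
assert((void *)pi == (void *)cp);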
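There are some C language rules (strict aliasing, alignment) which you can violate if you access an object through a pointer of an unrelated type. Reading or writing the bytes of the uint32_t through a char * is allowed; the problematic direction is accessing a plain char array through a uint32_t *. But "two different pointer types pointing to the same address" is not a problem in itself.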
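As a sketch of how the union could serve the in-place transformation from the question: the names utf_cell, same_address and encode_in_place below are made up for illustration, and the code assumes the stored value is a code point in the 4-byte range (U+10000 to U+10FFFF). The bytes are stored one by one with shifts, because simply reinterpreting the rearranged uint32_t would yield the bytes in reverse order on a little-endian machine.

```c
#include <stdint.h>

/* Hypothetical union: the same 4 bytes seen as one uint32_t or as 4 bytes. */
union utf_cell {
    uint32_t utf32;
    unsigned char utf8[4];
};

/* Returns 1 if both members start at the same address. */
static int same_address(const union utf_cell *c)
{
    return (const void *)&c->utf32 == (const void *)c->utf8;
}

/* Assumes c->utf32 holds a code point in U+10000..U+10FFFF. */
static void encode_in_place(union utf_cell *c)
{
    uint32_t cp = c->utf32;

    /* 00000000000aaabbbbbbccccccdddddd -> 11110aaa10bbbbbb10cccccc10dddddd */
    uint32_t packed = 0xF0808080u
                    | ((cp & 0x1C0000u) << 6)   /* aaa    */
                    | ((cp & 0x03F000u) << 4)   /* bbbbbb */
                    | ((cp & 0x000FC0u) << 2)   /* cccccc */
                    |  (cp & 0x00003Fu);        /* dddddd */

    /* Store most significant byte first, independent of host byte order. */
    c->utf8[0] = (unsigned char)(packed >> 24);
    c->utf8[1] = (unsigned char)(packed >> 16);
    c->utf8[2] = (unsigned char)(packed >> 8);
    c->utf8[3] = (unsigned char)packed;
}
```

After encode_in_place the utf32 member no longer holds the code point; on a little-endian host its numeric value will differ from packed, which is expected.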
CodePudding user response:
Yes, two pointers of different types can point to the same address.
Let's say this UTF-32 value sits somewhere in memory and you know where, so I will refer to that location as address.
So if you want to treat these 4 bytes as a uint32_t, you could do this:
uint32_t* utf32 = address;
And you can just as easily treat it as a char array:
char* utf8 = address;
If you then want to access a char you just do:
utf8[index]
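A minimal sketch of this pointer approach, with hypothetical helper names byte_at and load_u32. It assumes the memory really holds a uint32_t: viewing that object through a character pointer is permitted, while going the other way (reading a char buffer through a uint32_t *) can break alignment and aliasing rules, so the sketch uses memcpy for that direction.

```c
#include <stdint.h>
#include <string.h>

/* Read byte `index` (0..3) of a uint32_t through a character pointer.
   Which byte comes first depends on the host's byte order. */
static unsigned char byte_at(const uint32_t *value, size_t index)
{
    const unsigned char *bytes = (const unsigned char *)value;
    return bytes[index];
}

/* Build a uint32_t from 4 raw bytes without casting the buffer pointer. */
static uint32_t load_u32(const unsigned char bytes[4])
{
    uint32_t v;
    memcpy(&v, bytes, sizeof v);
    return v;
}
```

Copying the four bytes out with byte_at and back in with load_u32 reproduces the original value on any byte order.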