So I was doing an exercise to see if I was using memset correctly.
Here's the original code I wrote, which was supposed to memset some addresses to have the value 50:
int main(){
    int *block1 = malloc(2048);
    memset(block1, 50, 10);
    // int count = 0;
    for (int *iter = block1; (uint8_t *) iter < (uint8_t *)block1 + 10; iter = (int *) ((uint8_t *)iter + 1) ){
        printf("%p : %d\n", iter, *iter);
    }
    return 0;
}
I expected every address in memory to store the value 50. HOWEVER my output was:
(Address : Value)
0x14e008800 : 842150450
0x14e008801 : 842150450
0x14e008802 : 842150450
0x14e008803 : 842150450
0x14e008804 : 842150450
0x14e008805 : 842150450
0x14e008806 : 842150450
0x14e008807 : 3289650
0x14e008808 : 12850
0x14e008809 : 50
I was stuck on the problem for a while and tried a bunch of things until I randomly decided that maybe my pointer is the problem. I then tried a uint8_t pointer.
int main(){
    uint8_t *block1 = malloc(2048);
    memset(block1, 50, 10);
    for (uint8_t *iter = block1; iter < block1 + 10; iter++){
        printf("%p : %d\n", iter, *iter);
    }
    return 0;
}
All I did was change the type of the block1 variable and my iter variable to be uint8_t pointers instead of int pointers and I got the correct result!
0x13d808800 : 50
0x13d808801 : 50
0x13d808802 : 50
0x13d808803 : 50
0x13d808804 : 50
0x13d808805 : 50
0x13d808806 : 50
0x13d808807 : 50
0x13d808808 : 50
0x13d808809 : 50
My question is then, why did that make such a difference?
CodePudding user response:
My question is then, why did that make such a difference?
Because the exact type of a pointer is hugely important. Pointers in C are not just memory addresses. Pointers are memory addresses, along with a notion of what type of data is expected to be found at that address.
If you write
uint8_t *p;
... p = somewhere ...
printf("%d\n", *p);
then in that last line, *p fetches one byte of memory pointed to by p.
But if you write
int *p;
... p = somewhere ...
printf("%d\n", *p);
where, yes, the only change is the type of the pointer, then in that exact same last line, *p now fetches four bytes of memory pointed to by p, interpreting them as a 32-bit int. (This assumes int on your machine is four bytes, which is pretty common these days.)
When you called
memset(block1, 50, 10);
you were asking for the first 10 individual bytes of memory in block1 (not all 2048 of them) to be set to 50.
When you used an int pointer to step over that block of memory, fetching (as we said earlier) four bytes of memory at a time, you got 4-byte integers where each of the 4 bytes contained the value 50. So the value you got was
(((((50 << 8) | 50) << 8) | 50) << 8) | 50
which just happens to be exactly 842150450.
Or, looking at it another way, if you take that value 842150450 and convert it to hex (base 16), you'll find that it's 0x32323232, where 0x32 is the hexadecimal value of 50, again showing that we have four bytes each with the value 50.
Now, that all makes sense so far, although you were skating on thin ice in your first program. You had int *iter, but then you said
for(iter = block1; (uint8_t *) iter < (uint8_t *)block1 + 10; iter = (int *) ((uint8_t *)iter + 1) )
In that cumbersome increment expression
iter = (int *) ((uint8_t *)iter + 1)
you have contrived to increment the address in iter by just one byte. Normally, we say
iter = iter + 1
or just
iter++
and this means to increment the address in iter by several bytes, so that it points at the next int in a conventional array of int.
Doing it the way you did had three implications:
- You were accessing a sort of sliding window of int-sized subblocks of block1. That is, you fetched an int made from bytes 1, 2, 3, and 4, then an int made from bytes 2, 3, 4, and 5, then an int made from bytes 3, 4, 5, and 6, etc. Since all the bytes had the same value, you always got the same value (until the window slid past the tenth byte), but this is a strange and generally meaningless thing to do.
- Three out of four of the int values you fetched were unaligned. It looks like your processor let you get away with this, but some processors would have given you a Bus Error or some other kind of memory-access exception, because unaligned accesses aren't always allowed.
- You also violated the rule about strict aliasing.
CodePudding user response:
The function memset sets each byte of the supplied memory to the specified value.
So in this call
memset(block1, 50, 10);
10 bytes of the memory addressed by the pointer block1 were set to the value 50.
But using the pointer iter that has the type int *, you are outputting sizeof( int ) bytes at once, starting at the address the pointer holds.
On the other hand, if you declare the pointer as having the type
uint8_t *iter;
then you will output only one byte of memory.
Consider the following demonstration program.
#include <stdio.h>
#include <string.h>
int main( void )
{
    int x;
    memset( &x, 50, sizeof( x ) );
    printf( "x = %d\n", x );
    for ( const char *p = ( const char * )&x; p != ( const char * )&x + sizeof( x ); ++p )
    {
        printf( "%d", *p );
    }
    putchar( '\n' );
}
The program output is
x = 842150450
50505050
That is, each byte of the memory occupied by the integer variable x was set equal to 50.
If you output each byte separately, the program prints the value 50 for each one.
To make it even more clear consider one more demonstration program.
#include <stdio.h>
int main( void )
{
printf( "50 in hex is %#x\n", 50 );
int x = 0x32323232;
printf( "x = %d\n", x );
}
The program output is
50 in hex is 0x32
x = 842150450
That is, the value 50 in hexadecimal is equal to 0x32.
Thus this initialization
int x = 0x32323232;
yields the same result as the call of the function memset
memset( &x, 50, sizeof( x ) );
that you could equivalently rewrite like
memset( &x, 0x32, sizeof( x ) );
CodePudding user response:
In the first case you are dereferencing the int* iter, so it prints the (misaligned) int value at the address, not the byte value.
It is clear what is happening when you look at the value 842150450 in hexadecimal - 0x32323232 - that is, each byte of the integer is 0x32 (50 decimal). The bytes after the tenth byte are uninitialized (malloc does not clear memory), but happen to be zero in this case, and the target is little endian, so it tails off with 0x323232, 0x3232, and finally 0x32.
Clearly the second case is the more "correct" solution, but you can fix the first case thus:
printf("%p : %d\n",
(void*)iter,
*(uint8_t*)iter);