I am a little confused about how memchr() works in C. I have looked at different implementations of memchr() and found that it first converts the character or number to unsigned char and then searches an array byte by byte.
I have two questions:
1. If it converts everything to unsigned char, then how does it compare a number that is larger than unsigned char, for example an int?
2. If it compares byte by byte and returns the address of the first occurrence of the character, then suppose I want to search for 0x89 in an array.
#include <stdio.h>
#include <string.h>

int main(void)
{
    const int arr[5] = {0x1021, 0x8988, 0x706, 0x50, 0x22};
    int * ptr;
    ptr = memchr(arr, 0x89, sizeof(arr));
    printf("arr:%p ptr:%p\n", arr, ptr);
    return 0;
}
It should return the address of the 3rd byte of the array element, because 0x89 matches the first byte of the element 0x8988, which is at the 7th byte of the array, since memchr() matches byte by byte (unsigned char), not by int.
Assuming: int is 4 bytes and unsigned char is 1 byte.
CodePudding user response:
It doesn't. memchr can only search for values that are representable by unsigned char.
If you supply an array of ints as the first argument to memchr, then it will look for the unsigned char value you supplied as the second argument, within the object representation of each int.
The object representation of a uint32_t is the correspondence between the two members of union { uint32_t v; uint8_t r[4]; } u; it is implementation-defined but will almost always be one of these two:
- big-endian: v == (r[0] << 24 | r[1] << 16 | r[2] << 8 | r[3])
- little-endian: v == (r[3] << 24 | r[2] << 16 | r[1] << 8 | r[0])
The object representation of an int32_t is the above plus the rule for how negative numbers are represented. The object representation of int is those two things plus the actual size of int.
The example code you showed,
const int arr[5] = {0x1021, 0x8988, 0x706, 0x50, 0x22};
int *ptr = memchr(arr, 0x89, sizeof(arr));
is, assuming int and int32_t are the same type, assuming little-endian object representations, and ignoring how negative numbers are represented (all the numbers here are positive), equivalent to
const uint8_t arr[20] = {
0x21, 0x10, 0x00, 0x00,
0x88, 0x89, 0x00, 0x00,
0x06, 0x07, 0x00, 0x00,
0x50, 0x00, 0x00, 0x00,
0x22, 0x00, 0x00, 0x00,
};
int *ptr = memchr(arr, 0x89, sizeof(arr));
You can see that the value being searched for, 0x89, occurs at offset 5 from the start of this second array.
(Note: the type of ptr should be uint8_t *, not int *. The pointer returned by memchr is not necessarily a valid pointer to int, nor to anything else besides unsigned char.)
CodePudding user response:
Instead of asking this question, you could easily find the answer yourself just by adding a few lines of code. You need to do it yourself if you really want to learn programming.
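#include <stddef.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    const int arr[5] = {0x1021, 0x8988, 0x706, 0x50, 0x22};
    /* View the array as raw bytes; memchr compares bytes, not ints. */
    const unsigned char *ucp = (const unsigned char *)arr;
    for (size_t i = 0; i < sizeof(arr); i++)
    {
        printf("Byte no: %02zu = 0x%02hhx, %s\n", i, ucp[i], ucp[i] == 0x89 ? " <<<----" : "");
    }
    /* memchr returns a pointer to a byte, so keep it as unsigned char *. */
    const unsigned char *ptr = memchr(arr, 0x89, sizeof(arr));
    printf("arr:%p ptr:%p diff = %td\n", (void *)arr, (void *)ptr, ptr - ucp);
    return 0;
}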
and the output:
Byte no: 00 = 0x21,
Byte no: 01 = 0x10,
Byte no: 02 = 0x00,
Byte no: 03 = 0x00,
Byte no: 04 = 0x88,
Byte no: 05 = 0x89, <<<----
Byte no: 06 = 0x00,
Byte no: 07 = 0x00,
Byte no: 08 = 0x06,
Byte no: 09 = 0x07,
Byte no: 10 = 0x00,
Byte no: 11 = 0x00,
Byte no: 12 = 0x50,
Byte no: 13 = 0x00,
Byte no: 14 = 0x00,
Byte no: 15 = 0x00,
Byte no: 16 = 0x22,
Byte no: 17 = 0x00,
Byte no: 18 = 0x00,
Byte no: 19 = 0x00,
arr:0x7fffc2747900 ptr:0x7fffc2747905 diff = 5
I think that the answer is quite obvious now.