How does memchr in C actually work?

I am a little confused about how memchr() works in C. I have looked at different implementations of memchr() and found that it first converts the character or number to unsigned char and then searches an array byte by byte.

I have two questions:

1. If it converts everything to unsigned char, how does it compare a number whose size is bigger than unsigned char, for example an int?
2. If it compares byte by byte and returns the address of the first occurrence of the character, then suppose I want to search for 0x89 in an array:

#include <stdio.h>
#include <string.h>

int main(void)
{
  const int arr[5] = {0x1021, 0x8988, 0x706, 0x50, 0x22};
  int * ptr;

  ptr = memchr(arr, 0x89, sizeof(arr));
  printf("arr:%p ptr:%p\n", arr, ptr);

  return 0;
}

I expected it to return the address of the 3rd byte of the second element, since 0x89 matches the first byte of the element 0x8988, which is the 7th byte of the array, because memchr() matches byte by byte (unsigned char), not by int.

Assuming: int is 4 bytes and unsigned char is 1 byte.

CodePudding user response：

It doesn't. memchr can only search for values that are representable by unsigned char.
If you supply an array of ints as the first argument to memchr, then it will look for the unsigned char value you supplied as the second argument, within the object representation of each int.

The object representation of a uint32_t is the correspondence between the two members of union { uint32_t v; uint8_t r[4]; } u; it is implementation-defined but will almost always be one of these two:
- big-endian: v === r[0] << 24 | r[1] << 16 | r[2] << 8 | r[3]
- little-endian: v === r[3] << 24 | r[2] << 16 | r[1] << 8 | r[0]
The object representation of an int32_t is the above plus the rule for how negative numbers are represented. The object representation of int is those two things plus the actual size of int.

The example code you showed,

const int arr[5] = {0x1021, 0x8988, 0x706, 0x50, 0x22};
int *ptr = memchr(arr, 0x89, sizeof(arr));

is equivalent to the following, assuming that int and int32_t are the same type and that object representations are little-endian (how negative numbers are represented does not matter here, because all the values are positive):

const uint8_t arr[20] = {
  0x21, 0x10, 0x00, 0x00,
  0x88, 0x89, 0x00, 0x00,
  0x06, 0x07, 0x00, 0x00,
  0x50, 0x00, 0x00, 0x00,
  0x22, 0x00, 0x00, 0x00,
};
int *ptr = memchr(arr, 0x89, sizeof(arr));

You can see that the value being searched for, 0x89, occurs at offset 5 from the start of this second array.

(Note: the type of ptr should be uint8_t *, not int *. The pointer returned by memchr is not necessarily a valid pointer to int (nor anything else besides unsigned char).)

CodePudding user response：

Instead of asking this question, you could easily have found the answer yourself just by adding a few lines of code. You need to do that yourself if you really want to learn programming.

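#include <stddef.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
  const int arr[5] = {0x1021, 0x8988, 0x706, 0x50, 0x22};
  const unsigned char *ucp = (const unsigned char *)arr; /* view the array as raw bytes */
  const unsigned char *ptr;

  /* dump every byte of the array and mark the ones equal to 0x89 */
  for(size_t i = 0; i < sizeof(arr); i++)
  {
      printf("Byte no: %02zu = 0x%02hhx, %s\n", i, ucp[i], ucp[i] == 0x89 ? " <<<----" : "");
  }

  ptr = memchr(arr, 0x89, sizeof(arr));
  /* the difference of two byte pointers is the byte offset of the match */
  printf("arr:%p ptr:%p diff = %td\n", (void *)arr, (void *)ptr, ptr - ucp);
}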

and the output:

Byte no: 00 = 0x21, 
Byte no: 01 = 0x10, 
Byte no: 02 = 0x00, 
Byte no: 03 = 0x00, 
Byte no: 04 = 0x88, 
Byte no: 05 = 0x89,  <<<----
Byte no: 06 = 0x00, 
Byte no: 07 = 0x00, 
Byte no: 08 = 0x06, 
Byte no: 09 = 0x07, 
Byte no: 10 = 0x00, 
Byte no: 11 = 0x00, 
Byte no: 12 = 0x50, 
Byte no: 13 = 0x00, 
Byte no: 14 = 0x00, 
Byte no: 15 = 0x00, 
Byte no: 16 = 0x22, 
Byte no: 17 = 0x00, 
Byte no: 18 = 0x00, 
Byte no: 19 = 0x00, 
arr:0x7fffc2747900 ptr:0x7fffc2747905 diff = 5

I think that the answer is quite obvious now.