sizeof for a string in array of strings

Time:02-27

I have been trying to switch from Python to C for some time. While checking out a few functions, what caught my attention is the sizeof operator, which returns the size of an object in bytes. I created an array of strings and want to find the number of elements in the array. I know that it can be done with sizeof(array)/sizeof(array[0]). However, I find this a bit confusing.

I expected that the large array would be 2D (which is just a 1D array represented differently), and that each character array within it would occupy as many bytes as the longest character array in it. Example below:

#include <stdio.h>
#include <string.h>

const char *words[] = {"this","that","Indian","he","she","sometimes","watch","now","browser","whatsapp","google","telegram","cp","python","cpp","vim","emacs","jupyter","space","earphones","laptop","charger","whiteboard","chalk","marker","matrix","theory","optimization","gradient","descent","numpy","sklearn","pandas","torch","array"};

const int length = sizeof(words)/sizeof(words[0]);

int main()
{
        printf("%s",words[1]);
        printf("%i",length);
        /* sizeof and strlen yield size_t, which is printed with %zu */
        printf("\n%zu",sizeof(words[0]));
        printf("\n%zu %zu %s",sizeof(words[27]),strlen(words[27]),words[27]);
        return 0;
}

[OUT]
that35
8
8 12 optimization

Each of the character arrays occupies 8 bytes, including the character array "optimization". I don't understand what is going on here. The strlen function gives the expected output, since it just looks for the null character that terminates the character array. I expected the output of the sizeof operator to be 1 more than the output of strlen.

PS: I didn't find any resource that addresses this issue.

CodePudding user response:

This happens because words[27] is a pointer, so sizeof(words[27]) gives the size of a pointer. Pointers have a fixed size on a given platform, typically 8 bytes on an x86_64 system. words itself is an array of pointers.

each of the character arrays occupy 8 bytes, including the character array "optimization".

No. Each string literal occupies its own memory (its length plus one byte for the terminating null character). The 8 bytes is the size of the pointer, of type const char *, which stores the address of the first character of the word.

const int length = sizeof(words)/sizeof(words[0]);

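The above line gives 35 because words is declared as an array, and an array does not decay to a pointer when it is the operand of sizeof. sizeof(words) is therefore the size of the whole array (35 pointers), and dividing by sizeof(words[0]), the size of one pointer, gives the element count. This has nothing to do with words being a global variable: it works for any array whose declaration is in scope, but not for a pointer such as a function parameter.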

Read More about pointer decaying:

  1. Illustration.

    In practice, the elements of words will probably point to string literals stored in read-only data. For strings accessed this way, strlen is the appropriate way to get the length.
