How is the sizeof operator ACTUALLY evaluated

Time:02-03

My project demands a complete understanding of how the sizeof operator actually works. The C standard specification in this regard is vague and it will be dangerous to rely on my interpretations of it. I am particularly interested in when and how the sizeof ought to be processed.

  1. My previous knowledge suggested that it is a compile-time operator, which I never questioned, because I never abused sizeof too much. However:
int size = 0;
scanf("%i", &size);
printf("%i\n", sizeof(int[size]));

This, for instance, cannot be evaluated at compile time by any means.

char c = '\0';
char*p = &c;
printf("%i\n", sizeof(*p));

I do not remember the exact code that produces U/B, but here, *p is an actual expression (RTL unary dereference). By presumption, does it mean that sizeof(c+c) is a way to force compile-time evaluation by means of the expression or will it be optimized by the compiler?

  2. Does sizeof return a value of type int, is it a size_t (ULL on my platform), or is it implementation-defined?

  3. This article states that "The operand to sizeof cannot be a type-cast", which is incorrect. Type-casting has the same precedence as the sizeof operator, meaning in a situation where both are used, they are simply evaluated right to left. sizeof(int) * p probably does not work, because if the operand is a type in braces, this is handled first, but sizeof((int)*p) works just fine.

I am asking for a little technical elaboration on how sizeof is implemented. That can be of use to anyone who doesn't want to spread misinformation, inaccuracies or as in my case - work on a project that is directly dependent on it.

CodePudding user response:

1. My previous knowledge suggested that it is a compile-time operator, which I never questioned, because I never abused sizeof too much…

C 2018 6.5.3.4 2 specifies the behavior of sizeof and says:

… If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.

In your example with sizeof(int[size]), the type of int[size] is a variable length array type, so the operand is evaluated [1], effectively computing the size during program execution.

In your example with sizeof(*p), the type of *p is not a variable length array type, so the operand is not evaluated. The fact that p may point to an object of automatic storage duration that is created during program execution is irrelevant; the type of *p is known during compilation, so *p is not evaluated, and the result of sizeof is an integer constant.

2. Does sizeof return a value of type int, is it a size_t (ULL on my platform), or is it implementation-defined.

C 2018 6.5.3.4 5 says “The value of the result of both operators [sizeof and _Alignof] is implementation-defined, and its type (an unsigned integer type) is size_t, defined in <stddef.h> (and other headers).”

3. This article states that "The operand to sizeof cannot be a type-cast", which is incorrect. Type-casting has the same precedence as the sizeof operator, meaning in a situation where both are used, they are simply evaluated right to left. sizeof(int) * p probably does not work, because if the operand is a type in braces, this is handled first, but sizeof((int)*p) works just fine.

The article means the operand cannot directly be a cast-expression (C 2018 6.5.4) in the form ( type-name ) cast-expression, due to how the formal grammar of C is structured. Formally, an expression operand to sizeof is a unary-expression (6.5.3) in the grammar, and a unary-expression can, through a chain of grammar productions, be a cast-expression inside parentheses.

Footnote

[1] We often think of a type-name (a specification of a type, such as int [size]) as more of a passive declaration than an executable statement or expression, but C 2018 6.8 4 tells us “There is also an implicit full expression in which the non-constant size expressions for a variably modified type are evaluated…”

CodePudding user response:

The semantics of sizeof() per the (draft) C11 standard:

The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.

Note "If the type of the operand is a variable length array type, the operand is evaluated". This means that the size of a VLA is computed at run time.

"otherwise, the operand is not evaluated and the result is an integer constant" means the result is evaluated at compile time.

The return type is size_t. Full stop:

The value of the result of both operators (sizeof() and _Alignof()) is implementation-defined, and its type (an unsigned integer type) is size_t, defined in <stddef.h> (and other headers).

Note that the type is size_t. Don't use unsigned long nor unsigned long long nor anything else. Always use size_t.

CodePudding user response:

You're overthinking things a bit.

Yes, when the operand of sizeof is a variable-length array expression, then that has to be evaluated at run time - otherwise, it's a compile-time operation and the operand is not evaluated.

printf("%i\n", sizeof(*p));

I do not remember the exact code that produces U/B, but here, *p is an actual expression (RTL unary dereference).

Doesn't matter - the expression *p is not evaluated as part of the sizeof operation. All that matters is the type of *p, which is known at translation. This is a perfectly valid idiom for dynamic memory allocation:

size_t size = some_value();
int *p = malloc( sizeof *p * size );

By presumption, does it mean that sizeof(c+c) is a way to force compile-time evaluation by means of the expression or will it be optimized by the compiler?

Again, the expression c+c won't be evaluated - all that matters is the type.

Does sizeof return a value of type int, is it a size_t (ULL on my platform), or is it implementation-defined.

size_t. That's stated explicitly in the language definition:

6.5.3.4 The sizeof and _Alignof operators
...
5 The value of the result of both operators is implementation-defined, and its type (an unsigned integer type) is size_t, defined in <stddef.h> (and other headers).
C 2011 Online Draft

This article states that "The operand to sizeof cannot be a type-cast", which is incorrect. Type-casting has the same precedence as the sizeof operator, meaning in a situation where both are used, they are simply evaluated right to left. sizeof(int) * p probably does not work, because if the operand is a type in braces, this is handled first, but sizeof((int)*p) works just fine.

What that article is saying is that an operand that's a cast-expression won't be parsed correctly. The syntax for sizeof is

unary-expression:
    ...
    sizeof unary-expression
    sizeof ( type-name )

and the syntax for a cast-expression is

cast-expression:
    unary-expression
    ( type-name ) cast-expression

If you write an expression like

sizeof (int) *p;

it won't be parsed as

sizeof ((int) *p);

Instead, it will be parsed as

(sizeof (int)) *p;

and interpreted as a multiplicative-expression:

multiplicative-expression * cast-expression

IOW, the compiler will think you're trying to multiply the result of sizeof (int) by the value of p (which should result in a diagnostic, since p is a pointer). If you wrap the cast-expression in parentheses, then it's parsed correctly.

Type-casting has the same precedence as the sizeof operator

That is not correct. Unary expressions (including sizeof expressions) have higher precedence than cast expressions. That's why sizeof (int) *p is parsed as (sizeof (int)) *p.

CodePudding user response:

Here's an attempt to provide a complete guide to the sizeof operator and its many quirks. Warning: this post may contain heavy "language-lawyering".


Formal syntax and valid forms

sizeof is a keyword in C and the syntax is defined in C17 6.5.3 as:

sizeof unary-expression
sizeof ( type-name )

Meaning that there are two possible ways to use it: sizeof op or sizeof(op). In the former case, the operand has to be an expression (for example sizeof my_variable) and in the latter case it has to be a type (for example sizeof(int)).

When we use sizeof, we almost always use parentheses. Always using parentheses is considered good practice (and Linus Torvalds famously once had one of his usual childish tantrums about it). But which form of sizeof we use depends on whether we pass an expression or a type. So even when we put parentheses around an expression, we are not using the second form but the first. Example:

int x;
printf("%zu\n", sizeof(x));

In this case we are passing an expression to sizeof. The expression is (x), and the parentheses are regular ("primary expression") parentheses that we may use around any expression in C; they do not belong to the sizeof operator in this case.


"The operand to sizeof cannot be a type-cast" - precedence and associativity or...?

Following the above explanation, whenever we write sizeof (int) * p, this gets interpreted as the second form with a type name. Why?

Why is not obvious at all; it is in fact quite subtle. It is easy to get tricked by "operator precedence tables" like the one you link. It states that the cast operator, like sizeof, is a unary operator with right-to-left associativity. But this isn't actually true when digging through the dirty details of the C grammar.

There is actually no such thing as a precedence table in the C standard, nor does it define associativity explicitly. Instead operator precedence is decided (as complicated as humanly possible) by a long chain of syntax definitions in chapter 6.5. In each sub chapter, the operator group refers to the previous and sometimes next operator group in the formal syntax, thereby stating that the current group has lower precedence than the previous. For 6.5.3 unary operators, it goes like:

unary-expression:
    postfix-expression
    ++ unary-expression
    -- unary-expression
    unary-operator cast-expression
    sizeof unary-expression
    sizeof ( type-name )
    _Alignof ( type-name )

unary-operator: one of
    & * + - ~ !

Translated from standardese to English, this grammar goo is to be read roughly as:

"Here is the group of unary expressions. They are the prefix ++ and -- operators, or one of the unary operators (listed separately), or sizeof in the two different forms, or _Alignof. They may follow a postfix expression, meaning that any postfix expression (or operator groups even higher up the syntax chain) has higher precedence than the unary operators. They may be followed by a cast expression, which thereby has lower precedence than the unary operators."

So depending on how you put it, there's actually a subtle error in the link or maybe they could have explained this better (I'm not sure if I even just managed myself, so I don't blame them really). Outside the formal C standard, the concept of "right-to-left associativity" doesn't work unless the cast operator is listed as part of the unary operators in that table even though it actually has lower precedence in the grammar.

So anyway, the sizeof (type-name) operator is a unary expression and takes precedence in the grammar above the cast operator. And that's why the compiler will not treat this as the two operators sizeof and (cast), but as the operator sizeof(type) followed by the binary multiplication operator.

And so sizeof (int) * p turns into the equivalent of (sizeof(int)) * p, that is, sizeof followed by binary multiplication. That is probably nonsense; perhaps the actual intent was to dereference the pointer p, cast the result and then take the size.

We could however write something like sizeof ((int)*p) and then the parsing order is: parenthesis, then (because of unary operator right-to-left associativity) de-reference, then cast, then sizeof.


What is the type returned by sizeof?

It returns a special large, unsigned integer type size_t (C17 6.5.3.4/5) generally regarded as "large enough" to hold the largest object allowed in the system. The type is commonly used whenever we wish to take the size of something, like when iterating through an array.

For example, you might see some code on SO in the form for(size_t i=0; i<n; i++) when iterating through an array, since this is the most correct type "large enough" to contain the size of an array. (int might be too small, and besides it is signed and we can't have negative sizes.)

size_t is found in stddef.h, which in turn is included by a lot of other standard headers like stdio.h. It can hold values up to SIZE_MAX defined in stdint.h.

size_t is printed with printf by using the %zu conversion specifier, hence my previous example printf("%zu\n", sizeof(x));.


Compile-time or run-time?

sizeof is normally a compile-time operator, meaning that the operand does not get evaluated. The one exception is variable-length arrays (VLAs), where the size is simply not known at compile time.

C17 6.5.3.4/2:

The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.

Most of the time this doesn't matter. However, we can cook up some artificial example like this:

#include <stdio.h>

int main (void)
{
  int size;
  scanf("%d",&size); // enter 2
  int arr[5][size];

  printf("%zu ", sizeof(size++)); // size++ not executed
  printf("%d ", size); // print 2

  printf("%zu ", sizeof(arr[size++])); // size++ is executed
  printf("%d ", size); // print 3
}

When I try this out and enter 2, it prints 4 2 8 3:

  • 4 because that's the size of an int on this system.
  • 2 because the operand size++ was not executed/evaluated.
  • 8 because 2 * sizeof(int) is 8.
  • 3 because the operand arr[size++] was executed/evaluated, since arr[n] results in a VLA operand.

This behavior of which operand that gets evaluated or not is well-defined and guaranteed.

Hence the popular trick int* ptr = malloc(n * sizeof *ptr);. If *ptr were evaluated, we would be dereferencing an uninitialized pointer, which would be undefined behavior. But since the operand is guaranteed not to be evaluated, the trick is safe.


An exception to "array decay"

sizeof is one of the few operators that are an exception to the rule of "array decay":

C17 6.3.2.1/3

Except when it is the operand of the sizeof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type "array of type" is converted to an expression with type "pointer to type" that points to the initial element of the array object and is not an lvalue.


sizeof is used in C's definition of a byte

The size of a byte in C is defined as per C17 3.6

3.6
byte
addressable unit of data storage large enough to hold any member of the basic character set of the execution environment

and then 6.5.3.4/4:

When sizeof is applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1.

For this reason it doesn't make much sense to write things like malloc(n * sizeof(char)), because sizeof(char) is by definition guaranteed to always be 1.

(The number of bits in a char is however not guaranteed to be 8.)
