My project demands a complete understanding of how the sizeof
operator actually works. The C standard is vague in this regard, and it would be dangerous to rely on my own interpretation of it. I am particularly interested in when and how sizeof
ought to be processed.
- My previous knowledge suggested that it is a compile-time operator, which I never questioned, because I never abused
sizeof
too much. However:
int size = 0;
scanf("%i", &size);
printf("%i\n", sizeof(int[size]));
This, for instance, cannot be evaluated at compile time by any means.
char c = '\0';
char*p = &c;
printf("%i\n", sizeof(*p));
I do not remember the exact code that produces U/B, but here, *p
is an actual expression (RTL unary dereference). By presumption, does it mean that sizeof(c++)
is a way to force compile-time evaluation by means of the expression or will it be optimized by the compiler?
- Does sizeof return a value of type int, is it a size_t (ULL on my platform), or is it implementation-defined?
- This article states that "The operand to sizeof cannot be a type-cast", which is incorrect. Type-casting has the same precedence as the sizeof operator, meaning in a situation where both are used, they are simply evaluated right to left. sizeof(int) * p probably does not work, because if the operand is a type in braces, this is handled first, but sizeof((int)*p) works just fine.
I am asking for a little technical elaboration on how sizeof
is implemented. That can be of use to anyone who doesn't want to spread misinformation, inaccuracies or as in my case - work on a project that is directly dependent on it.
CodePudding user response:
1. My previous knowledge suggested that it is a compile-time operator, which I never questioned, because I never abused sizeof
too much…
C 2018 6.5.3.4 2 specifies the behavior of sizeof
and says:
… If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.
In your example with sizeof(int[size])
, the type of int[size]
is a variable length array type, so the operand is evaluated (see footnote 1), effectively computing the size during program execution.
In your example with sizeof(*p)
, the type of *p
is not a variable length array type, so the operand is not evaluated. The fact that p
may point to an object of automatic storage duration that is created during program execution is irrelevant; the type of *p
is known during compilation, so *p
is not evaluated, and the result of sizeof
is an integer constant.
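A minimal sketch of the two cases above. The helper names vla_type_size and pointee_size are hypothetical, and the first one assumes a compiler that supports variable length arrays (they are optional since C11) and a positive n:

```c
#include <stddef.h>

/* Hypothetical helper: int[n] is a variable length array type, so the
   operand is evaluated and the size is computed at run time.
   Assumes n > 0 and VLA support. */
size_t vla_type_size(int n)
{
    return sizeof(int[n]);
}

/* Hypothetical helper: *p has type char, which is not a VLA type, so the
   operand is not evaluated and the null pointer is never dereferenced. */
size_t pointee_size(void)
{
    char *p = NULL;
    return sizeof(*p);   /* integer constant, always 1 for char */
}
```

Calling vla_type_size(3) yields 3 * sizeof(int), while pointee_size() is safe even though p is null, because only the type of *p matters.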
2. Does sizeof
return a value of type int
, is it a size_t
(ULL on my platform), or is it implementation-defined.
C 2018 6.5.3.4 5 says “The value of the result of both operators [sizeof
and _Alignof
] is implementation-defined, and its type (an unsigned integer type) is size_t
, defined in <stddef.h> (and other headers).”
3. This article states that "The operand to sizeof
cannot be a type-cast", which is incorrect. Type-casting has the same precedence as the sizeof
operator, meaning in a situation where both are used, they are simply evaluated right to left.
sizeof(int) * p
probably does not work, because if the operand is a type in braces, this is handled first, but sizeof((int)*p)
works just fine.
The article means the operand cannot directly be a cast-expression (C 2018 6.5.4) in the form ( type-name ) cast-expression
, due to how the formal grammar of C is structured. Formally, an expression operand to sizeof
is a unary-expression (6.5.3) in the grammar, and a unary-expression can, through a chain of grammar productions, be a cast-expression inside parentheses.
Footnote
1 We often think of a type-name (a specification of a type, such as int [size]
) as more of a passive declaration than an executable statement or expression, but C 2018 6.8 4 tells us “There is also an implicit full expression in which the non-constant size expressions for a variably modified type are evaluated…”
CodePudding user response:
The semantics of sizeof()
per the (draft) C11 standard:
The
sizeof
operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.
Note "If the type of the operand is a variable length array type, the operand is evaluated". This means that the size of a VLA is computed at run time.
"otherwise, the operand is not evaluated and the result is an integer constant" means the result is evaluated at compile time.
The return type is size_t
. Full stop:
The value of the result of both operators (sizeof() and _Alignof()) is implementation-defined, and its type (an unsigned integer type) is size_t, defined in <stddef.h> (and other headers).
Note that the type is size_t
. Don't use unsigned long
nor unsigned long long
nor anything else. Always use size_t
.
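If you want the compiler to confirm this rather than take it on faith, a C11 _Generic selection can be used. This is only a sketch, and the helper names are made up for illustration:

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the type of a sizeof expression is
   size_t, 0 otherwise. Requires C11 for _Generic. */
int sizeof_yields_size_t(void)
{
    return _Generic(sizeof(int), size_t: 1, default: 0);
}

/* Hypothetical helper: returns 1 if size_t is unsigned, i.e. converting
   -1 to size_t wraps around to a large positive value. */
int size_t_is_unsigned(void)
{
    return (size_t)-1 > 0;
}
```

Both helpers should return 1 on a conforming implementation, whatever underlying integer type size_t happens to be.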
CodePudding user response:
You're overthinking things a bit.
Yes, when the operand of sizeof
is a variable-length array expression, then that has to be evaluated at run time - otherwise, it's a compile-time operation and the operand is not evaluated.
printf("%i\n", sizeof(*p));
I do not remember the exact code that produces U/B, but here,
*p
is an actual expression (RTL unary dereference).
Doesn't matter - the expression *p
is not evaluated as part of the sizeof
operation. All that matters is the type of *p
, which is known at translation. This is a perfectly valid idiom for dynamic memory allocation:
size_t size = some_value();
int *p = malloc( sizeof *p * size );
By presumption, does it mean that
sizeof(c++)
is a way to force compile-time evaluation by means of the expression or will it be optimized by the compiler?
Again, the expression c++
won't be evaluated - all that matters is the type.
Does sizeof return a value of type int, is it a size_t (ULL on my platform), or is it implementation-defined.
size_t
. That's stated explicitly in the language definition:
6.5.3.4 The sizeof and _Alignof operators (C 2011 Online Draft)
...
5 The value of the result of both operators is implementation-defined, and its type (an unsigned integer type) is size_t, defined in <stddef.h> (and other headers).
This article states that "The operand to sizeof cannot be a type-cast", which is incorrect. Type-casting has the same precedence as the sizeof operator, meaning in a situation where both are used, they are simply evaluated right to left. sizeof(int) * p probably does not work, because if the operand is a type in braces, this is handled first, but sizeof((int)*p) works just fine.
What that article is saying is that an operand that's a cast-expression won't be parsed correctly. The syntax for sizeof
is
unary-expression:
...
sizeof unary-expression
sizeof ( type-name )
and the syntax for a cast-expression is
cast-expression:
unary-expression
( type-name ) cast-expression
If you write an expression like
sizeof (int) *p;
it won't be parsed as
sizeof ((int) *p);
Instead, it will be parsed as
(sizeof (int)) *p;
and interpreted as a multiplicative-expression:
multiplicative-expression * cast-expression
IOW, the compiler will think you're trying to multiply the result of sizeof (int)
by the value of p
(which should result in a diagnostic). If you wrap the cast-expression in parentheses, then it's parsed correctly.
Type-casting has the same precedence as the
sizeof
operator
That is not correct. Unary expressions (including sizeof
expressions) have higher precedence than cast expressions. That's why sizeof (int) *p
is parsed as (sizeof (int)) *p
.
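To see both parses side by side, here is a small sketch. The function names are hypothetical, and an integer n is used instead of a pointer so that the multiplication form compiles without a diagnostic:

```c
#include <stddef.h>

/* Parsed as (sizeof (int)) * n: the sizeof ( type-name ) form followed
   by a binary multiplication. */
size_t parsed_as_product(size_t n)
{
    return sizeof (int) * n;
}

/* The extra parentheses make the cast-expression the operand of sizeof.
   Its type is int, and *p is not evaluated, so p may even be null. */
size_t size_of_cast(const char *p)
{
    return sizeof ((int) *p);
}
```

parsed_as_product(3) gives 3 * sizeof(int), whereas size_of_cast(NULL) gives sizeof(int) without ever touching the pointer.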
CodePudding user response:
Here's an attempt to provide a complete guide to the sizeof
operator and its many quirks. Warning: this post may contain heavy "language-lawyering".
Formal syntax and valid forms
sizeof
is a keyword in C and the syntax is defined in C17 6.5.3 as:
sizeof unary-expression
sizeof ( type-name )
Meaning that there are two possible ways to use it: sizeof op
or sizeof(op)
. In the former case, the operand has to be an expression (for example sizeof my_variable
) and in the latter case it has to be a type (for example sizeof(int)
).
When we use sizeof
, we almost always use parentheses. Always using parentheses is considered good practice (and Linus Torvalds famously once had one of his usual childish tantrums about it). But which form of sizeof
we use depends on whether we pass an expression or a type. So even when we use parentheses around an expression, we are not using the second form but the first. Example:
int x;
printf("%zu\n", sizeof(x));
In this case we are passing an expression to sizeof
. The expression is (x)
and the parentheses are regular ("primary expression") parentheses that we may use around any expression in C; they do not belong to the sizeof
operator in this case.
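As a quick sanity check of that claim, the following sketch compares the spellings. The function name is hypothetical:

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if sizeof gives the same result for an
   expression operand with zero, one, or two pairs of parentheses, and if
   that result matches the type-name form for the variable's type. */
int same_size_with_or_without_parens(void)
{
    int x = 0;
    return sizeof x == sizeof(x)
        && sizeof(x) == sizeof((x))
        && sizeof x == sizeof(int);
}
```

The first three spellings all use the "sizeof unary-expression" form; only sizeof(int) uses the "sizeof ( type-name )" form.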
"The operand to sizeof cannot be a type-cast" - precedence and associativity or...?
Following the above explanation, whenever we write sizeof (int) * p
, this gets interpreted as the second form with a type name. Why?
The reason isn't obvious at all; it is in fact dang subtle. It is easy to get tricked by "operator precedence tables" like the one you link. It states that the cast operator, like sizeof
, is a unary operator with right-to-left associativity. But this isn't actually true when digging through the dirty details of C grammar.
There is actually no such thing as a precedence table in the C standard, nor does it define associativity explicitly. Instead operator precedence is decided (as complicated as humanly possible) by a long chain of syntax definitions in chapter 6.5. In each sub chapter, the operator group refers to the previous and sometimes next operator group in the formal syntax, thereby stating that the current group has lower precedence than the previous. For 6.5.3 unary operators, it goes like:
unary-expression:
postfix-expression
++ unary-expression
-- unary-expression
unary-operator cast-expression
sizeof unary-expression
sizeof ( type-name )
_Alignof ( type-name )
unary-operator: one of
& * + - ~ !
Translated from standardese to English, this grammar goo is to be read roughly as:
"Here is the group of unary expressions. They are the prefix ++
and --
operators, or one of the unary operators (listed separately), or sizeof
in the two different forms, or _Alignof
. They may follow a postfix expression, meaning that any postfix expression (or operator groups even higher up the syntax chain) has higher precedence than the unary operators. They may be followed by a cast expression, which thereby has lower precedence than the unary operators."
So depending on how you put it, there's actually a subtle error in the link or maybe they could have explained this better (I'm not sure if I even just managed myself, so I don't blame them really). Outside the formal C standard, the concept of "right-to-left associativity" doesn't work unless the cast operator is listed as part of the unary operators in that table even though it actually has lower precedence in the grammar.
So anyway, the sizeof ( type-name )
operator is a unary expression and takes precedence in the grammar above the cast operator. And that's why the compiler will not treat this as the two operators sizeof
and (cast)
, but as the operator sizeof(type)
followed by the binary multiplication operator.
And so sizeof (int) * p
turns into equivalent of (sizeof(int)) * p
, sizeof
with binary multiplication, which is probably nonsense and perhaps the actual intent here was to dereference a pointer p
, cast and then take the size.
We could however write something like sizeof ((int)*p)
and then the parsing order is: parentheses, then (because of unary operator right-to-left associativity) de-reference, then cast, then sizeof.
What is the type returned by sizeof?
It returns a special large, unsigned integer type size_t
(C17 6.5.3.4/5) generally regarded as "large enough" to hold the largest object allowed in the system. The type is commonly used whenever we wish to take the size of something, like when iterating through an array.
For example you might see some code on SO in the form for(size_t i=0; i<n; i++)
when iterating through an array, since this is the most correct type "large enough" to contain the size of an array. (int
might be too small and besides it is signed too and we can't have negative sizes.)
size_t
is defined in stddef.h
and is also made available by several other standard headers such as stdio.h
. It can hold values up to SIZE_MAX
defined in stdint.h
.
size_t
is printed with printf
by using the %zu
conversion specifier, hence my previous example printf("%zu\n", sizeof(x));
.
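Putting the loop index and the conversion specifier together, a rough sketch could look like this. The names sum_ints and format_size are invented for the example:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: sums n ints, using size_t for the index so that it
   can represent any valid array size. */
long sum_ints(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Hypothetical helper: writes a size_t into buf using %zu and returns
   whatever snprintf returns (the number of characters needed). */
int format_size(char *buf, size_t buflen, size_t value)
{
    return snprintf(buf, buflen, "%zu", value);
}
```

A caller would typically pass sizeof a / sizeof a[0] as n when a is a real array in scope, and sizeof buf as buflen.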
Compile-time or run-time?
sizeof
is normally a compile-time operator, meaning that the operand does not get evaluated. The one exception is variable-length arrays (VLA), where the size is simply not known at compile-time.
C17 6.5.3.4/2:
The
sizeof
operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.
Most of the time this doesn't matter. However, we can cook up some artificial example like this:
#include <stdio.h>
int main (void)
{
    int size;
    scanf("%d",&size); // enter 2
    int arr[5][size];
    printf("%zu ", sizeof(size++)); // size++ not executed
    printf("%d ", size); // print 2
    printf("%zu ", sizeof(arr[size++])); // size++ is executed
    printf("%d ", size); // print 3
}
When I try this out and enter 2, it prints 4 2 8 3:
- 4 because that's the size of an int on this system.
- 2 because the operand size++ was not executed/evaluated.
- 8 because 2 * sizeof(int) is 8.
- 3 because the operand arr[size++] was executed/evaluated, since arr[n] results in a VLA operand.
Which operands get evaluated and which do not is well-defined and guaranteed.
Hence a popular trick int* ptr = malloc(n * sizeof *ptr);
. If *ptr
were evaluated, it would dereference an uninitialized pointer, which is undefined behavior. But since it is guaranteed not to get evaluated, the trick is safe.
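Wrapped into a function, the trick might look like the sketch below. The name alloc_ints and the overflow guard are my additions, not part of the idiom itself:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocates room for n ints, or returns NULL when n
   is zero, when n * sizeof(int) would overflow, or when malloc fails. */
int *alloc_ints(size_t n)
{
    int *ptr = NULL;

    /* sizeof *ptr does not evaluate *ptr, so the null pointer is fine. */
    if (n == 0 || n > SIZE_MAX / sizeof *ptr)
        return NULL;

    ptr = malloc(n * sizeof *ptr);
    return ptr;
}
```

The benefit of sizeof *ptr over sizeof(int) is that the allocation stays correct if the element type of ptr is changed later. The caller is responsible for calling free on the result.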
An exception to "array decay"
sizeof
is one of the few operators that are an exception to the rule of "array decay":
C17 6.3.2.1/3
Except when it is the operand of the sizeof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type "array of type" is converted to an expression with type "pointer to type" that points to the initial element of the array object and is not an lvalue.
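The practical consequence can be sketched as follows. All three function names are hypothetical, and the array length 7 is arbitrary:

```c
#include <stddef.h>

/* No decay: the operand keeps its array type, so this is 7 * sizeof(int). */
size_t array_bytes(void)
{
    int a[7];
    return sizeof a;
}

/* Decay happens in the initializer of p, so sizeof p is only the size of
   a pointer and says nothing about the array length. */
size_t decayed_bytes(void)
{
    int a[7];
    int *p = a;
    return sizeof p;
}

/* The usual element-count idiom relies on the array not decaying. */
size_t element_count(void)
{
    int a[7];
    return sizeof a / sizeof a[0];
}
```

This is also why the element-count idiom silently breaks inside a function that received the array as a parameter: there the operand is already a pointer.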
sizeof is used in C's definition of a byte
The size of a byte in C is defined as per C17 3.6
3.6
byte
addressable unit of data storage large enough to hold any member of the basic character set of the execution environment
and then 6.5.3.4/4:
When sizeof is applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1.
For this reason it doesn't make much sense to write things like malloc(n * sizeof(char))
because sizeof(char)
is by definition guaranteed to always be 1.
(The number of bits in a char
is however not guaranteed to be 8.)
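Both guarantees can be written down as compile-time checks. This is just a sketch, and bits_in_bytes is a hypothetical helper:

```c
#include <limits.h>
#include <stddef.h>

/* Guaranteed by the standard, so these can never fire on a conforming
   implementation. Requires C11 for _Static_assert. */
_Static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");
_Static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

/* Hypothetical helper: number of bits occupied by n bytes. CHAR_BIT from
   <limits.h> is the number of bits in a char on this implementation. */
size_t bits_in_bytes(size_t n)
{
    return n * CHAR_BIT;
}
```

So sizeof always counts in units of char, and CHAR_BIT is what converts that count into bits on a given platform.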