Where do the values of uninitialized variables come from, in practice on real CPUs?


I want to know where uninitialized variables get their values from:

#include <stdio.h>
int main( void )
{
    int ghosts[3];
    for(int i = 0; i < 3; i++)
        printf("%d\n", ghosts[i]);
    return 0;
}

This gets me seemingly random values like -12, 2631, 131, ... Where did they come from?

For example with GCC on x86-64 Linux: https://godbolt.org/z/MooEE3ncc

I have a guess at an answer, though it could be wrong:
The registers of the memory, after they are 'emptied', get random voltages between 0 and 1; these values get 'rounded' to 0 or 1, and the resulting random values depend on something?! Maybe the way the registers are made? Maybe the capacity of the memory comes into play somehow? And maybe even the temperature?!

CodePudding user response:

Your computer doesn't reboot or power cycle every time you run a new program. Every bit of storage in memory or registers your program can use has a value left there by some previous instruction, either in this program or in the OS before it started this program.

If it did (e.g. on a microcontroller), then yes, each bit of storage might settle into a 0 or 1 state during the voltage fluctuations of powering on, except in storage engineered to power up in a known state. (DRAM is more likely to read as 0 on power-up, because its capacitors will have discharged.) But you'd also expect some internal CPU logic to zero or set things to a guaranteed state before fetching and executing the first instruction of code from the reset vector (a memory address); system designers normally arrange for there to be ROM at that physical address, not RAM, so they can put non-random bytes of machine code there. Code that executes at that address should still assume random values in all registers.

But you're writing a simple user-space program that runs under an OS, not the firmware for a microcontroller, embedded system, or mainstream motherboard, so power-up randomness is long in the past by the time anything loads your program.


Modern OSes zero registers on process startup, and zero memory pages allocated to user-space (including your stack space), to avoid information leaks of kernel data and data from other processes. So the values must come from something that happened earlier inside your process, probably from dynamic linker code that ran before main and used some stack space.
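
To make that concrete, here's a minimal sketch (mine, not from the question): one function writes recognizable values into its stack frame, and a second function reads its own uninitialized locals, which often land in the same stack slots. Whether the frames actually overlap depends on the compiler, optimization level, and stack layout, so treat it purely as an illustration.

#include <stdio.h>

/* Writes recognizable values into its stack frame. */
void pollute(void)
{
    volatile int junk[3] = {111, 222, 333};
    (void)junk;
}

/* Reads uninitialized locals that may happen to reuse the same stack slots. */
void observe(void)
{
    int ghosts[3];                 /* intentionally uninitialized */
    for (int i = 0; i < 3; i++)
        printf("%d\n", ghosts[i]); /* indeterminate values */
}

int main(void)
{
    pollute();   /* leave stale data on the stack */
    observe();   /* often (not always) prints 111 222 333 at -O0 */
    return 0;
}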

Reading the value of a local variable that's never been initialized or assigned is not actually undefined behaviour in this case, because the array couldn't have been declared register int ghosts[3]; that would be an error (Godbolt), since ghosts[i] effectively uses the array's address. See (Why) is using an uninitialized variable undefined behavior? In this case, all the C standard has to say is that the value is indeterminate. So it does come down to implementation details, as you expected.
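
The "could it have been declared register?" distinction can be seen directly. In this sketch (mine, not from the question), GCC rejects the array version because subscripting needs the array's address, while the plain int version compiles and is the genuinely-UB case:

int f(void)
{
    register int ghosts[3];
    return ghosts[0];  /* GCC rejects this: subscripting needs the array's address,
                          which an object declared 'register' doesn't have */
}

int g(void)
{
    int x;             /* could have been declared 'register': its address is never taken */
    return x;          /* so reading it uninitialized is the UB case */
}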

When you compile without optimization, compilers don't even notice the uninitialized use, because they don't track usage across C statements. (At -O0 everything is treated somewhat like volatile: values are loaded into registers only as needed for each statement, then stored again.)

In the Godbolt link I added to your question, notice that -Wall doesn't produce any warnings at -O0, and the generated code just reads from the stack memory it chose for the array without ever writing it. So your code is observing whatever stale value was in that memory when the function started. (But as I said, that value must have been written earlier inside this program, by C startup code or dynamic linking.)

With gcc -O2 -Wall, we get the warning we'd expect: warning: 'ghosts' is used uninitialized [-Wuninitialized], but it does still read from stack space without writing it.

Sometimes GCC will invent a 0 instead of reading uninitialized stack space, but that doesn't happen in this case. There's zero guarantee about how such code compiles: the compiler sees the use-uninitialized "bug" and can invent any value it wants, e.g. reading some register it never wrote instead of that memory. For example, since you're calling printf, GCC could simply have left ESI uninitialized between printf calls, since that's where ghosts[i] is passed as the 2nd argument in the x86-64 System V calling convention.


Most modern CPUs, including x86, don't have any "trap representations" that would make an add instruction fault, and even if one did, the C standard doesn't guarantee that an indeterminate value isn't a trap representation. But IA-64 did have a Not-a-Thing register result from bad speculative loads, which would trap if you tried to read it. See the comments on the trap-representation Q&A, and Raymond Chen's article: Uninitialized garbage on ia64 can be deadly.

The ISO C rule that it's UB to read uninitialized variables that were candidates for register may be aimed at this, but with optimization enabled you could plausibly still run into it anyway if the address is only taken later, unless the compiler takes steps to avoid that. However, ISO C defect report N1208 proposes saying that an indeterminate value can be "a value that behaves as if it were a trap representation" even for types that have no trap representations. So it seems that part of the standard doesn't fully cover ISAs like IA-64 or the way real compilers can work.

Another case that's not exactly a "trap representation": note that only some object representations (bit patterns) are valid for _Bool in mainstream ABIs, and violating that can crash your program: Does the C++ standard allow for an uninitialized bool to crash a program?

That's a C++ question, but I verified that GCC will return garbage without booleanizing it to 0/1 if you write _Bool b[2]; return b[0]; https://godbolt.org/z/jMr98547o. I think ISO C only requires that an uninitialized object has some object representation (bit pattern), not that it's a valid one for that object's type (otherwise this would be a compiler bug). For most integer types, every bit pattern is valid and represents an integer value. Besides reading uninitialized memory, you can cause the same problem by using (unsigned char*) or memcpy to write a bad byte into a _Bool.
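
As a hedged illustration of that last point (assuming a mainstream ABI where _Bool is one byte and only 0 and 1 are valid object representations), smuggling an out-of-range byte into a _Bool with memcpy can make logically contradictory branches both run, because the compiler is allowed to assume the value is already 0 or 1:

#include <stdio.h>
#include <string.h>

int main(void)
{
    _Bool b;
    unsigned char bad = 0x42;  /* neither 0 nor 1 */
    memcpy(&b, &bad, 1);       /* write an invalid object representation into b */

    /* The compiler may assume b holds 0 or 1, so these two tests can both
       succeed, both fail, or anything else; the behaviour is not defined. */
    if (b)
        printf("looks true\n");
    if (!b)
        printf("looks false too?\n");
    return 0;
}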

CodePudding user response:

The registers of the memory after they are 'emptied' get random voltages between 0 and 1,

Nothing so mysterious. You are just seeing what was written to those memory locations last time they were used.

When memory is released it is not cleared or emptied. The system just knows that it's free, and the next time somebody needs memory it just gets handed over; the old contents are still there. It's like buying a used car and looking in the glove compartment: the contents are not mysterious, it's just a surprise to find a cigarette lighter and one sock.

Sometimes, in a debugging environment, freed memory is filled with an identifiable value so that it's easy to recognize you are dealing with uninitialized or stale memory. For example 0xCCCCCCCC, or maybe 0xDEADBEEFDEADBEEF.
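
A hedged sketch of that idea (debug_free is a made-up helper here, not a real allocator API): a wrapper around free() that stamps a recognizable pattern over the block before releasing it, so stale reads stand out in a debugger or hex dump. Debug runtimes do something similar automatically.

#include <stdlib.h>
#include <string.h>

/* Hypothetical debug helper: the caller passes the block size so the
   wrapper can fill freed memory with a recognizable pattern. */
static void debug_free(void *p, size_t size)
{
    if (p != NULL)
        memset(p, 0xCC, size);  /* 0xCC bytes stand out as stale data */
    free(p);
}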

Maybe a better analogy: you are eating in a self-serve restaurant that never cleans its plates; when customers are finished, they put their plates back on the 'free' pile. When you go to serve yourself, you pick up the top plate from the free pile. You should clean the plate, otherwise you get whatever was left there by the previous customer.

CodePudding user response:

The array ghosts is uninitialized, and because it was declared inside of a function and is not static (formally, it has automatic storage duration), its values are indeterminate.

This means that you could read any value, and there's no guarantee of any particular value.
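
If you want deterministic output instead, give the array an initializer; once any initializer is present, the remaining elements are zero-initialized:

#include <stdio.h>

int main(void)
{
    int ghosts[3] = {0};   /* all three elements are guaranteed to be 0 */
    for (int i = 0; i < 3; i++)
        printf("%d\n", ghosts[i]);
    return 0;
}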
