When discussing the 32-bit MIPS architecture, Patterson--Hennessy explain that the static data segment starts at 0x 1000 0000, ends at 0x 1000 FFFF, with the global pointer $gp set by default to the middle address 0x 1000 8000. It is stated that the heap is next, and should thus start at 0x 1001 0000.
Some experimenting with MARS, however, tells me that there is an additional segment lying in between, which goes from 0x 1001 0000 to 0x 1003 FFFF, so that the heap only starts at 0x 1004 0000. Indeed, when I store, say, an array on the heap using a syscall, this array is stored from 0x 1004 0000 onwards.
This additional segment seems to get used when I initialise data under the .data header of my program. This confuses me, as I was under the expectation that data initialised under .data was to be considered static, and should therefore be stored in the segment governed by the global pointer.
Question. Is the behaviour exhibited by MARS standard? If yes, in what way does this additional data segment, lying between the static data and the heap, differ from the static data segment lying in front of it?
Answer:
You can't take these simulators too seriously.
There's no reason I know of why you can't move things around in memory. The linker needs to know where the global data symbols are located so it can do relocations, but otherwise, the processor doesn't care.
To illustrate, MARS has a Memory Configuration option in Settings, so, for example, you can set up the simulator as if it were an embedded processor with limited memory, like 64k.
On a real system, the global data will be loaded by the operating system program loader from the program executable file, and the data section will be enlarged by the .bss amount (which is initially zeroed). And then, typically, the heap will start at the next page boundary.
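As a rough illustration of the "next page boundary" rule, here is a small Python sketch. The 4 KiB page size, the function name heap_start, and the sample sizes are all assumptions made up for this example; real loaders differ in the details.

```python
PAGE_SIZE = 0x1000  # assumed 4 KiB page size; not stated anywhere above


def heap_start(data_base, data_size, bss_size, page_size=PAGE_SIZE):
    """Round the end of .data + .bss up to a page boundary."""
    end = data_base + data_size + bss_size
    return (end + page_size - 1) // page_size * page_size


# Hypothetical program: 0x1234 bytes of initialised data, 0x400 bytes of .bss.
print(hex(heap_start(0x10000000, 0x1234, 0x400)))  # 0x10002000
```

The point is only that the result depends on how much global data the program actually has, which is what a fixed memory model cannot reflect.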
The simulator doesn't handle these details: it works with a fixed memory model for all programs, so, for example, it doesn't readjust the heap start location based on the actual number of global variables in the assembly, as a real system would.
So, in my opinion, it is simply reserving 256k for global storage for the assembly program to use.
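The 256k figure can be checked from the addresses mentioned in the question and in this answer. A quick Python sketch (the constant names are just labels I picked for this example):

```python
# Addresses under the default MARS memory configuration, as quoted here.
STATIC_BASE = 0x10000000  # start of the $gp-addressable area
DATA_BASE = 0x10010000    # default start of .data
HEAP_BASE = 0x10040000    # where syscall-allocated memory starts

KIB = 1024
gp_area_kib = (DATA_BASE - STATIC_BASE) // KIB  # area reachable via $gp
data_kib = (HEAP_BASE - DATA_BASE) // KIB       # default .data area
total_kib = (HEAP_BASE - STATIC_BASE) // KIB    # everything below the heap

print(gp_area_kib, data_kib, total_kib)  # 64 192 256
```

So the region below the heap is 64k of $gp-addressable space followed by 192k of default .data space, 256k in total.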
The first 64k are reserved for direct access using $gp, but MARS doesn't place any global data there by default. If you force the data to start at 0x10000000 (via .data 0x10000000), you can put symbols there, but lw $t0, label($gp), where label resides within 16-bit reach of $gp, is treated as a pseudo-instruction that expands to 3 instructions: it adds the absolute address of label to what's in $gp and then does the lw. This will not correctly access that global data, because $gp holds 0x10008000, so it ends up adding two pointers when it should instead add the offset of label relative to $gp. If you really wanted to use the global data via the proper 1-instruction sequence using $gp (with the default memory configuration model), you might define constants using .eqv instead of defining labels, and manage the offsets yourself (yuk, but it will work for small programs).
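To make the "adding two pointers" problem concrete, here is the address arithmetic as a Python sketch. It models only the sums described above, not MARS itself; the function names are made up, and LABEL assumes a symbol placed at the very start of the area via .data 0x10000000.

```python
MASK32 = 0xFFFFFFFF
GP = 0x10008000     # default $gp
LABEL = 0x10000000  # assumed: first symbol after ".data 0x10000000"


def pseudo_address(label, gp):
    """Address reached when the absolute label address is added to $gp."""
    return (label + gp) & MASK32


def gp_offset(label, gp):
    """Signed 16-bit offset of label relative to $gp, as a real lw needs."""
    off = label - gp
    if not -0x8000 <= off <= 0x7FFF:
        raise ValueError("label is not within 16-bit reach of $gp")
    return off


print(hex(pseudo_address(LABEL, GP)))              # 0x20008000, wrong place
print(gp_offset(LABEL, GP))                        # -32768
print(hex((GP + gp_offset(LABEL, GP)) & MASK32))   # 0x10000000, intended
```

With hand-managed offsets, the constant would be something along the lines of .eqv label_off -32768, used as lw $t0, label_off($gp). The name label_off is hypothetical, and I haven't checked every MARS version for how it substitutes such constants.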
Instead, MARS puts the default start location for .data at 0x10010000, which I take as simply avoiding the 64k $gp-accessible area. Any instruction that uses a data label is expanded to use a multiple-instruction sequence that starts with an lui.
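For reference, this is the usual way a 32-bit address gets split so that an lui followed by a load with a signed 16-bit offset reaches it. Take it as a general sketch of the hi/lo technique rather than a dump of the exact expansion MARS emits; the helper names are mine.

```python
def split_hi_lo(addr):
    """Split addr into (hi, lo) with (hi << 16) + lo == addr, lo signed 16-bit."""
    lo = addr & 0xFFFF
    if lo >= 0x8000:
        lo -= 0x10000  # the load sign-extends its 16-bit offset
    hi = ((addr - lo) >> 16) & 0xFFFF
    return hi, lo


def rebuild(hi, lo):
    """What 'lui reg, hi' followed by 'lw ..., lo(reg)' would address."""
    return ((hi << 16) + lo) & 0xFFFFFFFF


hi, lo = split_hi_lo(0x10010004)
print(hex(hi), lo)  # 0x1001 4

hi, lo = split_hi_lo(0x1001A000)
print(hex(hi), lo)  # 0x1002 -24576
```

The second case shows why the upper half sometimes has to be bumped by one: the low half is treated as negative by the load, so the lui value compensates for it.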
To reiterate, in my opinion, MARS is simply reserving 256k for global storage for the assembly program to use, and then starting the heap right after it.
I don't interpret any of this as another section or segment in between data and heap.
Also, if you read the MARS default memory configuration model, it says the stack (lower) limit is the same as the heap base, but in reality the MARS simulator will not (or cannot) actually allow the stack to grow that big; it will issue an error at a much higher stack location, the real limit being something like 1 or 2 MB in size.