C: Is my understanding about the specifics of heap and stack allocation correct?

I have a sort of linked list implemented in C (code at the bottom). It has obvious issues, but I'm not asking about those or about linked lists in general; I'm aware, for instance, that there are no calls to free() for the allocated memory. It does what I expect, so far as I've checked. My question is about the first couple of lines of the addnodeto() function and what they do to the heap/stack.

My understanding is that calling malloc() sets aside some memory on the heap, and then returns the address of that memory (pointing to the beginning) which is assigned to struct node *newnode which is itself on the stack. When the function is first called, *nodeaddto is a pointer to struct node first, both of which are on the stack. Thus the (*nodeaddto)->next = newnode sets first.next equal to the value of newnode which is the address of the newly allocated memory.

When we leave this function, and continue executing the main() function, is *newnode removed from the stack (not sure if 'deallocated' is the correct word), leaving only struct node first pointing to the 'next' node struct on the heap? If so, does this 'next' struct node have a variable name also on the stack or heap, or it is merely some memory pointed to? Moreover, is it true to say that struct node first is on the stack, whilst all subsequent nodes will be on the heap, and that just before main() returns 0 there are no structs/variables on the stack other than struct node first? Or is/are there 1/more than 1 *newnode still on the stack?

I did try using GDB, which showed that struct node *newnode was located at the same memory address both times addnodeto() was called. Was it removed and then happened to be re-defined/allocated into the same location, or was the compiler perhaps being smart and left it there even once the function had exited the first time, or something else? I couldn't work anything else out concretely. Thank you.

The code:

#include <stdio.h>
#include <stdlib.h>

#define STR_LEN 5

struct node {
    char message[STR_LEN];
    struct node *next;
};

void addnodeto(struct node **nodeaddto, char letter, int *num_of_nodes){

    struct node *newnode = malloc(sizeof(struct node));
    (*nodeaddto)->next = newnode;
    newnode->message[0] = letter;
    newnode->next = NULL; /* malloc does not initialize memory; mark the end of the list */

    (*nodeaddto) = newnode;

    *num_of_nodes += 1;
}

int main(void){
    struct node first = {"F", NULL};
    struct node *last = &first;
    int num_nodes = 1;

    addnodeto(&last, 'S', &num_nodes);
    addnodeto(&last, 'T', &num_nodes);
    addnodeto(&last, 'I', &num_nodes);

    printf("Node: %d holds the char: %c\n", num_nodes-3, first.message[0]);
    printf("Node: %d holds the char: %c\n", num_nodes-2, (first.next)->message[0]);
    printf("Node: %d holds the char: %c\n", num_nodes-1, ((first.next)->next)->message[0]);
    printf("Node: %d holds the char: %c\n", num_nodes, (last)->message[0]);

    return 0;
}

Which when run outputs:

Node: 1 holds the char: F
Node: 2 holds the char: S
Node: 3 holds the char: T
Node: 4 holds the char: I

As expected.

CodePudding user response：

My understanding is that calling malloc() sets aside some memory on the heap, and then returns the address of that memory (pointing to the beginning)…

Yes, but people who call it “the heap” are being sloppy with terminology. A heap is a kind of data structure, like a linked list, a binary tree, or a hash table. Heaps can be used for things other than tracking available memory, and available memory can be tracked using data structures other than a heap.

I do not actually know of a specific term for the memory that the memory management routines manage. There are actually several different sets of memory we might want terms for:

all the memory they have acquired from the operating system so far and are managing, including both memory that is currently allocated to clients and memory that has been freed (and not yet returned to the operating system) and is available for reuse;
the memory that is currently allocated to clients;
the memory that is currently available for reuse; and
the entire range of memory that is being managed, including portions of the virtual address space reserved for future mapping when necessary to request more memory from the operating system.

I have seen “pool” used to describe such memory but have not seen a specific definition of it.

… which is assigned to struct node *newnode which is itself on the stack.

struct node *newnode is indeed nominally on the stack in common C implementations. However, the C standard only classifies it as automatic storage duration, meaning its memory is automatically managed by the C implementation. The stack is the most common way to implement that, but specialized C implementations may do it in other ways. Also, once the compiler optimizes the program, newnode might not be on the stack; the compiler might generate code that just keeps it in a register, and there are other possibilities too.

A complication here is that when we talk about memory use in a C program, we can mean either the memory use in the model computer the C standard uses to describe the semantics of programs or the memory use in actual practice. For example, as the C standard describes it, every object has some memory reserved for it during its lifetime. However, when a program is compiled, the compiler can produce any code it wants that gets the same results as required by the C standard. (The output of the program has to be the same, and certain other interactions have to behave the same.) So a compiler might not use memory for an object at all.

After optimization, an object might be in memory at one time and in registers at another, or it might always be in a register and never in memory, and it might be in different registers at different times, and it might not be in any particular place because it might have been incorporated into other things. For example, in int x = 3; printf("%d\n", 4*x + 2);, the compiler might eliminate x completely and just print “14”. So, when asking about where things are in memory, you should be clear about whether you want to discuss the semantics in the model computer that the C standard uses or the actual practice in optimized programs.

When the function is first called, *nodeaddto is a pointer to struct node first, both of which are on the stack.

nodeaddto may be on the stack, per above, but it also may be in a register. It is common for function arguments to be passed in registers.

*nodeaddto points to a struct node. By itself, struct node is a type, so it is just a concept, not an object to point to. In contrast, “a struct node” is an object of that type. That object might or might not be on the stack; addnodeto would not care, because it can link to the object regardless of where it is in memory. Your main routine does create first (and the pointer last) with automatic storage duration, but it could declare them static just as well, and then they would likely be located in a different part of memory rather than on the stack.

Thus the (*nodeaddto)->next = newnode sets first.next equal to the value of newnode which is the address of the newly allocated memory.

Yes: in main, last is initialized to a pointer to first. Then &last is passed to addnodeto, so nodeaddto is a pointer to last. So, on the first call, *nodeaddto is a pointer to first, and (*nodeaddto)->next is the next member of first.

When we leave this function, and continue executing the main() function, is *newnode removed from the stack (not sure if 'deallocated' is the correct word), leaving only struct node first pointing to the 'next' node struct on the heap?

newnode is an object with automatic storage duration inside addnodeto, so its memory is automatically released when addnodeto ends.

*newnode is a struct node with allocated storage duration, so its memory is not released when a function ends. Its memory is released when free is called, or possibly some other routine that may release memory, like realloc.

If so, does this 'next' struct node have a variable name also on the stack or heap, or it is merely some memory pointed to?

There are no variable names in the stack or in the heap. Variable names exist only in source code (and in the compiler while compiling and in debugging information associated with the compiled program, but that debugging information is generally separate from the normal execution of the program). When we work with allocated memory, we generally work with it only by pointers to it.

Moreover, is it true to say that struct node first is on the stack, whilst all subsequent nodes will be on the heap,…

Yes, subject to the caveats about stack and “heap” above.

… and that just before main() returns 0 there are no structs/variables on the stack other than struct node first?

All of the automatic objects in main are on the stack (or otherwise automatically managed): first, last, and num_nodes.

Or is/are there 1/more than 1 *newnode still on the stack?

No.