struct inside struct: to point or not to point?

I'd like to understand the difference between using a pointer and a value when it comes to referencing a struct inside another struct.

By that I mean, I can have those two declarations:

struct foo {
    int bar;
};

struct fred {
    struct foo  barney;
    struct foo *wilma;
};

It appears I can get the same behavior from both the barney and wilma members, as long as I dereference accordingly when I access them. The barney case intuitively feels “wrong” but I cannot say why.

Am I just relying on some C undefined behavior? If not, what would be the reason(s) to opt for one style over the other?

The following code shows how I came to the conclusion that both use cases are equivalent; neither clang nor gcc complains about anything.

#include <stdio.h>
#include <stdlib.h>

struct a_number {
    int i;
};

struct s_w_ptr {
    struct a_number *n;
};

struct s_w_val {
    struct a_number n;
};

void store_via_ptr(struct s_w_ptr *swp, struct s_w_val *swv) {
    struct a_number *i = malloc(sizeof(i));
    i->i   =  1;
    swp->n =  i;
    swv->n = *i;
}

void store_via_val(struct s_w_ptr *swp, struct s_w_val *swv) {
    struct a_number j;
    j.i    =  2;
    swp->n = &j;
    swv->n =  j;
}

int main(void) {

    struct s_w_ptr *swp = malloc(sizeof(swp));
    struct s_w_val *swv = malloc(sizeof(swv));

    store_via_ptr(swp, swv);
    printf("p: %d | v: %d\n", swp->n->i, swv->n.i);

    store_via_val(swp, swv);
    printf("p: %d | v: %d\n", swp->n->i, swv->n.i);
}

CodePudding user response：

It's perfectly valid for a struct to contain struct members, and equally valid for it to contain pointers to structs. They must be used differently, but both are legal.

Why have a struct in a struct?

One reason is to group things together. For instance:

struct car
{
    struct motor motor;  // a struct with several members describing the motor
    struct wheel wheel;  // a struct with several members describing the wheels
    ...
};

struct car myCar = {....initializer...};

myCar.wheel = SomeOtherWheelModel;  // Replace wheels in a single assign
myCar.wheel.pressure = 2.1;         // Change a single wheel member

Why have a struct pointer in a struct?

One very obvious reason is that it can be used as an array of N structs, by dynamically allocating N times the struct size.

Another typical example is linked lists where you have a pointer to a struct of the same type as the struct containing the pointer.
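As a minimal sketch of the array case (struct truck, truck_init and truck_free are hypothetical names, not from the question), the pointer member refers to a block of N dynamically allocated structs and is indexed like an array:

```c
#include <stdlib.h>

struct wheel {
    double pressure;
};

/* Hypothetical container: the number of wheels is only known at run time. */
struct truck {
    size_t        wheel_count;
    struct wheel *wheels;   /* points to wheel_count structs, or NULL */
};

/* Allocate n wheels in zeroed memory; returns 1 on success, 0 on failure. */
int truck_init(struct truck *t, size_t n) {
    t->wheels = calloc(n, sizeof *t->wheels);
    t->wheel_count = t->wheels ? n : 0;
    return t->wheels != NULL;
}

/* Release the wheels and leave the truck in a safe, empty state. */
void truck_free(struct truck *t) {
    free(t->wheels);
    t->wheels = NULL;
    t->wheel_count = 0;
}
```

After a successful truck_init(&t, 6), an element is reached with t.wheels[5].pressure, exactly as with a real array.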

CodePudding user response：

C structures can be used to group related data, such as the title of a book, its author, its assigned book number, and so on. But much of what we use structures for is creating data structures (in a different sense of the word “structure”) in memory.

Consider that the book’s author has a name, a date of birth, other biographical information, a list of books they have written, and more. We could include in the struct book a struct author that would contain all this information. But, if the author has written a hundred books, we could have 100 copies of all that information, one copy in each struct book. Further, we cannot continue the “contain the data inside the structure directly” model with the struct author, because it cannot contain a struct book for each book the author publishes if those struct book members also have to contain the struct author for the author—every object would have to contain itself.

It is more efficient to create one struct author and have each struct book by that author link to it.
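A rough sketch of that layout, with illustrative member names and a hypothetical helper count_books_by, might look like this. Each book stores only a pointer, so all books by the same author share one struct author:

```c
#include <stddef.h>

struct author {
    const char *name;
    int         birth_year;
};

struct book {
    const char          *title;
    const struct author *author;  /* shared, not copied into each book */
};

/* Count the books in an array that refer to the given author object. */
size_t count_books_by(const struct book *books, size_t n,
                      const struct author *who) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (books[i].author == who) {
            count++;
        }
    }
    return count;
}
```

Because the author data exists once, correcting a field in the struct author is immediately visible through every book that points to it.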

Another example is that we use pointers to create data structures for efficient access to data. If we are reading data for thousands of items and want to keep them sorted by name, one option is to allocate memory for some number of structures, read the data, and sort the data. When new data is read and we have used all the memory we allocated, we allocate new memory, copy all the old data to the new memory if necessary, and move some of the data so we can insert the new data in its proper place. However, we have many better options than that. We can use linked lists, binary trees, other kinds of trees, and hash tables.

These data structures effectively require using pointers. A binary tree will have a root node, and each node contains two pointers, one to a subtree of nodes that are earlier than it in the sorting order and another to a subtree of nodes that are later than it. We can look up items in the tree by following pointers to earlier or later nodes to find the right position. And we can insert items by changing a few pointers. If the tree happens to become unbalanced, we can rearrange nodes in the tree by changing pointers. The bulk of the data in the nodes does not have to be changed or copied, just some pointers.
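The node layout described above could be sketched as follows. This is an unbalanced tree keyed by a string, and the names (struct node, tree_insert, tree_find, tree_free) are made up for illustration:

```c
#include <stdlib.h>
#include <string.h>

struct node {
    const char  *name;     /* key; the string itself is not copied */
    struct node *earlier;  /* subtree of names that sort before this one */
    struct node *later;    /* subtree of names that sort after this one */
};

/* Insert name into the tree rooted at *root. Returns the node holding
   the name (new or existing), or NULL if allocation fails. */
struct node *tree_insert(struct node **root, const char *name) {
    while (*root != NULL) {
        int cmp = strcmp(name, (*root)->name);
        if (cmp == 0) {
            return *root;               /* already present */
        }
        root = cmp < 0 ? &(*root)->earlier : &(*root)->later;
    }
    struct node *n = malloc(sizeof *n);
    if (n != NULL) {
        n->name = name;
        n->earlier = NULL;
        n->later = NULL;
        *root = n;                      /* only one pointer is changed */
    }
    return n;
}

/* Follow earlier/later pointers until the name is found or a leaf is hit. */
struct node *tree_find(struct node *root, const char *name) {
    while (root != NULL) {
        int cmp = strcmp(name, root->name);
        if (cmp == 0) {
            return root;
        }
        root = cmp < 0 ? root->earlier : root->later;
    }
    return NULL;
}

/* Free every node; the name strings are not owned by the tree. */
void tree_free(struct node *root) {
    if (root != NULL) {
        tree_free(root->earlier);
        tree_free(root->later);
        free(root);
    }
}
```

Note that an insertion writes a single pointer in an existing node; nothing that is already in the tree is moved or copied.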

We can also use pointers to have multiple structures for the same data. All the data about books could be stored in one place, and a tree ordered by name could contain nodes in which each node contained a pointer to the book structure and two pointers to subtrees. We could have one tree like this ordered by title of the book and another tree ordered by the name of the author and another tree ordered by the assigned book number. Then we can efficiently look up a book by title or author or number, but there is only one master copy of the complete book data, in the struct book objects. The look-up data is in the tree, which contains only pointers. That is much more efficient than copying all of the struct book data for each tree.
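To keep the illustration short, the sketch below swaps the trees for sorted arrays of pointers, but the idea is the same: several look-up structures refer to one master copy of the data. All names here (build_index, by_title, by_number) are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

struct book {
    const char *title;
    int         number;
};

/* qsort comparators: each array element is a pointer to a struct book. */
static int by_title(const void *a, const void *b) {
    const struct book *x = *(const struct book *const *)a;
    const struct book *y = *(const struct book *const *)b;
    return strcmp(x->title, y->title);
}

static int by_number(const void *a, const void *b) {
    const struct book *x = *(const struct book *const *)a;
    const struct book *y = *(const struct book *const *)b;
    return (x->number > y->number) - (x->number < y->number);
}

/* Fill index[0..n-1] with pointers to books[0..n-1], then sort only the
   pointers. The struct book objects themselves never move. */
void build_index(struct book *books, size_t n, struct book **index,
                 int (*cmp)(const void *, const void *)) {
    for (size_t i = 0; i < n; i++) {
        index[i] = &books[i];
    }
    qsort(index, n, sizeof *index, cmp);
}
```

Calling build_index once with by_title and once with by_number gives two orderings over the same books, at the cost of one pointer per book per index.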

So the choice between structures and pointers as members does not depend on whether the C syntax lets us refer to the data; we can get to the data in both cases. It depends on the fact that one method embeds the data, which is inflexible and requires copying, while the other is flexible and efficient.

CodePudding user response：

There are several advantages of having a struct in a struct instead of having a pointer to struct in a struct:

It requires less memory. With a pointer to a struct in a struct, memory is needed both for the pointer inside the parent struct and, separately, for the child struct itself.
Fewer instructions are typically required to access the contents of the child struct. For example, consider a program reading the contents of the child struct. If a struct within a struct is used, the program applies an offset to the address of the variable and reads the contents of that memory location. With a pointer to a struct in a struct, the program applies an offset to the parent struct's address, fetches the address of the child struct, and only then reads the contents of the child struct from memory.
With a pointer, a separate variable must be declared for the parent and for the child struct, and if initializers are used, each needs its own. With a struct in a struct, only one variable is declared and a single initializer is used.
Where dynamic memory allocation is used with a pointer, the developer must remember to deallocate both the child and the parent object before the last pointers to them go out of scope. With a struct in a struct, only one object has to be freed.
As is shown in the example, if a pointer is used, a NULL check may be necessary to ensure that the pointer to the child struct has been initialized.
With a pointer, any change made directly to the child struct (through another pointer to it, for instance) is also seen through the parent struct, which could be a problem if this behavior is not desired.
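The points above can be made concrete with a small sketch. The type and function names (parent_embedded, parent_pointing, and so on) are invented for this illustration:

```c
#include <stdlib.h>

struct child {
    int value;
};

struct parent_embedded {
    struct child c;    /* stored inside the parent */
};

struct parent_pointing {
    struct child *c;   /* stored elsewhere; may be NULL */
};

/* One offset from the parent's address reaches the value. */
int read_embedded(const struct parent_embedded *p) {
    return p->c.value;
}

/* The pointer is loaded first, checked, and then followed.
   Returns fallback if no child is attached. */
int read_pointing(const struct parent_pointing *p, int fallback) {
    return p->c != NULL ? p->c->value : fallback;
}

/* Parent and child are allocated separately; both must be freed later. */
struct parent_pointing *parent_pointing_new(int value) {
    struct parent_pointing *p = malloc(sizeof *p);
    if (p == NULL) {
        return NULL;
    }
    p->c = malloc(sizeof *p->c);
    if (p->c == NULL) {
        free(p);
        return NULL;
    }
    p->c->value = value;
    return p;
}

void parent_pointing_free(struct parent_pointing *p) {
    if (p != NULL) {
        free(p->c);    /* child first, then the parent */
        free(p);
    }
}
```

The embedded variant needs none of the allocation or clean-up code: struct parent_embedded e = { { 7 } }; declares and initializes parent and child in one step.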

The primary advantages of having a pointer to a struct in a struct appear when you need to replace the child struct with another struct within the program, as in a linked list. A less common case is a child struct that can be of more than one type; in that case you might use a void * for the child. I may also use a pointer within a struct to point to an array when the size of the array varies between instances.

Based on my knowledge of the case shown in the example above, I would be inclined to use a struct in a struct, since both objects are of fixed size and type and since it appears that they would not need to be separated.

CodePudding user response：

Let's first consider this function

void store_via_ptr(struct s_w_ptr *swp, struct s_w_val *swv) {
    struct a_number *i = malloc(sizeof(i));
    i->i   =  1;
    swp->n =  i;
    swv->n = *i;
}

This declaration

struct a_number *i = malloc(sizeof(i));

is equivalent to the following declaration

struct a_number *i = malloc(sizeof( struct a_number * ));

So in general the function can invoke undefined behavior when sizeof( struct a_number ) is greater than sizeof( struct a_number * ).

It seems you mean

struct a_number *i = malloc(sizeof( *i ) );
                                    ^^^
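The difference between the two sizeof expressions is easier to see with a struct that is clearly larger than a pointer. The following is only a sketch; struct big_number and both helper functions are hypothetical:

```c
#include <stddef.h>

/* Hypothetical struct, deliberately larger than a pointer on common platforms. */
struct big_number {
    long long digits[16];
};

/* Size of the pointer variable itself: what malloc(sizeof(p)) would request. */
size_t size_of_pointer(void) {
    struct big_number *p = NULL;
    return sizeof(p);
}

/* Size of the pointed-to object: what malloc(sizeof(*p)) requests. */
size_t size_of_pointee(void) {
    struct big_number *p = NULL;
    return sizeof(*p);   /* sizeof does not evaluate *p, so NULL is fine here */
}
```

With struct a_number from the question, which holds a single int, the pointer is typically at least as large as the struct, which is probably why the original code appeared to work.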

If you split the function into two functions, one for each parameter, like

void store_via_ptr1( struct s_w_ptr *swp ) {
    struct a_number *i = malloc(sizeof( *i ) );
    i->i   =  1;
    swp->n =  i;
}

and

void store_via_ptr2( struct s_w_val *swv ) {
    struct a_number *i = malloc(sizeof( *i));
    i->i   =  1;
    swv->n = *i;
}

then after calling the first function, whoever owns the object pointed to by swp must remember to free the memory that was allocated within the function. Otherwise there will be a memory leak.

The second function already produces a memory leak, because the allocated memory is not freed and the only pointer to it is lost when the function returns.

Now let's consider the second function

void store_via_val(struct s_w_ptr *swp, struct s_w_val *swv) {
    struct a_number j;
    j.i    =  2;
    swp->n = &j;
    swv->n =  j;
}

Here the pointer swp->n will point to the local object j. After the function returns, this pointer is invalid because the pointed-to object no longer exists.

So both functions are incorrect. Instead you could write the following functions

int store_via_ptr(struct s_w_ptr *swp ) {
    swp->n = malloc( sizeof( *swp->n ) );

    int success = swp->n != NULL;

    if ( success ) swp->n->i = 1;

    return success;
}

and

void store_via_val( struct s_w_val *swv ) {
    swv->n.i =  2;
}

Whether to embed a whole object of a structure type in another structure object, or to store a pointer to such an object instead, depends on the design and on the context in which the objects are used.

For example consider a structure struct Point

struct Point
{
    int x;
    int y;
};

In this case if you want to declare a structure struct Rectangle then it is natural to define it like

struct Rectangle
{
    struct Point top_left;
    struct Point bottom_right;
};

On the other hand, if you have a two-sided singly-linked list then it can look like

struct Node
{
    int value;
    struct Node *next;
};

struct List
{
    struct Node *head;
    struct Node *tail;
};
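As a sketch of why the pointers are needed here, appending to such a list only allocates a node and rewires pointers. The functions list_push_back and list_clear are hypothetical helpers, not part of the question:

```c
#include <stdlib.h>

struct Node
{
    int value;
    struct Node *next;
};

struct List
{
    struct Node *head;
    struct Node *tail;
};

/* Append value in constant time using tail.
   Returns 1 on success, 0 if allocation fails. */
int list_push_back( struct List *list, int value )
{
    struct Node *node = malloc( sizeof( *node ) );

    if ( node == NULL ) return 0;

    node->value = value;
    node->next  = NULL;

    if ( list->tail != NULL ) list->tail->next = node;
    else list->head = node;           /* first node of an empty list */

    list->tail = node;

    return 1;
}

/* Free all nodes and reset the list to the empty state. */
void list_clear( struct List *list )
{
    struct Node *current = list->head;

    while ( current != NULL )
    {
        struct Node *next = current->next;
        free( current );
        current = next;
    }

    list->head = NULL;
    list->tail = NULL;
}
```

An empty list is simply struct List list = { NULL, NULL }; the number of nodes is not fixed in the type, which an embedded struct Node member could not express.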

CodePudding user response：

Two problems:

In store_via_ptr you allocate memory for i dynamically. When you use s_w_val you copy the structure and then abandon the pointer, which means the pointer will be lost and can't be passed to free later.
In store_via_val you make swp->n point to the local variable j, a variable whose lifetime ends when the function returns, leaving you with an invalid pointer.

The first problem might lead to a memory leak (something your simple example program never deals with).

The second problem is worse, since it will lead to undefined behavior when you dereference the pointer swp->n.

Unrelated to that, in the main function you don't need to allocate memory dynamically for the structures. You could just have defined them as plain structure objects and used the address-of operator & when calling the functions.
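Putting this together, a minimal sketch could look like the following. It assumes each store function takes only the struct it writes to, and it uses a hypothetical function run as a stand-in for the body of main:

```c
#include <stdio.h>
#include <stdlib.h>

struct a_number {
    int i;
};

struct s_w_ptr {
    struct a_number *n;
};

struct s_w_val {
    struct a_number n;
};

/* Allocates the pointed-to struct; returns 1 on success, 0 on failure. */
int store_via_ptr(struct s_w_ptr *swp) {
    swp->n = malloc(sizeof *swp->n);
    if (swp->n == NULL) {
        return 0;
    }
    swp->n->i = 1;
    return 1;
}

/* Writes straight into the embedded struct; nothing to allocate. */
void store_via_val(struct s_w_val *swv) {
    swv->n.i = 2;
}

/* Hypothetical stand-in for main; returns 0 on success. */
int run(void) {
    struct s_w_ptr swp = { NULL };    /* plain objects, no malloc needed */
    struct s_w_val swv = { { 0 } };

    if (!store_via_ptr(&swp)) {       /* & passes the object's address */
        return 1;
    }
    store_via_val(&swv);
    printf("p: %d | v: %d\n", swp.n->i, swv.n.i);

    free(swp.n);                      /* the one allocation is released */
    swp.n = NULL;
    return 0;
}
```

Only the struct a_number behind swp.n is allocated dynamically, so that is the only thing that has to be freed.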