C, Do structs with multiple dynamically allocated arrays need to be resized for every allocation?

Time:05-06

Below I have a struct with multiple dynamically allocated char arrays.

It compiles, Valgrind indicates no issues and it functions as anticipated.

Earlier, someone tried to explain to me that I might run into memory issues down the road. Their reasoning was that one of the variables may exceed the contiguous memory allocated for the foo instance.

Should I be performing a malloc on foo every time I allocate memory for its variables?

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

typedef struct foo_t{
    char *strA;
    char *strB;
    char *strC;
    int x;
} foo;

int main() {

    char *str1 = "Hello World";
    char *str2 = "foo";
    char *str3 = "Kolmongorov complexity";

    foo *point = malloc(sizeof(foo));
    point->x = 100000;

    /* +1 leaves room for the terminating '\0' */
    point->strA = calloc(strlen(str1) + 1, sizeof(char));
    point->strB = calloc(strlen(str2) + 1, sizeof(char));
    point->strC = calloc(strlen(str3) + 1, sizeof(char));

    strcpy(point->strA, str1);
    strcpy(point->strB, str2);
    strcpy(point->strC, str3);

    free(point->strA);
    free(point->strB);
    free(point->strC);

    free(point);

    return 0;
}

CodePudding user response:

In your example, the foo struct doesn't actually contain the arrays: it contains pointers to the arrays.

When arrays are resized, the size of pointers remains the same, so there's no need to reallocate the foo struct.

Just be aware that resizing an array with realloc may change its memory address, so the corresponding pointer in the foo struct has to be reassigned to the value realloc returns.
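A minimal sketch of that point, reusing the foo struct from the question. The helper name foo_set_strA is hypothetical and not part of the original code. It resizes one string with realloc and stores the returned address back into the struct; the struct itself is never reallocated.

```c
#include <stdlib.h>
#include <string.h>

typedef struct foo_t {
    char *strA;
    char *strB;
    char *strC;
    int x;
} foo;

/* Hypothetical helper: replace f->strA with a copy of src.
   Returns 0 on success, -1 if realloc fails (f->strA is then unchanged). */
int foo_set_strA(foo *f, const char *src)
{
    size_t n = strlen(src) + 1;        /* +1 for the terminating '\0' */
    char *tmp = realloc(f->strA, n);   /* realloc(NULL, n) behaves like malloc(n) */
    if (tmp == NULL)
        return -1;                     /* the old buffer is still valid */
    memcpy(tmp, src, n);
    f->strA = tmp;                     /* the block may have moved: store the new address */
    return 0;
}
```

Assigning the result to a temporary first avoids losing the old buffer if realloc fails. The caller still frees f->strA once, exactly as in the question.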

CodePudding user response:

I think memory management is one of the most important topics in C, but it's far more confusing than you might think at first. This question is a good entry point to illustrate it, so I will try my best to explain it in detail.

First consider the following three C struct definitions:

#define STR_SIZE 16

struct foo_A {
    char str1[STR_SIZE];
    char str2[STR_SIZE];
};

struct foo_B {
    char *str1;
    char *str2;
};

struct foo_C {
    char str1[STR_SIZE];
    char str2[];
};

Note that foo_C is legal, not a typo. What's the difference between them?

Well, let's consider foo_A and foo_B first. Both of them try to describe and manage two strings. For memory management, the first question to ask is where the memory comes from.

In general there are two ways to get the memory: one is "I will allocate it myself", and the other is "someone else will allocate it, I just take it and manage it". As you might guess, the former, in-place style is foo_A, while the latter is foo_B.

More specifically, in C, you use them in different styles (headers are omitted):

char str1[] = "I'm string A";
char str2[] = "I'm string B";

int main() {
 
    struct foo_A foo_A;
    struct foo_B foo_B;
    
    memset(&foo_A, 0, sizeof(struct foo_A));
    memset(&foo_B, 0, sizeof(struct foo_B));

    foo_B.str1 = (char *)malloc(sizeof(str1));
    foo_B.str2 = (char *)malloc(sizeof(str2));

    strcpy(foo_A.str1, str1);
    strcpy(foo_A.str2, str2);
    strcpy(foo_B.str1, str1);
    strcpy(foo_B.str2, str2);

    /* 
        safely manipulate foo_B strings
        ...
    */

    free(foo_B.str1);
    free(foo_B.str2);
}

As you see, compared to foo_A's out-of-the-box usage, we have to deal with memory allocation before accessing foo_B's strings. If you omit these steps, the strcpy calls write through invalid pointers, which is undefined behavior and typically crashes the program.

More formally, the difference is that foo_A itself contains the resource it manages, while foo_B doesn't. foo_B only contains a handle to the resource, so before you use foo_B you first have to obtain a resource and attach it to the handle. Allocating and releasing that resource is not the job of foo_B.

Before jumping back to your reallocation question, one more problem remains: management of the management struct itself.

As you see, the code above doesn't allocate or free struct foo_A or struct foo_B, apart from zeroing them with memset. This is not always the case; I chose this way to keep the code clear and to avoid introducing this problem too early.

Now consider the following two alternatives:

/* 
    alternative_1.c
    
    str1, str2 remain the same
*/

struct foo_A foo_A;
struct foo_B foo_B;

int main() {

    foo_B.str1 = (char *)malloc(sizeof(str1));
    foo_B.str2 = (char *)malloc(sizeof(str2));

    strcpy(foo_A.str1, str1);
    strcpy(foo_A.str2, str2);
    strcpy(foo_B.str1, str1);
    strcpy(foo_B.str2, str2);

    /* 
        safely manipulate foo_B strings
        ...
    */

    free(foo_B.str1);
    free(foo_B.str2);
}

/* 
    alternative_2.c
    
    str1, str2 remain the same
*/
int main() {

    struct foo_A *foo_A = (struct foo_A *)malloc(sizeof(struct foo_A));
    struct foo_B *foo_B = (struct foo_B *)malloc(sizeof(struct foo_B));

    memset(foo_A, 0, sizeof(struct foo_A));
    memset(foo_B, 0, sizeof(struct foo_B));

    foo_B->str1 = (char *)malloc(sizeof(str1));
    foo_B->str2 = (char *)malloc(sizeof(str2));

    strcpy(foo_A->str1, str1);
    strcpy(foo_A->str2, str2);
    strcpy(foo_B->str1, str1);
    strcpy(foo_B->str2, str2);

    /* 
        safely manipulate foo_B strings
        ...
    */

    /* free managed string */
    free(foo_B->str1);
    free(foo_B->str2);
    
    /* free management struct itself */
    free(foo_A);
    free(foo_B);
}

As you see, the way we get foo_A and foo_B varies. In the very first example, we define them within the function body, which means they are automatically allocated on the stack (case 1). In alternative_1.c, we define them at file scope, so they have static storage duration: their space is reserved for the whole run of the program and is zero-initialized (case 2). In alternative_2.c they are allocated on the heap by malloc (case 3).

No matter how you get foo_A and foo_B, the key point is the same as before: you first have to obtain them, and then manage them. They are released either manually with free (case 3), automatically when the enclosing function returns (case 1), or when the process exits (case 2).

After all this discussion, you may see that management of the resource (e.g. the string) and management of the management struct (e.g. foo_A or foo_B) are independent. So if your resource changes, for example because the string is resized by reallocation as you describe, you don't have to reallocate the management struct (i.e. foo_X). That's it.
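One way to keep the two levels of management apart is a pair of helpers, sketched here for foo_B. The names copy_string, foo_B_create and foo_B_destroy are hypothetical and not from the examples above. The struct is allocated once; each string is a separate allocation that could later be resized without touching the struct.

```c
#include <stdlib.h>
#include <string.h>

struct foo_B {
    char *str1;
    char *str2;
};

/* Hypothetical helper: heap-allocate a copy of s, or return NULL on failure. */
static char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Allocate the management struct once, then the strings it points to. */
struct foo_B *foo_B_create(const char *s1, const char *s2)
{
    struct foo_B *f = malloc(sizeof *f);
    if (f == NULL)
        return NULL;
    f->str1 = copy_string(s1);
    f->str2 = copy_string(s2);
    if (f->str1 == NULL || f->str2 == NULL) {
        free(f->str1);          /* free(NULL) is a no-op */
        free(f->str2);
        free(f);
        return NULL;
    }
    return f;
}

/* Release the managed strings first, then the struct itself. */
void foo_B_destroy(struct foo_B *f)
{
    if (f == NULL)
        return;
    free(f->str1);
    free(f->str2);
    free(f);
}
```

The order in foo_B_destroy matters: once the struct is freed, the pointers stored in it can no longer be read, so the strings have to go first.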

Remember there is still a struct foo_C? Before talking about it I want to first discuss why you may choose struct foo_A or struct foo_B.

It turns out that foo_A cannot manage variable-length strings, only strings that fit its fixed buffers (in our example, at most STR_SIZE - 1 = 15 characters plus the terminating null). If you know the strings will never exceed a small fixed size, you may choose this style; otherwise choose a style like foo_B.
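Because the buffers in foo_A are fixed, a plain strcpy overflows them as soon as a string is longer than 15 characters. A minimal sketch of a bounded copy follows; the helper name foo_A_set is hypothetical, and truncating is only one possible policy.

```c
#include <stdio.h>
#include <string.h>

#define STR_SIZE 16

struct foo_A {
    char str1[STR_SIZE];
    char str2[STR_SIZE];
};

/* Hypothetical helper: copy src into a STR_SIZE-byte field of foo_A.
   Returns 0 if src fit, -1 if it was truncated to STR_SIZE - 1 characters. */
int foo_A_set(char *dst, const char *src)
{
    /* snprintf writes at most STR_SIZE bytes and always null-terminates */
    int n = snprintf(dst, STR_SIZE, "%s", src);
    return (n >= 0 && n < STR_SIZE) ? 0 : -1;
}
```

It would be called as foo_A_set(a.str1, "I'm string A") for some struct foo_A a. A return value of -1 tells the caller that the string did not fit and that a foo_B-style layout may be the better choice.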

foo_C can be regarded as a mixture of both: variable-length, but allocated in place. It can be used as follows:

/* 
    foo_C.c
    
    str1, str2 remain the same
*/
#define BUF_SIZE 1024

int main() {

    struct foo_C *foo_C = (struct foo_C *)malloc(BUF_SIZE);
    
    memset(foo_C, 0, BUF_SIZE);

    strcpy(foo_C->str1, str1);
    strcpy(foo_C->str2, str2);

    /* 
        safely manipulate foo_C strings
        ...
    */

    free(foo_C);
}

You get foo_C from malloc rather than defining it on the stack or as a static variable, so that foo_C can be of variable size. You access the strings of foo_C directly, without first allocating space for them. A key difference between foo_C and foo_A is that str2 of the former can be of variable length, while that of the latter is fixed. foo_C's variable length also differs from foo_B's: for foo_C, the variable string space is always allocated together with foo_C, while for foo_B it can be allocated anywhere on the heap and you must attach it manually.

Note that if you define a foo_C on the stack or as a static variable, its str2 has no storage at all: sizeof(struct foo_C) equals STR_SIZE here. In that case you must never access str2, because doing so is an out-of-bounds access.
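Instead of a fixed BUF_SIZE, the allocation can be sized to the string that goes into str2. This is a sketch under that assumption; the helper name foo_C_create is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define STR_SIZE 16

struct foo_C {
    char str1[STR_SIZE];
    char str2[];            /* flexible array member: no storage of its own */
};

/* Hypothetical helper: allocate a foo_C with exactly enough room for s2.
   s1 is truncated if it does not fit in STR_SIZE bytes. */
struct foo_C *foo_C_create(const char *s1, const char *s2)
{
    size_t n2 = strlen(s2) + 1;                  /* +1 for the terminating '\0' */
    struct foo_C *f = malloc(sizeof *f + n2);    /* fixed part plus tail, one block */
    if (f == NULL)
        return NULL;
    snprintf(f->str1, sizeof f->str1, "%s", s1);
    memcpy(f->str2, s2, n2);
    return f;
}
```

A single free(f) releases both strings, since they live in the same block. Growing str2 later does mean calling realloc on the whole struct, which is exactly the situation foo_B avoids.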

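This construct is called a flexible array member and has been part of standard C since C99. As far as I know, it is mostly used in low-level code, such as the kernel code that manages the slab allocator.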

CodePudding user response:

Their reasoning was that one of the variables may exceed the contiguous memory allocated for the foo instance.

That's nonsense; whoever told you that didn't understand how allocation works. The struct only contains the pointers, not what they point at. It will always have the very same size. The pointers may point at memory allocated anywhere, not necessarily even on the heap.
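To illustrate that last sentence, here is a small sketch in which the three members of the question's foo point at static, automatic and heap memory. The function name foo_points_anywhere and the buffer names are made up for the example.

```c
#include <stdlib.h>
#include <string.h>

typedef struct foo_t {
    char *strA;
    char *strB;
    char *strC;
    int x;
} foo;

static char static_buf[] = "static storage";

/* Hypothetical demo: returns 0 if all three members read back correctly. */
int foo_points_anywhere(void)
{
    foo f = {0};                               /* the struct itself is on the stack */
    char local_buf[] = "automatic storage";
    char *heap_buf = malloc(sizeof "heap storage");
    if (heap_buf == NULL)
        return -1;
    strcpy(heap_buf, "heap storage");

    f.strA = static_buf;    /* static storage duration */
    f.strB = local_buf;     /* automatic storage, valid until this function returns */
    f.strC = heap_buf;      /* allocated storage, valid until free */

    int ok = strcmp(f.strA, "static storage") == 0
          && strcmp(f.strB, "automatic storage") == 0
          && strcmp(f.strC, "heap storage") == 0;

    free(heap_buf);         /* only the malloc'd buffer is freed */
    return ok ? 0 : -1;
}
```

In every case sizeof(foo) is the same; only the values stored in the pointer members differ.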

Should I be performing a malloc on foo every time I allocate memory for its variables?

You need to allocate memory for it once, before accessing the pointer members. If the data pointed at by the pointer members later changes, that doesn't affect the struct, and it need not be reallocated.
