Reliably and portably store and retrieve objects of structure type in C

Time:05-09

@bdonlan, in "Copying structure in C with assignment instead of memcpy()", lists several reasons for using memcpy to copy objects of structure type. I have one more reason: I want to use the same area of memory to store and retrieve arbitrary objects, of possibly different structure types, at different times (like storage on a pre-allocated heap).

I want to know:

  • how this can be done portably (in the sense that the behavior is defined by the Standard) and
  • what parts of the Standard allow me to reasonably assume that it can be done portably.

Here is an MRE (sorta: not so much on the "M" [minimal] and I'm basically asking about the "R" [reproducible]):

// FILE: memcpy_struct.c

#include <limits.h> // FOR CHAR_BIT
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// EDIT: @john-bollinger POINTS OUT THAT THE FOLLOWING LINE
//  IS NOT PORTABLE.
// typedef struct { } structure ;
// INSTEAD:
typedef struct { char dummy ; } structure ;

typedef struct {
    unsigned long long u ; unsigned long long v ;
} unsignedLongLong2; // TWICE AS MANY BITS AS long long

typedef struct
{
    unsigned long long u ; unsigned long long v ;
    unsigned long long w ; unsigned long long x ;
} unsignedLongLong4; // FOUR TIMES AS MANY BITS AS long long

typedef unsigned char byte ;

void store ( byte * target , const structure * source , size_t size ) {
    memcpy ( target , source , size ) ;
}

void fetch ( structure * target , const byte * source , size_t size ) {
    memcpy ( target , source , size ) ;
}

const size_t enough =
    sizeof ( unsignedLongLong2 ) < sizeof ( unsignedLongLong4 )
    ? sizeof ( unsignedLongLong4 ) : sizeof ( unsignedLongLong2 ) ;

int main ( void )
{
    byte * memory = malloc ( enough ) ;
    unsignedLongLong2 v0 = { 0xabacadabaabacada , 0xbaabacadabaabaca } ;
    unsignedLongLong4 w0= {
        0xabacadabaabacada , 0xbaabacadabaabaca ,
        0xdabaabacadabaaba , 0xcadabaabacadabaa } ;
    unsignedLongLong2 v1 ;
    unsignedLongLong4 w1 ;
    store ( memory ,   ( structure * ) & v0 ,   sizeof v0 ) ;
    fetch ( ( structure * ) & v1 ,   memory ,   sizeof v1 ) ;
    store ( memory ,   ( structure * ) & w0 ,   sizeof w0 ) ;
    fetch ( ( structure * ) & w1 ,   memory ,   sizeof w1 ) ;
    char s [ 1 + sizeof w0 * CHAR_BIT ] ; // ROOM FOR THE TERMINATING NULL
    char t [ 1 + sizeof w0 * CHAR_BIT ] ; // CHARACTER + BASE-2 REPRESENTATION.
    sprintf ( s, "%llx-%llx",  v0 . u,  v0 . v ) ;
    sprintf ( t, "%llx-%llx",  v1 . u,  v1 . v ) ;
    puts ( s ) ;   puts ( t ) ;
    puts ( strcmp ( s , t ) ? "UNEQUAL" : "EQUAL" ) ;
    sprintf ( s, "%llx-%llx-%llx-%llx",  w0 . u,  w0 . v,  w0 . w,  w0 . x ) ;
    sprintf ( t, "%llx-%llx-%llx-%llx",  w1 . u,  w1 . v,  w1 . w,  w1 . x ) ;
    puts ( s ) ;   puts ( t ) ;
    puts ( strcmp ( s , t ) ? "UNEQUAL" : "EQUAL" ) ;
    free ( memory ) ;
}

Compiled with

gcc -std=c11 memcpy_struct.c # can do C99 or C17, too

Output of corresponding executable

abacadabaabacada-baabacadabaabaca
abacadabaabacada-baabacadabaabaca
EQUAL
abacadabaabacada-baabacadabaabaca-dabaabacadabaaba-cadabaabacadabaa
abacadabaabacada-baabacadabaabaca-dabaabacadabaaba-cadabaabacadabaa
EQUAL

But what guarantees that the pairs of outputs will always be EQUAL, provided that the Standard is respected? I think the following helps (N2176 Types 6.2.5-28):

All pointers to structure types shall have the same representation and alignment requirements as each other.

User response:

If you are asking about some way to “store” a structure and later recover the same structure into an object of the same type, then it suffices merely to copy the bytes. This can be done by memcpy, and there is no need for any kludges using structures defined with various numbers of unsigned long long elements.¹ This is guaranteed by C 2018 6.2.6.1 paragraphs 2 to 4:

2 Except for bit-fields, objects are composed of contiguous sequences of one or more bytes, the number, order, and encoding of which are either explicitly specified or implementation-defined.

3 Values stored in unsigned bit-fields and objects of type unsigned char shall be represented using a pure binary notation.

4 Values stored in non-bit-field objects of any other object type consist of n × CHAR_BIT bits, where n is the size of an object of that type, in bytes. The value may be copied into an object of type unsigned char [n] (e.g., by memcpy); the resulting set of bytes is called the object representation of the value…

So, to store any structure, or any object other than a bit-field, reserve enough memory for it² and copy the object’s bytes into that memory. To recover the structure, copy the bytes back.

Regarding:

I think the following helps (N2176 Types 6.2.5-28):

All pointers to structure types shall have the same representation and alignment requirements as each other.

That is irrelevant. No representation of any pointer is used in the code in the question, so their representations (what bytes make up the recorded value for a pointer) are irrelevant.

Footnotes

¹ Why use multiple members with different names? To define a structure with N unsigned long long elements in it, all you need is struct { unsigned long long x[N]; }.

² For an object X, this can be done with void * Memory = malloc(sizeof X) or with unsigned char Memory[sizeof X]; (which needs variable length array support only if X itself has variable length array type, since sizeof X is otherwise a constant), or, if you want it inside a structure, struct { unsigned char x[sizeof X]; } Memory;.

User response:

Since you are asking about portability and the provisions of the standard, the very first thing that came to mind was that structure types without any members, such as this ...

typedef struct { } structure ;

... are a non-portable extension. Your objective there seems to be to use structure * as a generic pointer-to-structure type, but you don't need that when you have void * available as a generic pointer-to-anything type. And with void *, you even get the pointer conversions automatically, without the explicit casts. Note also that you eventually get the conversions to void * anyway when you call memcpy().

You should also be aware that although your comparisons with strcmp are valid, given that you are ensuring a terminating null byte, they do not reliably perform the comparison you really want, as the comparison will stop at the first internal null byte within your structures' representations if there is one. There may be null bytes within the members' representations, depending on their values, and if the structures were laid out with any padding then there could be null bytes there, too.

Probably you want memcmp() instead, but here you should remain aware that padding is a potential issue. The values of any padding bytes in your structure types' representations are relevant to memcmp(), but they are not part of the semantic values of your structures. For example, padding values are not guaranteed to be copied by structure assignment.

I want to use the same area of memory to store and retrieve arbitrary objects—of possibly different structure type—at different times (like storage on a pre-allocated heap).

Ok. That's not a particularly big ask.

I want to know:

  • how this can be done portably (in the sense that the behavior is defined by the Standard) and

Your example is fine. Alternatively, if you know in advance all the different structure types that you may want to store, then you can use a union.

  • what parts of the Standard allow me to reasonably assume that it can be done portably.

With your dynamic allocation / memcpy() example, there is

  • C17 7.22.3.4/2: "The malloc function allocates space for an object whose size is specified by size"

  • C17 6.2.4/2: "An object exists, has a constant address, and retains its last-stored value throughout its lifetime."

  • C17 7.22.3/1: "The lifetime of an allocated object extends from the allocation until the deallocation."

  • C17 7.24.2.1/3: "The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1."

Thus, in a program exhibiting only defined behavior, memcpy() faithfully copies all the specified bytes from the source object to the destination object. The destination object retains them unchanged until they are overwritten or its lifetime ends, which keeps them available for the second memcpy() to copy from there to some other object. Neither memcpy() alters the byte sequence, and the allocated object faithfully keeps it in between, so in the end all three objects (the original, the allocated, and the final destination) must contain the same byte sequence, up to the number of bytes copied.
