Reading binary data from a .bin file into structs in C++

Time:11-04

I have a set of .bin files containing data in a formally specified format. I know exactly how many bytes each field takes (e.g. name = 40 bytes, version number = 2 bytes, etc.), and I also know the exact order in which they are stored in the file (e.g. name, then version number, ...).

So far I can load the data from a file into a std::vector<unsigned char>, then step through that data and read each field according to its expected number of bytes.

The problem is that this approach is very long and error-prone: it is easy to get one of the fields wrong (there are a lot of different fields).

I've looked at and talked to people about struct packing, pointer casting and bit fields. I just can't seem to get them all to work together.

How can I read the data into my buffer and then 'overlay' my struct on the buffer, so that all the fields are populated according to the bit fields I've allocated for each value in the struct?

The problem with bit fields is that I can't use them for strings.

Advice or example code would be highly appreciated. If you'd like, just comment and I can show you the code I have so far and what I'm trying to achieve.

#include <string>
#include <vector>

int main()
{
    //File data loaded by function call
    std::vector<unsigned char> fileData;

    //How do I cast fileData to be a dataFields type?
}

struct dataFields
{
    int ID : 8;
    // Cannot use bit field for string type?
    std::string name;
    int versionNumber : 16;
    int someOtherValue : 8;
};

I cannot give the exact code I'm working on for work reasons, but I feel this summarises what I'm trying to do fairly well in a simple manner.

Answer:

No, you indeed cannot use a bit field for std::string, and you wouldn't want to anyway, since the object itself essentially holds just a few pointers, not a fixed-size block of characters.

The usual approach I use in my projects is to have a POD struct for each record type. The lowest layer, responsible for (de)serialization, then converts only between PODs and bytes. Any C++ logic, like std::string or variable-length std::vector, is dealt with at higher levels.
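A minimal sketch of that split, assuming the 40-byte name field is NUL-padded (the question does not say). `Item`, `name_from_field`, `to_item` and `to_record` are hypothetical names: only the wire-level `Record` ever touches bytes, and `std::string` appears only in the higher-level type.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <string>

// Wire-level POD: fixed sizes only, safe to memcpy to/from bytes.
struct Record {
    std::uint8_t ID;
    std::array<char, 40> name;
    std::uint16_t versionNumber;
    std::uint8_t someOtherValue;
};

// Hypothetical higher-level type used by the rest of the program.
struct Item {
    int id;
    std::string name;
    int versionNumber;
    int someOtherValue;
};

// Assumes NUL padding; if no NUL is present, all 40 bytes are used.
std::string name_from_field(const std::array<char, 40>& field) {
    auto end = std::find(field.begin(), field.end(), '\0');
    return std::string(field.begin(), end);
}

Item to_item(const Record& r) {
    return Item{r.ID, name_from_field(r.name), r.versionNumber, r.someOtherValue};
}

// Zero-initialises the record so the name is NUL-padded; names longer
// than 40 characters are silently truncated in this sketch.
Record to_record(const Item& item) {
    Record r{};
    r.ID = static_cast<std::uint8_t>(item.id);
    std::copy_n(item.name.begin(), std::min(item.name.size(), r.name.size()), r.name.begin());
    r.versionNumber = static_cast<std::uint16_t>(item.versionNumber);
    r.someOtherValue = static_cast<std::uint8_t>(item.someOtherValue);
    return r;
}
```

Keeping the conversion in one pair of functions means the padding/truncation rules for `name` live in exactly one place.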

#include <array>
#include <type_traits>
#include <cstddef>   // offsetof
#include <cstdint>
#include <cstring>

struct Record{
    std::uint8_t ID;
    std::array<char,40> name;
    std::uint16_t versionNumber;
    std::uint8_t someOtherValue;
};

// 1 + 40 + 1 (padding before versionNumber) + 2 + 1 + 1 (trailing padding) = 46 bytes
static_assert(sizeof(Record)==46);
static_assert(offsetof(Record,name)==1);

In my world, I try to have the Record respect the standard alignment, i.e. each element E aligned to sizeof(E). You can add packing modifiers if needed. Prefer fixed-width types from <cstdint> over bit fields.
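If the file really has no padding (1 + 40 + 2 + 1 = 44 bytes for the fields in the question), a packed variant could look like the sketch below. It assumes a compiler that understands `#pragma pack` (GCC, Clang and MSVC do; it is not standard C++), and `PackedRecord` and `packed_from_bytes` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// #pragma pack is a compiler extension, not standard C++.
#pragma pack(push, 1)
struct PackedRecord {
    std::uint8_t ID;
    std::array<char, 40> name;
    std::uint16_t versionNumber;  // misaligned: sits at offset 41
    std::uint8_t someOtherValue;
};
#pragma pack(pop)

// No padding any more: 1 + 40 + 2 + 1 = 44 bytes.
static_assert(sizeof(PackedRecord) == 44, "unexpected padding");
static_assert(offsetof(PackedRecord, versionNumber) == 41, "versionNumber must follow name");
static_assert(offsetof(PackedRecord, someOtherValue) == 43, "someOtherValue must follow versionNumber");

// Copies exactly sizeof(PackedRecord) bytes; the caller must make sure they exist.
PackedRecord packed_from_bytes(const unsigned char* bytes) {
    PackedRecord r;
    std::memcpy(&r, bytes, sizeof r);
    return r;
}
```

Copying the whole record with `memcpy` and then reading members by value is fine; taking a pointer or reference to a misaligned member such as `versionNumber` is what can break on strict-alignment targets.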

I recommend putting a bunch of static_asserts after each Record to verify its layout. Otherwise, someone will one day come along and try to "clean up" the code, breaking everything. The asserts also nicely document the protocol for the reader.
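For the unpacked `Record` above, a fuller set of checks might look like this. The offsets assume a typical ABI where `alignof(std::uint16_t) == 2`, which is what produces the padding byte after `name` and the 46-byte total.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <type_traits>

struct Record {
    std::uint8_t ID;
    std::array<char, 40> name;
    std::uint16_t versionNumber;
    std::uint8_t someOtherValue;
};

// If any of these fail, the in-memory layout no longer matches what the
// (de)serialization code assumes.
static_assert(std::is_trivially_copyable_v<Record>, "Record must be safe to memcpy");
static_assert(std::is_standard_layout_v<Record>, "offsetof requires standard layout");
static_assert(offsetof(Record, ID) == 0, "ID");
static_assert(offsetof(Record, name) == 1, "name");
static_assert(offsetof(Record, versionNumber) == 42, "one padding byte after name");
static_assert(offsetof(Record, someOtherValue) == 44, "someOtherValue");
static_assert(sizeof(Record) == 46, "one trailing padding byte");
```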

One downside is that this does not easily support variable-length members in the middle of a record, or more than one of them, but I have never needed that; keep packets simple.
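One pattern that stays within that limit is a fixed POD header followed by a single variable-length tail whose length is stored in the header. The sketch below is hypothetical (`Header`, `Message` and `parse_message` are illustrative names, and the length is read in host byte order); the tail is copied into a `std::string` at the higher level rather than being part of the POD.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>

// Fixed-size header; exactly one variable-length tail follows it.
struct Header {
    std::uint8_t type;
    std::uint8_t reserved;
    std::uint16_t payloadLength;  // number of payload bytes after the header
};
static_assert(sizeof(Header) == 4, "no padding expected");

struct Message {
    Header header;
    std::string payload;  // owned by the higher level, never memcpy'd
};

// Returns std::nullopt if the buffer is too short for the header or the declared payload.
std::optional<Message> parse_message(const unsigned char* bytes, std::size_t size) {
    if (size < sizeof(Header)) return std::nullopt;
    Message m;
    std::memcpy(&m.header, bytes, sizeof(Header));
    const std::size_t n = m.header.payloadLength;
    if (size - sizeof(Header) < n) return std::nullopt;
    m.payload.assign(reinterpret_cast<const char*>(bytes + sizeof(Header)), n);
    return m;
}
```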

Also, I just decide on a fixed endianness for the protocol. If someone needs something else, it's their responsibility to pass correctly encoded Records for serialization.
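As an illustration, assuming the protocol is fixed as little-endian (an assumption; the answer doesn't say which order it uses), fields can be decoded with shifts so that the result is the same on little- and big-endian hosts. `load_le16`, `store_le16` and `load_le32` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Byte 0 is the least significant byte; shifting makes this host-independent.
std::uint16_t load_le16(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

void store_le16(unsigned char* p, std::uint16_t v) {
    p[0] = static_cast<unsigned char>(v & 0xFF);
    p[1] = static_cast<unsigned char>(v >> 8);
}

std::uint32_t load_le32(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}
```

These would be applied to the raw bytes at a field's offset instead of `memcpy`-ing that field, or used to fix up a field after the whole record has been copied.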

Serialization helpers:

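template<typename T>
T read_value(const unsigned char*& ptr){
    static_assert(std::is_standard_layout_v<T>);
    static_assert(std::is_trivially_copyable_v<T>); // required for memcpy

    T value;
    std::memcpy(&value,ptr,sizeof(T));
    ptr += sizeof(T); // advance past the bytes just read
    return value;
}

template<typename T>
void write_value(unsigned char*& ptr, const T& value){
    static_assert(std::is_standard_layout_v<T>);
    static_assert(std::is_trivially_copyable_v<T>); // required for memcpy

    std::memcpy(ptr,&value,sizeof(T));
    ptr += sizeof(T); // advance past the bytes just written
}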

The lowest layer responsible for (de)serialization can look something like this:

void deserialize_stream(const unsigned char* bytes){
    // Output is a bunch of POD types.
    auto record1 = read_value<Record>(bytes);
    auto record2 = read_value<Record>(bytes);
}

void serialize_stream(unsigned char* bytes){
    // Input is a list of POD types to serialize.
    Record record1{1,"Foo",12,42};
    Record record2{2,"Bar",14,28};

    write_value(bytes,record1);
    write_value(bytes,record2);
}

Example

int main() { 
    // Just an example; CHECK THE SIZE in real-world code.
    std::array<unsigned char,1024> buffer;

    serialize_stream(buffer.data());
    deserialize_stream(buffer.data());

}
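Tying this back to the question's setup (whole file loaded into a `std::vector<unsigned char>`), a bounds-checked version could look like the following sketch. `load_file`, `read_checked` and `parse_records` are hypothetical names, and it assumes the file is nothing but back-to-back records stored with exactly this compiler's in-memory layout of `Record` (padding and byte order included).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <vector>

struct Record {
    std::uint8_t ID;
    std::array<char, 40> name;
    std::uint16_t versionNumber;
    std::uint8_t someOtherValue;
};

// Reads the whole file into memory, as the question already does.
std::vector<unsigned char> load_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}

// Bounds-checked variant of read_value: throws instead of reading past the end.
template <typename T>
T read_checked(const std::vector<unsigned char>& buf, std::size_t& offset) {
    static_assert(std::is_trivially_copyable_v<T>);
    if (offset > buf.size() || buf.size() - offset < sizeof(T))
        throw std::out_of_range("truncated record");
    T value;
    std::memcpy(&value, buf.data() + offset, sizeof(T));
    offset += sizeof(T);
    return value;
}

// Reads records until the buffer is exhausted; a partial trailing record throws.
std::vector<Record> parse_records(const std::vector<unsigned char>& buf) {
    std::vector<Record> out;
    std::size_t offset = 0;
    while (offset < buf.size())
        out.push_back(read_checked<Record>(buf, offset));
    return out;
}
```

If the on-disk layout has no padding, the same loop works with a packed struct, or by reading the fields one at a time with `read_checked`.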

Answer:

Consider using a serialization library for this if this part is not bound by time or storage efficiency. Such libraries can serialize your objects to XML or JSON and deserialize them easily, and you don't need to worry about endianness or POD issues.
