How should I approach parsing a network packet using C++ templates?

Let's say I have an application that keeps receiving a byte stream from a socket. I have documentation that describes what the packet looks like: for example, the total header size, the total payload size, and the data type at each byte offset. I want to parse it into a struct. The approach I can think of is to declare a struct and disable padding with a compiler-specific attribute, probably something like:

struct Payload
{
   char field1;
   std::uint32_t field2;
   std::uint32_t field3;
   char field5;
} __attribute__((packed));

and then I can declare a buffer, memcpy the bytes into it, and reinterpret_cast it to my structure. Another way I can think of is to process the bytes one by one and fill the data into the struct. I think either one should work, but both feel old school and are probably not safe.

The reinterpret_cast approach mentioned above would look something like:

void receive(const char* data, std::size_t data_size)
{
    if (data_size == sizeof(Payload))
    {
        const Payload* payload = reinterpret_cast<const Payload*>(data);
        // ... further processing ...
    }
}

I'm wondering whether there are any better approaches (more modern C++ style? more elegant?) for this kind of use case. I feel like metaprogramming should help, but I don't know how to apply it here.

Can anyone share some thoughts, or point me to related references, resources, or relevant open source code, so that I can learn how to solve this kind of problem in a more elegant way?

CodePudding user response：

There are many different ways of approaching this. Here's one:

Keeping in mind that reading a struct from a network stream is semantically the same thing as reading a single value, the operation should look the same in either case.

Note that from what you posted, I am inferring that you will not be dealing with types with non-trivial default constructors. If that were the case, I would approach things a bit differently.

In this approach, we:

Define a read_into(src&, dst&) function that takes in a source of raw bytes, as well as an object to populate.
Provide a general implementation for all integral types, switching from network byte order when appropriate.
Overload the function for our struct, calling read_into() on each field in the order expected on the wire.

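#include <cstdint>
#include <cstddef>      // std::byte, std::size_t
#include <type_traits>  // std::has_unique_object_representations_v
#include <bit>
#include <concepts>
#include <array>
#include <algorithm>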

// Use std::byteswap when available. In the meantime, just lift the implementation from 
// https://en.cppreference.com/w/cpp/numeric/byteswap
template<std::integral T>
constexpr T byteswap(T value) noexcept
{
    static_assert(std::has_unique_object_representations_v<T>, "T may not have padding bits");
    auto value_representation = std::bit_cast<std::array<std::byte, sizeof(T)>>(value);
    std::ranges::reverse(value_representation);
    return std::bit_cast<T>(value_representation);
}

template<typename T>
concept DataSource = requires(T& x, char* dst, std::size_t size ) {
  {x.read(dst, size)};
};

// General read implementation for all integral types
template<std::endian network_order = std::endian::big>
void read_into(DataSource auto& src, std::integral auto& dst) {
  src.read(reinterpret_cast<char*>(&dst), sizeof(dst));

  if constexpr (sizeof(dst) > 1 && std::endian::native != network_order) {
    dst = byteswap(dst);
  }
}

struct Payload
{
   char field1;
   std::uint32_t field2;
   std::uint32_t field3;
   char field5;
};

// Read implementation specific to Payload
void read_into(DataSource auto& src, Payload& dst) {
  read_into(src, dst.field1);
  read_into<std::endian::little>(src, dst.field2);
  read_into(src, dst.field3);
  read_into(src, dst.field5);
}

// mind you, nothing stops you from just reading directly into the struct, but beware of endianness issues:
// struct Payload
// {
//    char field1;
//    std::uint32_t field2;
//    std::uint32_t field3;
//    char field5;
// } __attribute__((packed));
// void read_into(DataSource auto& src, Payload& dst) {
//   src.read(reinterpret_cast<char*>(&dst), sizeof(Payload));
// }

// Example
struct some_data_source {
  std::size_t read(char*, std::size_t size);
};

void foo() {
    some_data_source data;

    Payload p;
    read_into(data, p);
}

An alternative API could have been dst.field2 = read<std::uint32_t>(src), which has the drawback of requiring you to be explicit about the type, but is more appropriate if you have to deal with non-trivial constructors.

See it in action on godbolt: https://gcc.godbolt.org/z/77rvYE1qn