boost x3 grammar for structs with multiple constructors

Time:01-08

I'm trying to figure out how to parse structs that have multiple (overloaded) constructors. In this case it's a range struct that holds either a real range or a singleton, where the start and end of the range are equal.

case 1 looks like:

"start-stop"

case 2:

"start"

For the range case

auto range_constraint = x3::rule<struct test_struct, MyRange>{} = (x3::int_ >> x3::lit("-") >> x3::int_);

works but

auto range_constraint = x3::rule<struct test_struct, MyRange>{} = x3::int_ | (x3::int_ >> x3::lit("-") >> x3::int_);

unsurprisingly, won't match the signature and fails to compile.

Not sure what the fix is?

#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/home/x3.hpp>
#include <iostream>
namespace x3 = boost::spirit::x3;
struct MyRange
{
    size_t start;
    size_t end;
    // little bit weird because should be end 1, but w/e
    explicit MyRange(size_t start, size_t end = 0) : start(start), end(end == 0 ? start : end)
    {
    }
};
BOOST_FUSION_ADAPT_STRUCT(MyRange, start, end)
// BOOST_FUSION_ADAPT_STRUCT(MyRange, start)
//

int main()
{
 
    auto range_constraint = x3::rule<struct test_struct, MyRange>{} = (x3::int_ >> x3::lit("-") >> x3::int_);
    // auto range_constraint = x3::rule<struct test_struct, MyRange>{} = x3::int_ | (x3::int_ >> x3::lit("-") >> x3::int_);

    for (std::string input :
         {"1-2", "1","1-" ,"garbage"})
    {
        auto success = x3::phrase_parse(input.begin(), input.end(),
                                        // Begin grammar
                                        range_constraint,
                                        // End grammar
                                        x3::ascii::space);
        std::cout << "`" << input << "`"
                  << "-> " << success << std::endl;
    }
    return 0;
}

CodePudding user response:

It's important to realize that sequence adaptation by definition uses default construction with subsequent sequence element assignment.

Another issue is branch ordering in PEG grammars. int_ will always succeed wherever int_ >> '-' >> int_ would, so with int_ listed first you would never match the range version.

Finally, to parse size_t usually prefer uint_/uint_parser<size_t> :)
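To make the branch-ordering point concrete, here is a minimal sketch of PEG ordered choice using hand-rolled matchers. None of these names (match_uint, match_range, ordered_choice, short_first, long_first) come from Spirit; they are stand-ins that only report how many characters were consumed, and they assume input without whitespace.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Hand-rolled stand-ins for illustration only; these are NOT Spirit X3 parsers.
// Each returns the number of characters consumed from the front of `in`,
// or std::nullopt when it does not match.

// Matches one or more decimal digits, like a bare unsigned integer parser.
inline std::optional<std::size_t> match_uint(std::string_view in) {
    std::size_t n = 0;
    while (n < in.size() && std::isdigit(static_cast<unsigned char>(in[n]))) ++n;
    if (n == 0) return std::nullopt;
    return n;
}

// Matches uint '-' uint.
inline std::optional<std::size_t> match_range(std::string_view in) {
    auto a = match_uint(in);
    if (!a || *a >= in.size() || in[*a] != '-') return std::nullopt;
    auto b = match_uint(in.substr(*a + 1));
    if (!b) return std::nullopt;
    return *a + 1 + *b;
}

// PEG ordered choice: try `first`; only if it fails, try `second`.
template <typename P1, typename P2>
std::optional<std::size_t> ordered_choice(P1 first, P2 second, std::string_view in) {
    if (auto r = first(in)) return r;
    return second(in);
}

// uint | range: the short branch succeeds first, so the range branch is never tried.
inline std::optional<std::size_t> short_first(std::string_view in) {
    return ordered_choice(match_uint, match_range, in);
}

// range | uint: the longer branch gets its chance before the fallback.
inline std::optional<std::size_t> long_first(std::string_view in) {
    return ordered_choice(match_range, match_uint, in);
}
```

With "1-2" as input, short_first stops after the first digit while long_first consumes the whole string, which is why the longer alternative should come first.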

Things That Don't Work

There are several ways to skin this cat. For one, there's BOOST_FUSION_ADAPT_STRUCT_NAMED, which would allow you to do

BOOST_FUSION_ADAPT_STRUCT_NAMED(MyRange, Range, start, end)
BOOST_FUSION_ADAPT_STRUCT_NAMED(MyRange, SingletonRange, start)

So one pretty elaborate approach would seem to be spelling it out:

auto range     = x3::rule<struct _, Range>{}          = uint_ >> '-' >> uint_;
auto singleton = x3::rule<struct _, SingletonRange>{} = uint_;
auto rule      = x3::rule<struct _, MyRange>{}        = range | singleton;

TIL that this doesn't even compile; apparently Qi behaved differently: Live On Coliru

X3 requires the attribute to be default-constructible whereas Qi would attempt to bind to the passed-in attribute reference first.

Even in the Qi version you can see that the fact that Fusion sequences are default-constructed and then memberwise-assigned leads to results you didn't expect or want:

`1-2` -> true
 -- [1,NIL)
`1` -> true
 -- [1,NIL)
`1-` -> true
 -- [1,NIL)
`garbage` -> false
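As a rough model of what "default-constructed, then memberwise-assigned" means, consider the sketch below. It is plain C++ and not Spirit's or Fusion's actual implementation; PlainRange and fill_memberwise are hypothetical names introduced only for this illustration. MyRange is the struct from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// The struct from the question: its only constructor needs at least one argument.
struct MyRange {
    std::size_t start;
    std::size_t end;
    explicit MyRange(std::size_t start, std::size_t end = 0)
        : start(start), end(end == 0 ? start : end) {}
};

// Hypothetical variant with default member initializers and no user constructor.
struct PlainRange {
    std::size_t start = 0;
    std::size_t end   = 0;
};

// Rough model of how an adapted sequence gets filled: default-construct the
// target, then assign each member in turn. No user-written constructor ever
// sees the parsed values.
template <typename T>
T fill_memberwise(std::size_t first, std::size_t second) {
    static_assert(std::is_default_constructible_v<T>,
                  "memberwise filling needs a default-constructible target");
    T value{};
    value.start = first;
    value.end   = second;
    return value;
}
```

Under this model, fill_memberwise<MyRange> is rejected at compile time because MyRange has no default constructor, and fill_memberwise<PlainRange>(1, 0) leaves end at 0, whereas MyRange(1, 0) would normalize end to 1. Any logic that lives in a constructor is simply bypassed.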

What Works

Instead of doing the complicated things, do the simple thing. Any time you see an optional value you can usually provide a default value. Alternatively, you can skip sequence adaptation entirely and go straight to semantic actions.

Semantic Actions

The simplest way would be to have specific branches:

auto assign1 = [](auto& ctx) { _val(ctx) = MyRange(_attr(ctx)); };
auto assign2 = [](auto& ctx) { _val(ctx) = MyRange(at_c<0>(_attr(ctx)), at_c<1>(_attr(ctx))); };

auto rule = x3::rule<void, MyRange>{} =
    (uint_ >> '-' >> uint_)[assign2] | uint_[assign1];

Slightly more advanced, but more efficient:

auto assign1 = [](auto& ctx) { _val(ctx) = MyRange(_attr(ctx)); };
auto assign2 = [](auto& ctx) { _val(ctx) = MyRange(_val(ctx).start, _attr(ctx)); };

auto rule = x3::rule<void, MyRange>{} = uint_[assign1] >> -('-' >> uint_[assign2]);
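The efficiency gain comes from parsing the start only once and then looking at an optional tail, instead of re-parsing the first number when the range branch fails. A hand-rolled sketch of that control flow, without Spirit, might look as follows. The helpers take_uint and parse_range are hypothetical, there is no whitespace skipping, and trailing input is left unconsumed rather than treated as an error.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <optional>
#include <string_view>

// The struct from the question.
struct MyRange {
    std::size_t start;
    std::size_t end;
    explicit MyRange(std::size_t start, std::size_t end = 0)
        : start(start), end(end == 0 ? start : end) {}
};

// Reads an unsigned integer from the front of `in` and advances `in` past it.
// Leaves `in` untouched when no digits are found.
inline std::optional<std::size_t> take_uint(std::string_view& in) {
    std::size_t value = 0;
    auto [ptr, ec] = std::from_chars(in.data(), in.data() + in.size(), value);
    if (ec != std::errc{}) return std::nullopt;
    in.remove_prefix(static_cast<std::size_t>(ptr - in.data()));
    return value;
}

// Hand-rolled model of: uint_[assign1] >> -('-' >> uint_[assign2])
inline std::optional<MyRange> parse_range(std::string_view in) {
    auto start = take_uint(in);  // the start is parsed exactly once
    if (!start) return std::nullopt;
    MyRange result(*start);      // assign1: a singleton so far

    // Optional group: only takes effect if both '-' and the second number match.
    std::string_view tail = in;
    if (!tail.empty() && tail.front() == '-') {
        tail.remove_prefix(1);
        if (auto end = take_uint(tail)) {
            result = MyRange(result.start, *end);  // assign2: reuse the stored start
        }
    }
    return result;
}
```

Here "1-" yields the same singleton as "1", because the optional group fails as a whole when no number follows the dash.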

Lastly, we can move towards defaulting the optional end:

auto rule = x3::rule<void, MyRange>{} =
    (uint_ >> ('-' >> uint_ | x3::attr(MyRange::unspecified))) //
        [assign];

Now the semantic action will have to deal with the variant end type:

auto assign = [](auto& ctx) {
    auto start = at_c<0>(_attr(ctx));
    _val(ctx)  = apply_visitor(                         //
        [=](auto end) { return MyRange(start, end); }, //
        at_c<1>(_attr(ctx)));
};

Also Live On Coliru

Simplify?

I'd consider modeling the range explicitly as having an optional end:

struct MyRange {
    MyRange() = default;
    MyRange(size_t s, boost::optional<size_t> e = {}) : start(s), end(e) {
        assert(!e || *e >= s);
    }

    size_t size() const  { return end? *end - start : 1; }
    bool   empty() const { return size() == 0; }

    size_t                  start = 0;
    boost::optional<size_t> end   = 0;
};

Now you can directly use the optional to construct:

auto assign = [](auto& ctx) {
    _val(ctx) = MyRange(at_c<0>(_attr(ctx)), at_c<1>(_attr(ctx)));
};

auto rule = x3::rule<void, MyRange>{} = (uint_ >> -('-' >> uint_))[assign];

Actually, here we can go back to using adapted sequences, although with different semantics:

Live On Coliru

#include <boost/fusion/adapted.hpp>
#include <boost/spirit/home/x3.hpp>
#include <iomanip>
#include <iostream>
namespace x3 = boost::spirit::x3;

struct MyRange {
    size_t                  start = 0;
    boost::optional<size_t> end   = 0;
};

static inline std::ostream& operator<<(std::ostream& os, MyRange const& mr) {
    if (mr.end)
        return os << "[" << mr.start << "," << *mr.end << ")";
    else
        return os << "[" << mr.start << ",)";
}

BOOST_FUSION_ADAPT_STRUCT(MyRange, start, end)

int main() {
    x3::uint_parser<size_t> uint_;
    auto rule = x3::rule<void, MyRange>{} = uint_ >> -('-' >> uint_);

    for (std::string const input : {"1-2", "1", "1-", "garbage"}) {
        MyRange into;
        auto    success = phrase_parse(input.begin(), input.end(), rule, x3::space, into);
        std::cout << quoted(input, '`') << " -> " << std::boolalpha << success
                  << std::endl;

        if (success) {
            std::cout << " -- " << into << "\n";
        }
    }
}

Summarizing

I hope these strategies give you everything you need. Pay close attention to the semantics of your range. Specifically, I never paid any attention to the difference between "1" and "1-". You might want one to be [1,2) and the other to be [1,inf), both to be equivalent, or the second one might even be considered invalid.
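One way to keep those semantics explicit is to record which input form was seen and map it to a range in a single place. The sketch below picks one possible policy purely as an assumption; Form, HalfOpen and interpret are hypothetical names, not part of Spirit or of the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>

// Hypothetical names throughout; none of this comes from Spirit.
enum class Form { Single, OpenEnded, Closed };  // "1", "1-", "1-2"

// Half-open interval [first, second); an empty second bound means unbounded.
using HalfOpen = std::pair<std::size_t, std::optional<std::size_t>>;

// One possible policy (an assumption, pick your own):
//   "1"   -> [1,2)    exactly one element
//   "1-"  -> [1,inf)  everything from 1 upward
//   "1-2" -> [1,2)    end taken as exclusive
inline HalfOpen interpret(Form form, std::size_t start, std::size_t end = 0) {
    switch (form) {
        case Form::Single:    return {start, start + 1};
        case Form::OpenEnded: return {start, std::nullopt};
        case Form::Closed:    return {start, end};
    }
    return {start, std::nullopt};  // not reached for valid enum values
}
```

Changing the policy, for example rejecting "1-" outright, then only touches this one function rather than the grammar.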

Stepping back even further, I'd suggest that maybe you just needed

using Bound   = std::optional<size_t>;
using MyRange = std::pair<Bound, Bound>;

Which you could parse directly with:

auto boundary = -x3::uint_parser<size_t>{};
auto rule = x3::rule<void, MyRange>{} = boundary >> '-' >> boundary;

It would allow for more inputs:

for (std::string const input : {"-2", "1-2", "1", "1-", "garbage"}) {
    MyRange into;
    auto    success = phrase_parse(input.begin(), input.end(), rule, x3::space, into);
    std::cout << quoted(input, '`') << " -> " << std::boolalpha << success
              << std::endl;

    if (success) {
        std::cout << " -- " << into << "\n";
    }
}

Prints: Live On Coliru

`-2` -> true
 -- [,2)
`1-2` -> true
 -- [1,2)
`1` -> false
`1-` -> true
 -- [1,)
`garbage` -> false