Home > Mobile >  AST versus synthesized attributes of non-terminal parsers
AST versus synthesized attributes of non-terminal parsers

Time:11-22

I am trying to store the results of a tree parser into a recursive struct using Boost spirit x3.

I got a nice grammar, and a nice ast. But I struggle understanding how they (refuse to) connect.

Here is the AST:


#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/home/x3/support/ast/variant.hpp>

#include <boost/fusion/adapted/struct/adapt_struct.hpp>
#include <boost/fusion/include/adapt_struct.hpp>

#include <optional>

namespace ast
{

  struct node
  {
    // are the optional necessary here?
    std::optional<std::vector<node>> children;
    std::optional<std::string> name;
    std::optional<double> length;

  };
}

BOOST_FUSION_ADAPT_STRUCT(ast::node, children, name, length)

Then, this is the grammar that is supposed to synthesize those nodes: (I added few notes of what I believe the non-terminal parsers are synthesizing).

#include <boost/spirit/home/x3.hpp>
#include "ast.h"
#include <boost/fusion/include/vector.hpp>
#include <boost/fusion/include/std_pair.hpp>

  namespace parser
  {
    namespace x3 = boost::spirit::x3;

    // synthesize: it should be a simple ast::node, but the grammar says otherwise???
    x3::rule<class tree, ast::node> const tree{"tree"};
    x3::rule<class branch> branch{"branch"};

    // synthesize: std::string
    auto name     = x3::lexeme[x3::alpha >> *x3::alnum]; // to be improved later
    // synthesize: double
    auto length   = ':' >> x3::double_;
    // synthesize: std::optional<std::string>
    auto leaf     = -name;
    // synthesize: std::pair< std::vector<...>, std::optional<std::string> >
    auto internal = '(' >> (branch % ',') >> ')' >> -name;
    // synthesized attribute: std::variant<node, std::optional<std::string>
    auto subtree  = internal | leaf;
    // synthesize: std::pair<node, std::optional<double>>
    auto const tree_def   = x3::skip(x3::blank)[subtree >> -length >> ';' >> x3::eoi];
    // synthesize: std::pair<node, std::optional<double>>
    auto const branch_def = subtree >> -length;

    BOOST_SPIRIT_DEFINE(branch, tree);
  } // end namespace parser

  // What is that for?
  auto tree()
  {
    return parser::tree;
  }

Despite many attempts to "fix" it, it fails to compile with a /boost/spirit/home/x3/operator/detail/sequence.hpp:143:9: error: static_assert failed due to requirement 'actual_size <= expected_size' "Size of the passed attribute is bigger than expected."

My guess is that the AST is not compatible with the grammar (I don't see anything like a simple tuple<vector<node>, string, double> emerging from the grammar). If so, should I complexify the AST or find a simple way to express the grammar, or would it be a use-case of semantic action?

CodePudding user response:

First off, things are rarely a use-case for semantic actions (Boost Spirit: "Semantic actions are evil"?)¹.

Secondly, my gut says optional is likely unnecessary for the vector/string fields. In my mind, a sequence container is already a generalization of optional (where optional is like container with a maximum size of 1).

Thirdly, as far as I remember there might be limitations with std::optional support. Seems I'm mis-remembering from variant support Transitioning Boost Spirit parser from boost::variant to std::variant.

The Propagation Rules

Now, I'd suggest to keep two things in mind:

  1. Always match the AST declarations as closely to your rule declarations as you can. In particular, your AST doesn't distinguish leaf nodes and internal nodes, which causes confusions because you hope to have both rules "magically" compatible with both.

  2. Be specific about type coercions. x3::rule does this, and you can use it explicitly. One of my recurring x3 devices is as<T>[p] (or sometimes the functional version of it: as<T>(p, name)):

    template <typename T> struct as_type {
        auto operator[](auto p) const {
            struct Tag {};
            return x3::rule<Tag, T>{"as"} = p;
        }
    };
    
    template <typename T> static inline as_type<T> as{};
    

Note that some (most?) of your comment lines

// synthesize: std::string
// synthesize: double
// synthesize: std::optional<std::string>
// synthesize: std::pair< std::vector<...>, std::optional<std::string> >
// synthesized attribute: std::variant<node, std::optional<std::string>
// synthesize: std::pair<node, std::optional<double>>
// synthesize: std::pair<node, std::optional<double>>

are not very accurate. E.g. alpha >> *alnum has fusion::deque<char, std::string> > as attribute type. Mind you, the propagation machinery may be able to iron out the subtle difference when instantiating the x3::move_to to the actual bound reference.

That actually turns out to be the case, see Simplify, Cleanup below

But as is, this ripples down: leaf isn't optional<string> now and so on.

Since you defined branch without attribute type, its attribute type is actually x3::unused_type. This means that subtree doesn't contain a vector<...> either. I had to check, so made a facility:

template <typename ExpectedAttributeType> void check(auto p) {
    static_assert(
        std::is_same_v<
            ExpectedAttributeType,
            typename x3::traits::attribute_of<decltype(p), x3::unused_type>::type>);
};

And now we can see that:

check<std::string>(name); // assuming a fixed `name` rule that coerces std::string
check<std::vector<std::string>>('(' >> name % ',' >> ')');

But for a parser that exposes unused_type:

check<x3::unused_type>(x3::omit[name]);
check<std::vector<x3::unused_type>>('(' >> x3::omit[name] % ',' >> ')'); // FAILS!
check<x3::unused_type>('(' >> x3::omit[name] % ',' >> ')'); // Passes

See what I mean with keeping close control? The ripple-effect makes for wildly different attribute types from what you expected.

In addition, pair<node, optional<double>> would not be compatible with any of your AST types (which happens to be just ast::node of course).

Show, Don't Tell

As I know you to be a fast learner, and I don't know how to best explain the transformations I'd do step by step, I'll just show you the result of applying the above principles to the question code:

x3::rule<class branch, ast::node> node{"node"};

auto name     = as<std::string>[x3::lexeme[x3::alpha >> *x3::alnum]];
auto length   = as<double>[':' >> x3::double_];
auto leaf     = as<std::string>[name | x3::attr(std::string{})];
auto children = '(' >> (node % ',') >> ')' | x3::attr(ast::nodes{});
auto node_def = children >> leaf >> -length;
auto tree     = x3::skip(x3::blank)[node >> ';' >> x3::eoi];

BOOST_SPIRIT_DEFINE(node);

You can your expectations:

void checks() {
    check<double>(length);
    check<std::string>(leaf);
    check<std::string>(name);
    check<ast::nodes>('(' >> node % ',' >> ')'); // Passes
    //check<ast::nodes>(children); // FAILS, actually variant<ast::nodes, ast::nodes> (!!)
    check<ast::nodes>(as<ast::nodes>[children]); // Trivially compatible
    check<ast::node>(tree);
}

Note how subtle the inner mechanisms can be. To be completely fair, Spirit Qi used to be way more predictable and forgiving than X3. I suspect the difference might be in favor of maintainability and compilation times?_

Testing

Borrowing some test cases from Grammar fails parsing tree with Boost Spirit X3 and debug-printing the parsed AST:

Live On Coliru prints

============ running internal tests:
"(,)"    PASS -> node { [node { [], "" }, node { [], "" }], "" }
"(A,B)F"     PASS -> node { [node { [], "A" }, node { [], "B" }], "F" }
"(A:10,B:10)F"   PASS -> node { [node { [], "A":10 }, node { [], "B":10 }], "F" }
============ running tree tests:
";"  PASS -> node { [], "" }
"(,);"   PASS -> node { [node { [], "" }, node { [], "" }], "" }
"(,,(,));"   PASS -> node { [node { [], "" }, node { [], "" }, node { [node { [], "" }, node { [], "" }], "" }], "" }
"(A,B,(C,D));"   PASS -> node { [node { [], "A" }, node { [], "B" }, node { [node { [], "C" }, node { [], "D" }], "" }], "" }
"(A,B,(C,D)E)F;"     PASS -> node { [node { [], "A" }, node { [], "B" }, node { [node { [], "C" }, node { [], "D" }], "E" }], "F" }
"(:0.1,:0.2,(:0.3,:0.4):0.5);"   PASS -> node { [node { [], "":0.1 }, node { [], "":0.2 }, node { [node { [], "":0.3 }, node { [], "":0.4 }], "":0.5 }], "" }
"(:0.1,:0.2,(:0.3,:0.4):0.5):0.0;"   PASS -> node { [node { [], "":0.1 }, node { [], "":0.2 }, node { [node { [], "":0.3 }, node { [], "":0.4 }], "":0.5 }], "":0 }
"(A:0.1,B:0.2,(C:0.3,D:0.4):0.5);"   PASS -> node { [node { [], "A":0.1 }, node { [], "B":0.2 }, node { [node { [], "C":0.3 }, node { [], "D":0.4 }], "":0.5 }], "" }
"(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;"     PASS -> node { [node { [], "A":0.1 }, node { [], "B":0.2 }, node { [node { [], "C":0.3 }, node { [], "D":0.4 }], "E":0.5 }], "F" }
"((B:0.2,(C:0.3,D:0.4)E:0.5)F:0.1)A;"    PASS -> node { [node { [node { [], "B":0.2 }, node { [node { [], "C":0.3 }, node { [], "D":0.4 }], "E":0.5 }], "F":0.1 }], "A" }

Simplify, Cleanup

After all is done, I usually go in and remove unnecessary rule instantiation.

Turns out, unsurprisingly, that indeed matching the AST to your rules 1:1 makes everything trivially compatible without coercing. In your case I'd probably also remove the optional<double>, so let me show you in case you're interested:

struct node {
    nodes       children;
    std::string name;
    double      length;
};

And the rules become:

x3::rule<class branch, ast::node> node{"node"};

auto name     = x3::lexeme[x3::alpha >> *x3::alnum];
auto length   = ':' >> x3::double_ | x3::attr(0.0);
auto leaf     = name | x3::attr(std::string{});
auto children = '(' >> (node % ',') >> ')' | x3::attr(ast::nodes{});
auto node_def = children >> leaf >> -length;
auto tree     = x3::skip(x3::blank)[node >> ';' >> x3::eoi];

BOOST_SPIRIT_DEFINE(node);

For good measure let's reword the checks to only verify the expected compatibilities:

template <typename ExpectedAttributeType> void compatible(auto p) {
    static_assert(
        std::is_same_v<ExpectedAttributeType,
                       typename x3::traits::attribute_of<
                           decltype(x3::rule<struct _, ExpectedAttributeType>{} = p),
                           x3::unused_type>::type>);
};

void checks() {
    compatible<double>(length);
    compatible<std::string>(leaf);
    compatible<std::string>(name);
    compatible<ast::nodes>(children);
    compatible<ast::node>(tree);
}

Full Listing

Live On Coliru

#include <boost/fusion/include/adapted.hpp>
#include <boost/spirit/home/x3.hpp>
#include <iomanip>
#include <iostream>

namespace ast {
    using nodes = std::vector<struct node>;
    struct node {
        nodes       children;
        std::string name;
        double      length;
    };

    // for debug output
    static inline std::ostream& operator<<(std::ostream& os, node const& n) {
        os << "node { [";

        for (auto sep = ""; auto& c : n.children)
            os << std::exchange(sep, ", ") << c;

        os << "], " << quoted(n.name) ;

        if (n.length != 0)
            os << ":" << n.length;
        return os << " }";
    }
} // namespace ast

BOOST_FUSION_ADAPT_STRUCT(ast::node, children, name, length)

#include <boost/fusion/include/vector.hpp>
#include <boost/fusion/include/std_pair.hpp>

namespace parser {
    namespace x3 = boost::spirit::x3;

    static x3::rule<class branch, ast::node> const node{"node"};

    static auto const name     = x3::lexeme[x3::alpha >> *x3::alnum];
    static auto const length   = ':' >> x3::double_ | x3::attr(0.0);
    static auto const leaf     = name | x3::attr(std::string{});
    static auto const children = '(' >> (node % ',') >> ')' | x3::attr(ast::nodes{});
    static auto const node_def = children >> leaf >> -length;
    static auto const tree     = x3::skip(x3::blank)[node >> ';' >> x3::eoi];

    BOOST_SPIRIT_DEFINE(node);

    template <typename ExpectedAttributeType> void compatible(auto p) {
        static_assert(
            std::is_same_v<ExpectedAttributeType,
                           typename x3::traits::attribute_of<
                               decltype(x3::rule<struct _, ExpectedAttributeType>{} = p),
                               x3::unused_type>::type>);
    };

    void checks() {
        compatible<double>(length);
        compatible<std::string>(leaf);
        compatible<std::string>(name);
        compatible<ast::nodes>(children);
        compatible<ast::node>(tree);
    }
} // namespace parser

namespace test {
    void run_tests(auto name, auto p, std::initializer_list<char const*> cases) {
        std::cerr << "============ running " << name << " tests:\n";
        for (std::string const input : cases)
        {
            ast::node n;
            auto      ok = parse(begin(input), end(input), p, n);

            std::cout << quoted(input) << "\t " << (ok ? "PASS" : "FAIL");
            if (ok)
                std::cout << " -> " << n << std::endl;
            else
                std::cout << std::endl;
        }
    }

    void internal() {
        run_tests("internal", parser::node,
                  {
                      "(,)",
                      "(A,B)F",
                      "(A:10,B:10)F",
                  });
    }

    void tree() {
        run_tests("tree", parser::tree,
            {
                ";",
                "(,);",
                "(,,(,));",
                "(A,B,(C,D));",
                "(A,B,(C,D)E)F;",
                "(:0.1,:0.2,(:0.3,:0.4):0.5);",
                "(:0.1,:0.2,(:0.3,:0.4):0.5):0.0;",
                "(A:0.1,B:0.2,(C:0.3,D:0.4):0.5);",
                "(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;",
                "((B:0.2,(C:0.3,D:0.4)E:0.5)F:0.1)A;",
            });
    }
} // namespace test

int main() {
    test::internal();
    test::tree();
}

With identical output (except for the suppressed 0.0 length in the debug output).


¹ In fairness in X3 I'll more happily consider it as writing them has become a lot more natural.

  • Related