Home > Net >  Make boost::spirit::symbol parser non greedy
Make boost::spirit::symbol parser non greedy

Time:04-22

I'd like to make a keyword parser that matches i.e. int, but does not match int in integer with eger left over. I use x3::symbols to get automatically get the parsed keyword represented as an enum value.

Minimal example:

#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/home/x3/support/utility/error_reporting.hpp>

namespace x3 = boost::spirit::x3;

enum class TypeKeyword { Int, Float, Bool };

struct TypeKeywordSymbolTable : x3::symbols<TypeKeyword> {
    TypeKeywordSymbolTable()
    {
        add("float", TypeKeyword::Float)
           ("int",   TypeKeyword::Int)
           ("bool",  TypeKeyword::Bool);
    }
};
const TypeKeywordSymbolTable type_keyword_symbol_table;

struct TypeKeywordRID {};
using TypeKeywordRule = x3::rule<TypeKeywordRID, TypeKeyword>;

const TypeKeywordRule type_keyword = "type_keyword";
const auto type_keyword_def        = type_keyword_symbol_table;
BOOST_SPIRIT_DEFINE(type_keyword);

using Iterator = std::string_view::const_iterator;

/* Thrown when the parser has failed to parse the whole input stream. Contains
 * the part of the input stream that has not been parsed. */
class LeftoverError : public std::runtime_error {
  public:
    LeftoverError(Iterator begin, Iterator end)
            : std::runtime_error(std::string(begin, end))
    {}

    std::string_view get_leftover_data() const noexcept { return what(); }
};

template<typename Rule>
typename Rule::attribute_type parse(std::string_view input, const Rule& rule)
{
    Iterator begin = input.begin();
    Iterator end   = input.end();

    using ExpectationFailure = boost::spirit::x3::expectation_failure<Iterator>;
    typename Rule::attribute_type result;
    try {
        bool r = x3::phrase_parse(begin, end, rule, x3::space, result);
        if (r && begin == end) {
            return result;
        } else { // Occurs when the whole input stream has not been consumed.
            throw LeftoverError(begin, end);
        }
    } catch (const ExpectationFailure& exc) {
        throw LeftoverError(exc.where(), end);
    }
}

int main()
{
    // TypeKeyword::Bool is parsed and "ean" is leftover, but failed parse with
    // "boolean" leftover is desired.
    parse("boolean", type_keyword);

    // TypeKeyword::Int is parsed and "eger" is leftover, but failed parse with
    // "integer" leftover is desired.
    parse("integer", type_keyword);

    // TypeKeyword::Int is parsed successfully and this is the desired behavior.
    parse("int", type_keyword);
}

Basicly, I want integer not to be recognized as a keyword with additional eger left to parse.

CodePudding user response:

I morphed the test cases into self-describing expectations:

Live On Compiler Explorer

Prints:

FAIL  "boolean"    -> TypeKeyword::Bool    (expected Leftover:"boolean")
FAIL  "integer"    -> TypeKeyword::Int     (expected Leftover:"integer")
OK    "int"        -> TypeKeyword::Int    

Now, the simplest, naive approach would be to make sure you parse till the eoi, by simply changing

auto actual = parse(input, Parser::type_keyword);

To

auto actual = parse(input, Parser::type_keyword >> x3::eoi);

And indeed the tests pass: Live

OK    "boolean"    -> Leftover:"boolean"  
OK    "integer"    -> Leftover:"integer"  
OK    "int"        -> TypeKeyword::Int    

However, this fits the tests, but not the goal. Let's imagine a more involved grammar, where type identifier; is to be parsed:

auto identifier 
    = x3::rule<struct id_, Ast::Identifier> {"identifier"}
    = x3::lexeme[x3::char_("a-zA-Z_") >> *x3::char_("a-zA-Z_0-9")];

auto type_keyword
    = x3::rule<struct tk_, Ast::TypeKeyword> {"type_keyword"}
    = type_;

auto declaration 
    = x3::rule<struct decl_, Ast::Declaration>{"declaration"}
    = type_keyword >> identifier >> ';';

I'll leave the details for Compiler Explorer:

OK    "boolean b;" -> Leftover:"boolean b;"
OK    "integer i;" -> Leftover:"integer i;"
OK    "int j;"     -> (TypeKeyword::Int j)

Looks good. But what if we add some interesting tests:

    {"flo at f;", LeftoverError("flo at f;")},
    {"boolean;", LeftoverError("boolean;")},

It prints (Live)

OK    "boolean b;" -> Leftover:"boolean b;"
OK    "integer i;" -> Leftover:"integer i;"
OK    "int j;"     -> (TypeKeyword::Int j)
FAIL  "boolean;"   -> (TypeKeyword::Bool ean) (expected Leftover:"boolean;")

So, the test cases were lacking. Your prose description is actually closer:

I'd like to make a keyword parser that matches i.e. int, but does not match int in integer with eger left over

That correctly implies you want to check the lexeme inside the type_keyword rule. A naive try might be checking that no identifier character follows the type keyword:

auto type_keyword
    = x3::rule<struct tk_, Ast::TypeKeyword> {"type_keyword"}
    = type_ >> !identchar;

Where identchar was factored out of identifier like so:

auto identchar = x3::char_("a-zA-Z_0-9");

auto identifier 
    = x3::rule<struct id_, Ast::Identifier> {"identifier"}
    = x3::lexeme[x3::char_("a-zA-Z_") >> *identchar];

However, this doesn't work. Can you see why (peeking allowed: https://godbolt.org/z/jb4zfhfWb)?

Our latest devious test case now passes (yay), but int j; is now rejected! If you think about it, it only makes sense, because you have spaced skipped.

The essential word I used a moment ago was lexeme: you want to treat some units as lexemes (and whitespace stops the lexeme. Or rather, whitespace isn't automatically skipped inside the lexeme¹). So, a fix would be:

auto type_keyword
    // ...
    = x3::lexeme[type_ >> !identchar];

(Note how I sneakily already included that on the identifier rule earlier)

Lo and behold (Live):

OK    "boolean b;" -> Leftover:"boolean b;"
OK    "integer i;" -> Leftover:"integer i;"
OK    "int j;"     -> (TypeKeyword::Int j)
OK    "boolean;"   -> Leftover:"boolean;" 

Summarizing

This topic is a frequently recurring one, and it requires a solid understanding of skippers, lexemes first and foremost. Here are some other posts for inspiration:

Good luck!

Complete Listing

Anti-Bitrot, the final listing:

#include <boost/fusion/adapted.hpp>
#include <boost/fusion/include/as_vector.hpp>
#include <boost/fusion/include/io.hpp>
#include <boost/lexical_cast.hpp>
#include <boost/spirit/home/x3.hpp>
#include <iomanip>
#include <iostream>

namespace x3 = boost::spirit::x3;

namespace Ast {
    enum class TypeKeyword { Int, Float, Bool };

    static std::ostream& operator<<(std::ostream& os, TypeKeyword tk) {
        switch (tk) {
            case TypeKeyword::Int:   return os << "TypeKeyword::Int";
            case TypeKeyword::Float: return os << "TypeKeyword::Float";
            case TypeKeyword::Bool:  return os << "TypeKeyword::Bool";
        };
        return os << "?";
    }

    using Identifier = std::string;

    struct Declaration {
        TypeKeyword type;
        Identifier id;

        bool operator==(Declaration const&) const = default;
    };

} // namespace Ast

BOOST_FUSION_ADAPT_STRUCT(Ast::Declaration, type, id)

namespace Ast{
    static std::ostream& operator<<(std::ostream& os, Ast::Declaration const& d) {
        return os << boost::lexical_cast<std::string>(boost::fusion::as_vector(d));
    }
} // namespace Ast

namespace Parser {
    struct Type : x3::symbols<Ast::TypeKeyword> {
        Type() {
            add("float", Ast::TypeKeyword::Float) //
                ("int", Ast::TypeKeyword::Int)    //
                ("bool", Ast::TypeKeyword::Bool); //
        }
    } const static type_;

    auto identchar = x3::char_("a-zA-Z_0-9");

    auto identifier 
        = x3::rule<struct id_, Ast::Identifier> {"identifier"}
        = x3::lexeme[x3::char_("a-zA-Z_") >> *identchar];

    auto type_keyword
        = x3::rule<struct tk_, Ast::TypeKeyword> {"type_keyword"}
        = x3::lexeme[type_ >> !identchar];

    auto declaration 
        = x3::rule<struct decl_, Ast::Declaration>{"declaration"}
        = type_keyword >> identifier >> ';';
} // namespace Parser

struct LeftoverError : std::runtime_error {
    using std::runtime_error::runtime_error;

    friend std::ostream& operator<<(std::ostream& os, LeftoverError const& e) {
        return os << (std::string("Leftover:\"")   e.what()   "\"");
    }
    bool operator==(LeftoverError const& other) const {
        return std::string_view(what()) == other.what();
    }
};

template<typename T>
using Maybe = boost::variant<T, LeftoverError>;

template <typename Rule,
          typename Attr = typename x3::traits::attribute_of<Rule, x3::unused_type>::type,
          typename R = Maybe<Attr>>
R parse(std::string_view input, Rule const& rule) {
    Attr result;
    auto f = input.begin(), l = input.end();
    return x3::phrase_parse(f, l, rule, x3::space, result)
        ? R(std::move(result))
        : LeftoverError({f, l});
}

int main()
{
    using namespace Ast;

    struct {
        std::string_view        input;
        Maybe<Declaration>      expected;
    } cases[] = {
        {"boolean b;", LeftoverError("boolean b;")},
        {"integer i;", LeftoverError("integer i;")},
        {"int j;", Declaration{TypeKeyword::Int, "j"}},
        {"boolean;", LeftoverError("boolean;")},
    };
    for (auto [input, expected] : cases) {
        auto actual = parse(input, Parser::declaration >> x3::eoi);
        bool ok     = expected == actual;

        std::cout << std::left << std::setw(6) << (ok ? "OK" : "FAIL")
                  << std::setw(12) << std::quoted(input) << " -> "
                  << std::setw(20) << actual;
        if (not ok)
            std::cout << " (expected " << expected << ")";
        std::cout << "\n";
    }
}

¹ see Boost spirit skipper issues

  • Related