I'd like to make a keyword parser that matches i.e. int
, but does not match int
in integer
with eger
left over. I use x3::symbols
to get automatically get the parsed keyword represented as an enum value.
Minimal example:
#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/home/x3/support/utility/error_reporting.hpp>
namespace x3 = boost::spirit::x3;
enum class TypeKeyword { Int, Float, Bool };
struct TypeKeywordSymbolTable : x3::symbols<TypeKeyword> {
TypeKeywordSymbolTable()
{
add("float", TypeKeyword::Float)
("int", TypeKeyword::Int)
("bool", TypeKeyword::Bool);
}
};
const TypeKeywordSymbolTable type_keyword_symbol_table;
struct TypeKeywordRID {};
using TypeKeywordRule = x3::rule<TypeKeywordRID, TypeKeyword>;
const TypeKeywordRule type_keyword = "type_keyword";
const auto type_keyword_def = type_keyword_symbol_table;
BOOST_SPIRIT_DEFINE(type_keyword);
using Iterator = std::string_view::const_iterator;
/* Thrown when the parser has failed to parse the whole input stream. Contains
* the part of the input stream that has not been parsed. */
class LeftoverError : public std::runtime_error {
public:
LeftoverError(Iterator begin, Iterator end)
: std::runtime_error(std::string(begin, end))
{}
std::string_view get_leftover_data() const noexcept { return what(); }
};
template<typename Rule>
typename Rule::attribute_type parse(std::string_view input, const Rule& rule)
{
Iterator begin = input.begin();
Iterator end = input.end();
using ExpectationFailure = boost::spirit::x3::expectation_failure<Iterator>;
typename Rule::attribute_type result;
try {
bool r = x3::phrase_parse(begin, end, rule, x3::space, result);
if (r && begin == end) {
return result;
} else { // Occurs when the whole input stream has not been consumed.
throw LeftoverError(begin, end);
}
} catch (const ExpectationFailure& exc) {
throw LeftoverError(exc.where(), end);
}
}
int main()
{
// TypeKeyword::Bool is parsed and "ean" is leftover, but failed parse with
// "boolean" leftover is desired.
parse("boolean", type_keyword);
// TypeKeyword::Int is parsed and "eger" is leftover, but failed parse with
// "integer" leftover is desired.
parse("integer", type_keyword);
// TypeKeyword::Int is parsed successfully and this is the desired behavior.
parse("int", type_keyword);
}
Basicly, I want integer
not to be recognized as a keyword with additional eger
left to parse.
CodePudding user response:
I morphed the test cases into self-describing expectations:
Prints:
FAIL "boolean" -> TypeKeyword::Bool (expected Leftover:"boolean")
FAIL "integer" -> TypeKeyword::Int (expected Leftover:"integer")
OK "int" -> TypeKeyword::Int
Now, the simplest, naive approach would be to make sure you parse till the eoi
, by simply changing
auto actual = parse(input, Parser::type_keyword);
To
auto actual = parse(input, Parser::type_keyword >> x3::eoi);
And indeed the tests pass: Live
OK "boolean" -> Leftover:"boolean"
OK "integer" -> Leftover:"integer"
OK "int" -> TypeKeyword::Int
However, this fits the tests, but not the goal. Let's imagine a more involved grammar, where type identifier;
is to be parsed:
auto identifier
= x3::rule<struct id_, Ast::Identifier> {"identifier"}
= x3::lexeme[x3::char_("a-zA-Z_") >> *x3::char_("a-zA-Z_0-9")];
auto type_keyword
= x3::rule<struct tk_, Ast::TypeKeyword> {"type_keyword"}
= type_;
auto declaration
= x3::rule<struct decl_, Ast::Declaration>{"declaration"}
= type_keyword >> identifier >> ';';
I'll leave the details for Compiler Explorer:
OK "boolean b;" -> Leftover:"boolean b;"
OK "integer i;" -> Leftover:"integer i;"
OK "int j;" -> (TypeKeyword::Int j)
Looks good. But what if we add some interesting tests:
{"flo at f;", LeftoverError("flo at f;")},
{"boolean;", LeftoverError("boolean;")},
It prints (Live)
OK "boolean b;" -> Leftover:"boolean b;"
OK "integer i;" -> Leftover:"integer i;"
OK "int j;" -> (TypeKeyword::Int j)
FAIL "boolean;" -> (TypeKeyword::Bool ean) (expected Leftover:"boolean;")
So, the test cases were lacking. Your prose description is actually closer:
I'd like to make a keyword parser that matches i.e. int, but does not match int in integer with eger left over
That correctly implies you want to check the lexeme inside the type_keyword
rule. A naive try might be checking that no identifier character follows the type keyword:
auto type_keyword
= x3::rule<struct tk_, Ast::TypeKeyword> {"type_keyword"}
= type_ >> !identchar;
Where identchar
was factored out of identifier
like so:
auto identchar = x3::char_("a-zA-Z_0-9");
auto identifier
= x3::rule<struct id_, Ast::Identifier> {"identifier"}
= x3::lexeme[x3::char_("a-zA-Z_") >> *identchar];
However, this doesn't work. Can you see why (peeking allowed: https://godbolt.org/z/jb4zfhfWb)?
Our latest devious test case now passes (yay), but int j;
is now rejected! If you think about it, it only makes sense, because you have spaced skipped.
The essential word I used a moment ago was lexeme: you want to treat some units as lexemes (and whitespace stops the lexeme. Or rather, whitespace isn't automatically skipped inside the lexeme¹). So, a fix would be:
auto type_keyword
// ...
= x3::lexeme[type_ >> !identchar];
(Note how I sneakily already included that on the
identifier
rule earlier)
Lo and behold (Live):
OK "boolean b;" -> Leftover:"boolean b;"
OK "integer i;" -> Leftover:"integer i;"
OK "int j;" -> (TypeKeyword::Int j)
OK "boolean;" -> Leftover:"boolean;"
Summarizing
This topic is a frequently recurring one, and it requires a solid understanding of skippers, lexemes first and foremost. Here are some other posts for inspiration:
Boost Spirit x3: parse delimited string Where I introduce a more general helper you might find useful:
auto kw = [](auto p) { return x3::lexeme [ x3::as_parser(p) >> !x3::char_("a-zA-Z0-9_") ]; };
Good luck!
Complete Listing
Anti-Bitrot, the final listing:
#include <boost/fusion/adapted.hpp>
#include <boost/fusion/include/as_vector.hpp>
#include <boost/fusion/include/io.hpp>
#include <boost/lexical_cast.hpp>
#include <boost/spirit/home/x3.hpp>
#include <iomanip>
#include <iostream>
namespace x3 = boost::spirit::x3;
namespace Ast {
enum class TypeKeyword { Int, Float, Bool };
static std::ostream& operator<<(std::ostream& os, TypeKeyword tk) {
switch (tk) {
case TypeKeyword::Int: return os << "TypeKeyword::Int";
case TypeKeyword::Float: return os << "TypeKeyword::Float";
case TypeKeyword::Bool: return os << "TypeKeyword::Bool";
};
return os << "?";
}
using Identifier = std::string;
struct Declaration {
TypeKeyword type;
Identifier id;
bool operator==(Declaration const&) const = default;
};
} // namespace Ast
BOOST_FUSION_ADAPT_STRUCT(Ast::Declaration, type, id)
namespace Ast{
static std::ostream& operator<<(std::ostream& os, Ast::Declaration const& d) {
return os << boost::lexical_cast<std::string>(boost::fusion::as_vector(d));
}
} // namespace Ast
namespace Parser {
struct Type : x3::symbols<Ast::TypeKeyword> {
Type() {
add("float", Ast::TypeKeyword::Float) //
("int", Ast::TypeKeyword::Int) //
("bool", Ast::TypeKeyword::Bool); //
}
} const static type_;
auto identchar = x3::char_("a-zA-Z_0-9");
auto identifier
= x3::rule<struct id_, Ast::Identifier> {"identifier"}
= x3::lexeme[x3::char_("a-zA-Z_") >> *identchar];
auto type_keyword
= x3::rule<struct tk_, Ast::TypeKeyword> {"type_keyword"}
= x3::lexeme[type_ >> !identchar];
auto declaration
= x3::rule<struct decl_, Ast::Declaration>{"declaration"}
= type_keyword >> identifier >> ';';
} // namespace Parser
struct LeftoverError : std::runtime_error {
using std::runtime_error::runtime_error;
friend std::ostream& operator<<(std::ostream& os, LeftoverError const& e) {
return os << (std::string("Leftover:\"") e.what() "\"");
}
bool operator==(LeftoverError const& other) const {
return std::string_view(what()) == other.what();
}
};
template<typename T>
using Maybe = boost::variant<T, LeftoverError>;
template <typename Rule,
typename Attr = typename x3::traits::attribute_of<Rule, x3::unused_type>::type,
typename R = Maybe<Attr>>
R parse(std::string_view input, Rule const& rule) {
Attr result;
auto f = input.begin(), l = input.end();
return x3::phrase_parse(f, l, rule, x3::space, result)
? R(std::move(result))
: LeftoverError({f, l});
}
int main()
{
using namespace Ast;
struct {
std::string_view input;
Maybe<Declaration> expected;
} cases[] = {
{"boolean b;", LeftoverError("boolean b;")},
{"integer i;", LeftoverError("integer i;")},
{"int j;", Declaration{TypeKeyword::Int, "j"}},
{"boolean;", LeftoverError("boolean;")},
};
for (auto [input, expected] : cases) {
auto actual = parse(input, Parser::declaration >> x3::eoi);
bool ok = expected == actual;
std::cout << std::left << std::setw(6) << (ok ? "OK" : "FAIL")
<< std::setw(12) << std::quoted(input) << " -> "
<< std::setw(20) << actual;
if (not ok)
std::cout << " (expected " << expected << ")";
std::cout << "\n";
}
}