Inspired by sehe's answer to
"Boost spirit x3 - lazy parser",
I tried to adapt it to one of my own problems (which is another story).
The grammar I need to implement has several ways to express numerical literals with
bases of 2, 8, 10 and 16. I've reduced the approach mentioned above to what is hopefully
a bearable minimum.
In the AST I'd like to preserve the textual representation of the number (integer, fractional and exponent parts)
as boost::iterator_range<> by means of x3::raw,
to evaluate it later; only the base shall be of
integer type. Honestly, I don't know the future requirements yet (I could imagine
several possibilities, even having the parser evaluate it to a real/integer, but most of
the time reality looks different). For simplicity, I've used std::string
here:
struct number {
    unsigned base;
    std::string literal;
};
Since both the base and the digits can have embedded underscores, I've used range-v3's
views::filter()
function to strip them. sehe has shown another approach to handling such separated numbers
in "X3 parse rule doesn't compile".
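To illustrate the underscore handling, here is a minimal standard-library sketch rather than the range-v3 code itself. It assumes the literal has already been validated by the parser, and the names strip_underscores and evaluate are hypothetical helpers introduced only for this example:

```cpp
#include <algorithm>
#include <iterator>
#include <string>

// Hypothetical helper: copy the literal while dropping '_' separators.
std::string strip_underscores(std::string const& literal)
{
    std::string digits;
    digits.reserve(literal.size());
    std::copy_if(literal.begin(), literal.end(), std::back_inserter(digits),
                 [](char c) { return c != '_'; });
    return digits;
}

// Hypothetical helper: evaluate a literal that the parser has already
// accepted for the given base (2, 8, 10 or 16).
unsigned long long evaluate(unsigned base, std::string const& literal)
{
    // std::stoull is lenient (it accepts leading whitespace, a sign and,
    // for base 16, a "0x" prefix), so validation must happen beforehand.
    return std::stoull(strip_underscores(literal), nullptr, static_cast<int>(base));
}
```

With this, evaluate(16, "DEAD_BEEF") would strip the separator and then convert "DEADBEEF" as a hexadecimal number. Keeping the raw text in the AST and evaluating later, as described above, leaves room to choose a different target type once the requirements are known.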
The core idea (I used Qi's Nabialek trick a long time ago) is to have something like
auto const char_set = [](auto&& char_range, char const* name) {
    return x3::rule<struct _, std::string>{ name } = x3::as_parser(
        x3::raw[ x3::char_(char_range) >> *(-lit("_") >> x3::char_(char_range)) ]);
};
auto const bin_charset = char_set("01", "binary charset");
auto const oct_charset = char_set("0-7", "octal charset");
auto const dec_charset = char_set("0-9", "decimal charset");
auto const hex_charset = char_set("0-9a-fA-F", "hexadecimal charset");
using Value = ast::number;
using It = std::string::const_iterator;
using Rule = x3::any_parser<It, Value>;
x3::symbols<Rule> const based_parser({
        { 2, as<std::string>[ bin_charset ] },
        { 8, as<std::string>[ oct_charset ] },
        { 10, as<std::string>[ dec_charset ] },
        { 16, as<std::string>[ hex_charset ] }
    }, "based character set"
);
auto const base = x3::rule<struct _, unsigned>{ "base" } = dec_charset; // simplified
auto const parser = x3::with<Rule>(Rule{}) [
    x3::lexeme[ set_lazy<Rule>[based_parser] >> '#' >> do_lazy<Rule> ]
];
auto const grammar = x3::skip[ x3::space ]( parser >> x3::eoi );
and use them like
for (std::string const input : {
        "2#0101",
        "8#42",
        "10#4711",
        "1_6#DEAD_BEEF",
    })
{
    ...
}
Well, it doesn't compile and hence I do not know if it would work this way. I think it's
a better way than several lines of alternatives (as in my old code). Further, looking at newer
standards of the grammar I'd like to implement, the syntax has been extended with a leading
integer (for the numeric width) and other base specifiers, e.g. 'UB', 'UO' and others. This
may be getting off-topic, but: how can I prepare the code for further grammar extensions
(using something like eps[get<std_tag>(ctx) == x42])?
For convenience, I've put the example at coliru.
Answer:
"Well, it doesn't compile and hence I do not know if it would work this way."
Where to start. Let me recommend: baby steps. X3 is not the framework to throw together a bunch of code and expect it to just compile, let alone do what you want.
Some notes:
- the symbols key needs to be a character sequence, not any integer value
- the rule type synthesizes a Value (as you declared Rule = any_parser<It, Value>). However, you "coerce" the symbol expressions to std::string using as<std::string>. That is not compatible.
- if you want to also store the matched symbol, perhaps use &sym >> x3::uint_ >> '#' to handle it
Let me combine the factories:
template<typename...> struct Tag { };
template<typename T, typename P>
auto
as(P p, char const* name = "as")
{
    return x3::rule<Tag<T, P>, T>{name} = x3::as_parser(p);
}
Now you can simply write
auto const delimit_numeric_digits = [](auto&& char_range, char const* name)
{
    auto cs = x3::char_(char_range);
    return as<std::string>(x3::raw[cs >> *('_' >> cs | cs)], name);
};
auto const bin_digits = delimit_numeric_digits("01", "binary digits");
auto const oct_digits = delimit_numeric_digits("0-7", "octal digits");
auto const dec_digits = delimit_numeric_digits("0-9", "decimal digits");
auto const hex_digits = delimit_numeric_digits("0-9a-fA-F", "hexadecimal digits");
(See how I improved on the naming, since charset really didn't cover it).
Next, fixing the symbol lookup:
using Rule = x3::any_parser<It, std::string>;
x3::symbols<Rule> const based_parser({
    {"2#", bin_digits},
    {"8#", oct_digits},
    {"10#", dec_digits},
    {"16#", hex_digits},
});
Notably, the digits only synthesize std::string, not the base. Now, use the trick outlined above to still expose the base as integer:
auto const parser                              //
    = x3::rule<struct _, Value, true>{"Value"} //
    = x3::with<Rule>(Rule{})[                  //
          x3::lexeme
              [&set_lazy<Rule>[based_parser] >> x3::uint_ >> '#' >> do_lazy<Rule>]];
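To make the control flow of that rule easier to follow, here is a library-free sketch of the same two-step idea: first look up the base prefix as a character sequence, then apply the digit check that the lookup selected. This is only an illustration under simplifying assumptions (no skipper, no attribute propagation); the names digit_sets and parse_number are hypothetical and are not part of X3:

```cpp
#include <map>
#include <string>

struct number {
    unsigned base;
    std::string literal;
};

// Hypothetical stand-in for x3::symbols<Rule>: prefix -> allowed digit characters.
static std::map<std::string, std::string> const digit_sets{
    {"2#", "01"},
    {"8#", "01234567"},
    {"10#", "0123456789"},
    {"16#", "0123456789abcdefABCDEF"},
};

// Returns true and fills `out` if `input` is "<base>#<digits>" with a known base.
bool parse_number(std::string const& input, number& out)
{
    auto const hash = input.find('#');
    if (hash == std::string::npos)
        return false;

    // Step 1: the prefix including '#' selects the digit set (the "set_lazy" part).
    auto const entry = digit_sets.find(input.substr(0, hash + 1));
    if (entry == digit_sets.end())
        return false;

    // Step 2: apply the selected check to the rest (the "do_lazy" part):
    // digits separated by single underscores, mirroring cs >> *('_' >> cs | cs).
    std::string const rest = input.substr(hash + 1);
    bool previous_was_underscore = true; // rejects a leading '_' and an empty literal
    for (char c : rest) {
        if (c == '_') {
            if (previous_was_underscore)
                return false;
            previous_was_underscore = true;
        } else if (entry->second.find(c) != std::string::npos) {
            previous_was_underscore = false;
        } else {
            return false;
        }
    }
    if (previous_was_underscore)
        return false; // trailing '_' or empty literal

    // The prefix is one of the known keys, so it is a plain decimal number.
    out.base = static_cast<unsigned>(std::stoul(input.substr(0, hash)));
    out.literal = rest;
    return true;
}
```

Because the lookup key is the literal text "16#", an input such as "1_6#DEAD_BEEF" finds no entry and is rejected, whereas "16#DEAD_BEEF" is accepted with the underscore preserved in the literal.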
Live Demo
//#define BOOST_SPIRIT_X3_DEBUG
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/adapted/struct.hpp>
#include <iostream>
#include <iomanip>
namespace x3 = boost::spirit::x3;

namespace ast {
    struct number {
        unsigned base;
        std::string literal;
    };
}
BOOST_FUSION_ADAPT_STRUCT(ast::number, base, literal)

std::ostream&
operator<<(std::ostream& os, ast::number const n)
{
    return os << n.base << '#' << n.literal;
}

namespace Parsing {
    template<typename...> struct Tag { };

    template<typename T, typename P>
    auto
    as(P p, char const* name = "as")
    {
        return x3::rule<Tag<T, P>, T>{name} = x3::as_parser(p);
    }

    template<typename Tag>
    struct set_lazy_type
    {
        template<typename P>
        auto
        operator[](P p) const
        {
            auto action = [](auto& ctx) { // set rhs parser
                x3::get<Tag>(ctx) = x3::_attr(ctx);
            };
            return p[action];
        }
    };

    template<typename Tag>
    struct do_lazy_type : x3::parser<do_lazy_type<Tag>>
    {
        using attribute_type = typename Tag::attribute_type; // TODO FIXME?

        template<typename It, typename Ctx, typename RCtx, typename Attr>
        bool
        parse(It& first, It last, Ctx& ctx, RCtx& rctx, Attr& attr) const
        {
            auto& subject = x3::get<Tag>(ctx);
            It saved = first;
            x3::skip_over(first, last, ctx);
            if(x3::as_parser(subject).parse(
                   first,
                   last,
                   std::forward<Ctx>(ctx),
                   std::forward<RCtx>(rctx),
                   attr))
            {
                return true;
            } else
            {
                first = saved;
                return false;
            }
        }
    };

    template<typename T> static const set_lazy_type<T> set_lazy{};
    template<typename T> static const do_lazy_type<T> do_lazy{};

    auto const delimit_numeric_digits = [](auto&& char_range, char const* name)
    {
        auto cs = x3::char_(char_range);
        return as<std::string>(x3::raw[cs >> *('_' >> cs | cs)], name);
    };
    auto const bin_digits = delimit_numeric_digits("01", "binary digits");
    auto const oct_digits = delimit_numeric_digits("0-7", "octal digits");
    auto const dec_digits = delimit_numeric_digits("0-9", "decimal digits");
    auto const hex_digits = delimit_numeric_digits("0-9a-fA-F", "hexadecimal digits");

    using Value = ast::number;
    using It = std::string::const_iterator;
    using Rule = x3::any_parser<It, std::string>;

    x3::symbols<Rule> const based_parser({
        {"2#", bin_digits},
        {"8#", oct_digits},
        {"10#", dec_digits},
        {"16#", hex_digits},
    });

    auto const parser                              //
        = x3::rule<struct _, Value, true>{"Value"} //
        = x3::with<Rule>(Rule{})[                  //
              x3::lexeme
                  [&set_lazy<Rule>[based_parser] >> x3::uint_ >> '#' >> do_lazy<Rule>]];

    auto const grammar = x3::skip(x3::space)[parser >> x3::eoi];
} // namespace Parsing

int main()
{
    for(std::string const input : {
            "2#0101",
            "8#42",
            "10#4711",
            "1_6#DEAD_BEEF",
        })
    {
        Parsing::Value attr;
        if(parse(begin(input), end(input), Parsing::grammar, attr))
        {
            std::cout << std::quoted(input) << " -> success (" << attr << ")\n";
        } else
        {
            std::cout << std::quoted(input) << " -> failed\n";
        }
    }
}
Prints
"2#0101" -> success (2#0101)
"8#42" -> success (8#42)
"10#4711" -> success (10#4711)
"1_6#DEAD_BEEF" -> failed