I'm trying to write a parser using boost::spirit::qi which will parse everything between a pair of " characters as-is, while allowing " characters to be escaped. I.e., "ab\n\"" should return ab\n\". I've tried the following code (godbolt link):
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>

namespace qi = boost::spirit::qi;

int main() {
    std::string input{R"("ab\n\"")"};
    std::cout << "[" << input << "]\n";
    std::string output;
    using Skipper = qi::rule<std::string::const_iterator>;
    Skipper skip = qi::space;
    qi::rule<std::string::const_iterator, std::string(), Skipper> qstring;
    qstring %= qi::lit("\"")
        > ( *( (qi::print - qi::lit('"') - qi::lit("\\")) | (qi::char_("\\") > qi::print) ) )
        //                                                       ^^^^^
        > qi::lit("\"");
    auto success = qi::phrase_parse(input.cbegin(), input.cend(), qstring, skip, output);
    if (!success) {
        std::cout << "Failed to parse";
        return 1;
    }
    std::cout << "output = [" << output << "]\n";
    return 0;
}
This fails to compile with template errors:
/opt/compiler-explorer/libs/boost_1_81_0/boost/spirit/home/support/container.hpp:130:12: error: 'char' is not a class, struct, or union type
130 | struct container_value
| ^~~~~~~~~~~~~~~
.....
/opt/compiler-explorer/libs/boost_1_81_0/boost/spirit/home/qi/detail/pass_container.hpp:320:66: error: no type named 'type' in 'struct boost::spirit::traits::container_value<char, void>'
320 | typedef typename traits::container_value<Attr>::type value_type;
I can get the code to compile if I replace the marked qi::char_("\\") with qi::lit("\\"), but that doesn't create an attribute for the \ which it matches. I've also found that I can get it to compile if I create a new rule which embodies just the Kleene star, but is there a way to get Boost to use the correct types in a single expression?
qi::rule<std::string::const_iterator, std::string(), Skipper> qstring;
qi::rule<std::string::const_iterator, std::string(), Skipper> qstringbody;
qstringbody %= ( *( (qi::print - qi::lit('"') - qi::lit("\\")) | (qi::char_("\\") > qi::print) ) );
qstring %= qi::lit("\"")
> qstringbody
> qi::lit("\"");
Answer:
You mention that swapping qi::char_("\\") for qi::lit("\\") compiles, "but that doesn't create an attribute for the \ which it matches".
That is exactly what you want. Parsing should translate the input representation (syntax) into your meaningful representation (semantics). It is possible to have an AST that reflects escapes, of course, but then you would NOT be parsing into a string, but into something like
#include <cstdint>
#include <string>
#include <variant>
#include <vector>
struct char_or_escape {
    enum { hex_escape, octal_escape, C_style_char_esc, unicode_codepoint_escape, named_unicode_escape } type;
    std::variant<std::uint32_t, std::string> value;
};
using StringAST = std::vector<char_or_escape>;
Presumably, you don't want to keep the raw input (otherwise, qi::raw[] is your friend).
Applying It
Here's my simplification (with It being std::string::const_iterator):
qi::rule<It, std::string(), Skipper> qstring //
= '"' > *(qi::print - '"' - "\\" | "\\" > qi::print) > '"';
Side note: your grammar seems to accept printable characters only. I'll remove that assumption in the following. You can, of course, reintroduce character subsets as you require.
qstring = '"' > *(~qi::char_("\"\\") | '\\' > qi::char_) > '"';
Reordering the branches removes the need to exclude '\\', while being more expressive about intent:
qstring = '"' > *('\\' > qi::char_ | ~qi::char_('"')) > '"';
Now, from the example input I gather that you might require a C-style treatment of escapes. May I suggest:
qi::symbols<char, char> c_esc;
c_esc.add("\\\\", '\\') //
("\\a", '\a')("\\b", '\b')("\\n", '\n')("\\f", '\f')("\\t", '\t')("\\r", '\r') //
("\\v", '\v')("\\0", '\0')("\\e", 0x1b)("\\'", '\'')("\\\"", '"')("\\?", 0x3f);
qstring = '"' > *(c_esc | '\\' >> qi::char_ | ~qi::char_('"')) > '"';
(Note that some of these entries are redundant: \\, \', \" and \? already map to the character that follows the backslash, which the '\\' >> qi::char_ fallback handles anyway.)
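To make the intended semantics concrete without any Spirit machinery, here is a minimal hand-written sketch of the same three-way choice (known escape, any other escaped character, anything but a quote). The names decode_escape, unquote and self_check are hypothetical helpers for illustration, not part of Boost. Unlike the phrase_parse call, this sketch skips no whitespace and insists that the closing quote is the last character.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: maps the character after a backslash to the
// character it denotes, mirroring the c_esc table.
char decode_escape(char c) {
    switch (c) {
        case 'a': return '\a';
        case 'b': return '\b';
        case 'n': return '\n';
        case 'f': return '\f';
        case 't': return '\t';
        case 'r': return '\r';
        case 'v': return '\v';
        case '0': return '\0';
        case 'e': return '\x1b'; // ESC, same value as in the table
        default:  return c;      // \\ \' \" \? and anything else: the character itself
    }
}

// Hypothetical helper: returns the decoded contents of a quoted string,
// or std::nullopt if the input is not a well-formed quoted string.
std::optional<std::string> unquote(std::string_view in) {
    if (in.size() < 2 || in.front() != '"') return std::nullopt;
    std::string out;
    for (std::size_t i = 1; i < in.size(); ++i) {
        char c = in[i];
        if (c == '\\') {
            if (++i == in.size()) return std::nullopt; // dangling backslash
            out += decode_escape(in[i]);
        } else if (c == '"') {
            if (i + 1 != in.size()) return std::nullopt; // text after the closing quote
            return out;
        } else {
            out += c;
        }
    }
    return std::nullopt; // closing quote never found
}

// A few checks using the same kind of inputs as the question.
void self_check() {
    assert(unquote(R"("")") == std::string(""));
    assert(unquote(R"("ab\n\"")") == std::string("ab\n\""));
    assert(!unquote(R"("abc)"));
}
```

The order of the tests inside the loop plays the role of the ordered alternatives in the Spirit expression: the backslash case is tried before the "anything but a quote" case.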
Demo
#include <boost/spirit/include/qi.hpp>
#include <iomanip>
#include <iostream>
#include <string>

namespace qi = boost::spirit::qi;

int main() {
    using It      = std::string::const_iterator;
    using Skipper = qi::space_type;
    qi::rule<It, std::string(), Skipper> qstring;

    // Map each two-character escape sequence to the character it denotes.
    qi::symbols<char, char> c_esc;
    c_esc.add("\\\\", '\\') //
        ("\\a", '\a')("\\b", '\b')("\\n", '\n')("\\f", '\f')("\\t", '\t')("\\r", '\r') //
        ("\\v", '\v')("\\0", '\0')("\\e", 0x1b)("\\'", '\'')("\\\"", '"')("\\?", 0x3f);

    // Known escapes first, then any other escaped character, then anything but a quote.
    qstring = '"' > *(c_esc | '\\' >> qi::char_ | ~qi::char_('"')) > '"';

    for (std::string input :
         {
             R"("")",
             R"("ab\n\"")",
             R"("ab\r\n\'")",
         }) //
    {
        std::string output;
        bool success = phrase_parse(input.cbegin(), input.cend(), qstring, qi::space, output);
        if (!success)
            std::cout << quoted(input) << " -> FAILED\n";
        else
            std::cout << quoted(input) << " -> " << quoted(output) << "\n";
    }
}
Printing
"\"\"" -> ""
"\"ab\\n\\\"\"" -> "ab
\""
"\"ab\\r\\n\\'\"" -> "ab
'"
Further Reading
For more complete escape handling, see here: Creating a boost::spirit::x3 parser for quoted strings with escape sequence handling (also alternative approaches instead of the symbols).
It contains a list of even more elaborate examples (JSON style unicode escapes etc.)