Boost spirit alternative operator doesn't fill all attribute values

Time:10-08

I read real numbers from a file using Boost.Spirit Qi. I'm trying to implement a conditional parser, where the expected input depends on the first character of the line.

#include <iostream>
#include <string>
#include <boost/fusion/adapted/struct/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>
using namespace std;
namespace qi = boost::spirit::qi;


struct MyStruct {

   double r1, r2, r3, r4;
   double r5, r6, r7, r8;
};

BOOST_FUSION_ADAPT_STRUCT(
   MyStruct,

   (double, r1), (double, r2), (double, r3), (double, r4),
   (double, r5), (double, r6), (double, r7), (double, r8)
)

int main()
{
   // Fixed-width input: each number occupies a 19-character field
   string test =
      "A+1.000000000000e+00+2.000000000000e+00+3.000000000000e+00+4.000000000000e+00\r\n"
      "B+5.000000000000e+00+6.000000000000e+00+7.000000000000e+00+8.000000000000e+00\r\n";
   qi::rule<string::const_iterator> CRLF = qi::copy(qi::lit("\r\n"));
   qi::real_parser<double> d19_12; // real_parser needs its value type as a template argument

   MyStruct ms;
   qi::rule<string::const_iterator, MyStruct()> gr =

      qi::lit("A") >> d19_12 >> d19_12 >> d19_12 >> d19_12 >> CRLF
      >> (
         (qi::lit('B') >> d19_12  >> d19_12 >> d19_12 >> d19_12 >> CRLF)
         |
         (qi::lit('C') >> d19_12  >> d19_12 >> d19_12 >> +qi::lit('_') >> qi::attr(0.0) >> CRLF)
         )
      ;
   string::const_iterator f = test.cbegin();
   string::const_iterator e = test.cend();
   bool ret = qi::parse(f, e, gr, ms);

   return ret ? 0 : 1; // exit code 0 on success
}

Everything works as expected without the 'C' alternative, but adding that alternative makes the parser skip values. The result is:

  •   ms  MyStruct
      r1  1.0000000000000000  double
      r2  2.0000000000000000  double
      r3  3.0000000000000000  double
      r4  4.0000000000000000  double
      r5  5.0000000000000000  double
      r6  -9.2559631349317831e+61 double
      r7  -9.2559631349317831e+61 double
      r8  -9.2559631349317831e+61 double
    

Expected result is:

  •   ms  MyStruct
      r1  1.0000000000000000  double
      r2  2.0000000000000000  double
      r3  3.0000000000000000  double
      r4  4.0000000000000000  double
      r5  5.0000000000000000  double
      r6  6.0000000000000000  double
      r7  7.0000000000000000  double
      r8  8.0000000000000000  double
    

Thank you

Answer:

You can debug rules. Simplifying the input to "A+1+2+3+4\r\nB+5+6+7+8\r\n" and wrapping the real parser in a rule gives this debug output:

Live On Coliru

<gr>
  <try>A+1+2+3+4\r\nB+5+6+7+8</try>
  <d19_12>
    <try>+1+2+3+4\r\nB+5+6+7+8\r</try>
    <success>+2+3+4\r\nB+5+6+7+8\r\n</success>
    <attributes>[1]</attributes>
  </d19_12>
  <d19_12>
    <try>+2+3+4\r\nB+5+6+7+8\r\n</try>
    <success>+3+4\r\nB+5+6+7+8\r\n</success>
    <attributes>[2]</attributes>
  </d19_12>
  <d19_12>
    <try>+3+4\r\nB+5+6+7+8\r\n</try>
    <success>+4\r\nB+5+6+7+8\r\n</success>
    <attributes>[3]</attributes>
  </d19_12>
  <d19_12>
    <try>+4\r\nB+5+6+7+8\r\n</try>
    <success>\r\nB+5+6+7+8\r\n</success>
    <attributes>[4]</attributes>
  </d19_12>
  <CRLF>
    <try>\r\nB+5+6+7+8\r\n</try>
    <success>B+5+6+7+8\r\n</success>
    <attributes>[]</attributes>
  </CRLF>
  <d19_12>
    <try>+5+6+7+8\r\n</try>
    <success>+6+7+8\r\n</success>
    <attributes>[5]</attributes>
  </d19_12>
  <d19_12>
    <try>+6+7+8\r\n</try>
    <success>+7+8\r\n</success>
    <attributes>[6]</attributes>
  </d19_12>
  <d19_12>
    <try>+7+8\r\n</try>
    <success>+8\r\n</success>
    <attributes>[7]</attributes>
  </d19_12>
  <d19_12>
    <try>+8\r\n</try>
    <success>\r\n</success>
    <attributes>[8]</attributes>
  </d19_12>
  <CRLF>
    <try>\r\n</try>
    <success></success>
    <attributes>[]</attributes>
  </CRLF>
  <success></success>
  <attributes>[[1, 2, 3, 4, 5, 4.27256e+180, 0, 0]]</attributes>
</gr>
Parsed: (1 2 3 4 5 4.27256e+180 0 0)

Indeed, it confirms that all numbers were parsed. So why is attribute propagation not doing what you expect?

My guess is that attribute propagation is trying to be a little more accommodating than you expect. The problem is that your AST doesn't directly match the rule; the rule synthesizes

tup4 := tuple<double, double, double, double>
attribute := tuple<tup4, variant<tup4, tup4> >

In Qi this does get simplified to tuple<tup4, tup4>, but your AST is effectively a tup8, which isn't the same shape. So when propagating, the rule just does what it considers the best option, which is assigning the first tup4. What happens after that is anyone's guess.

Fixes

The simplest fix would be to make your AST match the rules. That might actually make the most sense, because more likely than not the "A", "B" and "C" prefixes have semantic meaning.

namespace Ast {
    struct A {
        double r1, r2, r3, r4;
    };
    struct BC {
        double r5, r6, r7, r8;
    };
    struct MyStruct {
        A  a;
        BC bc;
    };

    using boost::fusion::operator<<;
} // namespace Ast

Adapting them:

BOOST_FUSION_ADAPT_STRUCT(Ast::A, r1, r2, r3, r4)
BOOST_FUSION_ADAPT_STRUCT(Ast::BC, r5, r6, r7, r8)
BOOST_FUSION_ADAPT_STRUCT(Ast::MyStruct, a, bc)

Note that, without further changes, this just confirms that automatic attribute propagation is heuristics-based (Coliru): Parsed: ((1 0 0 0) (2 0 0 0)). Oops.

Making the rules match that structure:

qi::rule<It>           CRLF   = "\r\n";
qi::rule<It, double()> d19_12 = qi::double_;

qi::rule<It, Ast::A()>  A  = "A" >> d19_12 >> d19_12 >> d19_12 >> d19_12; //
qi::rule<It, Ast::BC()> BC =                                              //
    'B' >> d19_12 >> d19_12 >> d19_12 >> d19_12 |                         //
    'C' >> d19_12 >> d19_12 >> d19_12 >> +qi::lit('_') >> qi::attr(0.0);

qi::rule<It, Ast::MyStruct()> gr = A >> CRLF >> BC >> CRLF;

Now it all works: Coliru

Prints

Parsed: ((1 2 3 4) (5 6 7 8))

Outside The Box

A lot of this looks like an XY problem to me. A struct with 8 nondescript numbers that can have varying meanings seems... not what you actually need.

Also, that B/C distinction seems to suggest you really want an "optional number" rule:

rule<It>           CRLF   = "\r\n";
rule<It, double()> d19_12 = raw[ //
    double_[_val = _1] |         //
    omit[+char_("_")]            //
][_pass = px::size(_1) == 19];

rule<It, Ast::Tup4()> Tup4 =
    omit[char_("ABC")] >> d19_12 >> d19_12 >> d19_12 >> d19_12;

Note how omit[char_("ABC")] directly reflects my intuition that you're throwing away semantic information in your model.

Now the grammar becomes

rule<It, Ast::MyStruct()> gr = Tup4 >> CRLF >> Tup4 >> CRLF;

And indeed, it parses the full input: Coliru

Parsed: ((1.0001 2.0002 3.0003 4.0004) (5.0005 6.0006 7.0007 8.0008))

Simplify! Containers

In fact, I suspect that you might even be better served with something like:

namespace Ast {
    using Reals = boost::container::static_vector<double, 8>;
} // namespace Ast

The fun fact is that containers do enjoy more flexible attribute propagation (with a new caveat). You can have something as straightforward as:

qi::rule<It, Ast::Reals(char const*)> Line =
    qi::omit[qi::char_(_r1)] >> d19_12 >> d19_12 >> d19_12 >> d19_12;

qi::rule<It, Ast::Reals()> gr = //
    Line(+"A") >> CRLF >> Line(+"BC") >> CRLF; // unary + decays the literals to char const*

Let me conclude with a live example of such: Live On Compiler Explorer¹

//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/container/static_vector.hpp>
#include <fmt/ranges.h>
#include <iomanip>
#include <iostream>
#include <string>
namespace qi = boost::spirit::qi;
namespace px = boost::phoenix;

namespace Ast {
    using Reals = boost::container::static_vector<double, 8>;
} // namespace Ast

int main()
{
    using It = std::string::const_iterator;
    using namespace qi::labels;

    qi::rule<It>         CRLF   = "\r\n";
    // A field is either a real number or a run of '_', and must be exactly 19 chars wide
    qi::rule<It, double()> d19_12 = qi::raw[ //
        qi::double_[_val = _1] |             //
        qi::omit[+qi::char_("_")]            //
    ][_pass = px::size(_1) == 19];

    qi::rule<It, Ast::Reals(char const*)> Line =
        qi::omit[qi::char_(_r1)] >> d19_12 >> d19_12 >> d19_12 >> d19_12;

    // unary + decays the string literals to char const* for the inherited attribute
    qi::rule<It, Ast::Reals()> gr = //
        Line(+"A") >> CRLF >> Line(+"BC") >> CRLF;

    BOOST_SPIRIT_DEBUG_NODES((gr)(Line)(d19_12)(CRLF))

    for (std::string const test : {
             "A+1.000100000000e+00+2.000200000000e+00+3.000300000000e+00+4.000400000000e+00\r\n"
             "B+5.000500000000e+00+6.000600000000e+00+7.000700000000e+00+8.000800000000e+00\r\n",
             "A+1.000100000000e+00+2.000200000000e+00+3.000300000000e+00+4.000400000000e+00\r\n"
             "C+5.000500000000e+00+6.000600000000e+00+7.000700000000e+00___________________\r\n",
         }) {
        It f = test.cbegin(), e = test.cend();

        Ast::Reals data;
        if (parse(f, e, gr, data)) {
            fmt::print("Parsed: {}\n", data);
        } else {
            fmt::print("Failed\n");
        }

        if (f != e) {
            std::cout << "Remaining: " << std::quoted(std::string(f, e))
                      << "\n";
        }
    }
}

Prints

Parsed: {1.0001, 2.0002, 3.0003, 4.0004, 5.0005, 6.0006, 7.0007, 8.0008}
Parsed: {1.0001, 2.0002, 3.0003, 4.0004, 5.0005, 6.0006, 7.0007, 0}

¹ I was lazy about the output formatting and used libfmt instead of writing my vector-printing cruft again; Coliru doesn't have libfmt (or C++23) yet.
