Home > Blockchain >  Algorithm intuition: transform but with possibly more output items than input items?
Algorithm intuition: transform but with possibly more output items than input items?

Time:09-27

I want to escape a string. That is, I want to copy characters, but where the input has " or \, I want to prepend \ on the output. I can write that easily enough, even very generically:

//! Given a range of code units (e.g., bytes in utf-8), transform them to an output range.
//! If a code unit needs escaping (per the given predicate), insert esc before that code unit in the output.
template <typename InRange, typename OutIter, typename NeedsEscapingPred, typename EscCodeUnit>
OutIter transformEscapeCodeUnits(const InRange& in, OutIter out,
     NeedsEscapingPred needsEscaping,
     EscChar esc) {
     for (const auto& codeUnit : in) {
         if (needsEscaping(codeUnit)) {
             *out   = esc;
         }
         *out   = codeUnit;
     }
     return out;
}

//! Convenience overload for common case:
template <typename InRange, typename OutIter>
OutIter transformEscapeCodeUnits(const InRange& in, OutIter out, char esc = '\') {
    return transformEscapeCodeUnits(in, out, [](auto c) { return c == '\' || c == '"'; }, esc);
}

However, in the spirit of "no raw loops", I looked at the algorithm and numeric header in search of a generic algorithm to do this. There's replace_if and replace_copy_if and remove_if, but I'm not seeing any std algorithms that take a sequence and output a potentially-longer sequence. This would be basically insert_copy_if, or even more generically, something like transform_items:

//! Like transform, but TransformItem takes an element and an iterator and writes zero or more output elements:
template <typename InRange, typename OutIter, typename TransformItem>
OutIter transform_items(InRange&& inRange, OutIter out, TxFn transformItem) {
    for (auto&& x : std::forward<InRange>(inRange)) {
        out = transformItem(std::forwrad<decltype(x)>(x), out);
    }
    return out;
}

Then the escaping case would call transform_items(in, out, [shouldEsc, esc](auto c, auto out) { if (shouldEsc(c)) { *out = esc; } *out = c; }).

Am I missing something, or is there nothing quite like that in the standard library?

CodePudding user response:

After asking on Twitter, I got some very satisfying answers, although it involves the M-word (monad).

First, @atorstling suggests that this is flatMap in JavaScript: https://twitter.com/atorstling/status/1574097704098988033?s=20&t=jzS503R6fMOqCajReygzJg https://dmitripavlutin.com/javascript-array-flatmap/

So translating to C , I think the answer to my question is that this should be called flat_transform. With pedantic use of forwarding, that's:

//! Like transform, but TransformItem takes an element and an iterator and writes zero or more output elements:
template <typename InRange, typename OutIter, typename TransformOne>
OutIter flat_transform(InRange&& inRange, 
                       OutIter out,
                       TransformOne transform_one) {
    for (auto&& x : std::forward<InRange>(inRange)) {
        out = transform_one(std::forwrad<decltype(x)>(x), std::move(out));
    }
    return out;
}

and yes, this appears to be a "missing algorithm" from the STL.

But additionally, @salociN001 pointed out that "It's basically the monadic bind operation: Transforming a Container of A with a function taking A and returning a container of A into a flat container of A." https://twitter.com/salociN001/status/1573918982557548545?s=20&t=jzS503R6fMOqCajReygzJg

So there we have it:

  • It's monadic bind.
  • A reasonable C name is flat_transform.
  • It is, in fact, a missing algorithm.
  • Related