Home > front end >  Trying to reduce Big O complexity of compile-time format string parsing
Trying to reduce Big O complexity of compile-time format string parsing

Time:09-10

We have a formatting library similar to fmtlib which allows us to format strings like this:

int foo = 1;
double bar = 2.3;
FMT("foo={} bar={}", foo, bar);

We recently added parsing of the format string at compile time, allowing us to specify, eg width, precision etc directly in the format string

// print foo with width=5, pad with 0; print bar with width 6, precision 2
// as decimal floating point
FMT("foo={:05} bar={:6.2f}", foo, bar); 

The way we do this is as follows:

We have a struct which I can use to pass the format string as a template parameter

struct FmtString
{
    template<std::size_t N>
    constexpr FmtString(const char (&s)[N])
        : str(s)
    {}
    const char* const str;
};

Together with this FmtString and the arguments to be formatted, we use the following somewhat convoluted dance to generate a std::tuple of FmtSpecT format specifications.

We use the variadic template parameter pack to generate a std::index_sequence:

// use Ts... to create a tuple of the argument types and an std::index_sequence
template<FmtString S, typename... Ts>
struct CompiledFmtSpec
{
    using type = decltype(compileFmtSpec<S, std::tuple<Ts...>>(std::index_sequence_for<Ts...> {}));
};

// pull the indices out of the std::index_sequence
template<FmtString S, typename Tuple, std::size_t... Is>
consteval auto compileFmtSpec(std::index_sequence<Is...>)
{
    return typename FmtSpecs<S, Tuple, Is...>::type {};
}

We use the std::index_sequence to generate a variadic std::size_t sequence which we can use to extract each argument from the tuple of arguments, and its respective sequence value

template<FmtString S, typename Tuple, std::size_t... Is>
struct FmtSpecs
{
    using type = std::tuple<FmtSpecWrapper<S, std::tuple_element_t<Is, Tuple>, Is>...>;
};

This results in the following tuple:

std::tuple<
    FmtSpecWrapper<S, T0, 0>,
    FmtSpecWrapper<S, T1, 1>,
    FmtSpecWrapper<S, T2, 2>,
    ...
    FmtSpecWrapper<S, Tn, n>>

We can then specialise FmtSpecWrapper for the particular type and parse its supported format spec

Here is the default implementation

template<FmtString S, typename T, std::size_t I>
struct FmtSpecWrapper
{
    using type = decltype(parseFmtString<S, I>());
};

template<FmtString S, std::size_t I>
consteval auto parseFmtString()
{
    constexpr const char* pos = findFmtSpec(S, I);
    constexpr FmtSpec fmt = parseFmtSpec(pos);

    return FmtSpecT<fmt.pad, fmt.align_right, fmt.sign, fmt.width, fmt.precision, fmt.type, fmt.spec_len> {};
}

This results in a FmtSpecT specialisation, which has the following NTTPs:

template<char Pad, bool AlignRight, bool Sign, int Width, int Precision, char Type, int SpecLen>
struct FmtSpecT
{
    static constexpr char pad = Pad;
    static constexpr bool align_right = AlignRight;
    static constexpr bool sign = Sign;
    static constexpr int width = Width;
    static constexpr int precision = Precision;
    static constexpr char type = Type;
    static constexpr int spec_len = SpecLen;
};

During formatting we can then pull out each FmtSpecT and use the members to format the argument.

The issue I'd like to try and solve is findFmtSpec(S, I).

Its job is to find the I'th format spec from the format string S.

constexpr const char* findFmtSpec(const FmtString& S, std::size_t i)
{
    std::size_t curr = 0;
    const char* next = S.str;
    while (*next)
    {
        const char c = *next  ;
        if (c == '{')
        {
            if (c == *next) // found an escaped brace; {{ or }}
            {
                  next;
            }
            else // found the start of the fmt spec
            {
                if (curr == i)
                    return next;
                  curr;
                  next;
            }
        }
    }
    throw std::domain_error("too many arguments for format string");
}

My issue is that it is Big O N^2, as it starts from 0 for every argument.

Ideally I would be able to start from the end of the previous format spec and seek forwards to find the next, turning the complexity into O(N).

Any ideas on how I can modify findFmtSpec so that for each i it starts from the return value of the previouc call?

CodePudding user response:

FmtString S is a template parameter, so it is usable in constant expressions.

Therefore all you need to do is write a constexpr function that extracts the specifiers up-front into a suitable container type, for example std::array<std::string_view, N>. You can either enforce directly in the function that the number of specifiers matches sizeof...(Ts), so that N can just be that, or you can write another constexpr function which first counts the number of arguments:

struct FmtString
{
    template<std::size_t N>
    constexpr FmtString(const char (&s)[N])
        : str(s)
    {}
    const char* const str;

    template<FmtString S>
    static constexpr auto findFmtSpec = []{
        constexpr auto N = []{
            // iterate `S.str` here and return number of specifiers
        }();
        std::array<const char*, N> result;
        // iterate `S.str` again and store beginning of format specifiers consecutively in `result`.
        return result;
    }();
};

Then parseFmtString can just use FmtString::findFmtSpec<S>[I]. I am using a static member variable template instead of a static function template to make sure that the compiler doesn't keep reevaluating an equivalent function call, although I think the compiler should memoize calls with the same arguments. I used nested lambdas for brevity, but you might want to put these into separate functions. There is also no particular reason to have the variable template be a member. It works outside the class as well.


You can also just generate the whole tuple this way:

template<FmtString S, typename... Ts>
static constexpr auto specs = []{
    // number of specifiers, can be `sizeof...(Ts)`
    // if no specific error handling is required
    constexpr auto N = []{
        // iterate `S.str` here and return number of specifiers
    }();

    if constexpr(sizeof...(Ts) != N) {
        // handle argument number mismatch
    }

    // specs as values
    constexpr auto specs = []{
        std::array<FmtSpec, N> specs;
        // iterate `S.str` again and store `FmtSpec` for each specifier consecutively in `specs`
        return specs;
    }();
    
    // lift specs into type (template arguments)
    return []<std::size_t... Is>(std::index_sequence<Is...>){
        return std::tuple<FmtSpecWrapper<S, Ts, FmtSpecT<specs[Is].pad, /*...*/>>...>{};
    }(std::make_index_sequence<N>{});
}();
  •  Tags:  
  • c
  • Related