regex with string_view returns garbage

Time:06-02

Matching a regex on a std::string_view works fine. But when I return the matched substrings, they dangle for some reason. The std::string_view argument is destroyed at the end of the function's scope, but the memory it points to is still valid.
I expected std::match_results to point into the original array and not to make any copies, but the behavior I observe shows that I am wrong. Is it possible to make this function work without additional allocations for the substrings?

#include <tuple>
#include <regex>
#include <string_view>

#include <iostream>

using configuration_str = std::string_view;
using platform_str = std::string_view;

std::tuple<configuration_str, platform_str> parse_condition_str(std::string_view conditionValue)
{
    // TODO: fix regex
    constexpr const auto &regexStr =
        R"((?:\'\$\(Configuration\)\s*\|\s*\$\(Platform\)\s*\'==\'\s*)(.+)\|(.+)')";
    static std::regex regex{ regexStr };

    std::match_results<typename decltype(conditionValue)::const_iterator> matchResults{};
    bool matched =
        std::regex_match(conditionValue.cbegin(), conditionValue.cend(), matchResults, regex);

    (void)matched;

    std::string_view config = matchResults[1].str();
    std::string_view platform = matchResults[2].str();

    return { config, platform };
}

int main()
{
    const auto &stringLiteralThatIsALIVE = "'$(Configuration)|$(Platform)'=='Release|x64'";
    const auto&[config, platform] = parse_condition_str(stringLiteralThatIsALIVE);
    std::cout << "config: " << config << "\nplatform: " << platform << std::endl;

    return 0;
}

https://godbolt.org/z/TeYMnn56z


Clang-Tidy shows the warning "Object backing the pointer will be destroyed at the end of the full expression" for the line std::string_view platform = matchResults[2].str();

Answer:

For example, let's look at the following line:

std::string_view config = matchResults[1].str();

Here, matchResults is a std::match_results, and [1] invokes std::match_results::operator[], which returns a reference to a std::sub_match.

But then, .str() invokes std::sub_match::str(), which returns a std::basic_string (here, a std::string) by value: a new object holding a copy of the matched characters.
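To make the distinction concrete, here is a small sketch, assuming C++17 (the alias sv_iter and the function group1_offset are hypothetical names used only for illustration). The static_assert checks that sub_match::str() yields a std::string by value, while the sub_match's own first iterator (inherited from std::pair) still points into the searched input, which is why an offset into that input can be computed from it.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>

using sv_iter = std::string_view::const_iterator;  // hypothetical alias

// str() is declared to return string_type by value; for string_view
// iterators that is a freshly constructed std::string (an owning copy).
static_assert(
    std::is_same_v<decltype(std::declval<const std::sub_match<sv_iter>&>().str()),
                   std::string>,
    "sub_match::str() returns an owning std::string by value");

// Hypothetical demo: offset of capture group 1 inside `input`, computed
// from the sub_match's `first` iterator, or -1 if there was no match.
// This only works because `first` points into `input` itself.
std::ptrdiff_t group1_offset(std::string_view input, const std::regex& re) {
    std::match_results<sv_iter> m;
    if (!std::regex_match(input.cbegin(), input.cend(), m, re) || !m[1].matched)
        return -1;
    return m[1].first - input.cbegin();
}
```

For example, group1_offset("key=value", std::regex(R"(\w+=(\w+))")) gives 4, the position of "value" inside the original buffer, whereas str() on the same sub_match would produce a separate string.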

This returned temporary string object is destroyed at the end of the full-expression (thanks, @BenVoigt, for the correction), i.e., in this case, immediately after config is initialized, when the line in question finishes executing. A std::string_view does not extend the lifetime of the temporary it is constructed from, so Clang-Tidy's warning is correct.

By the time parse_condition_str() returns, both config and platform therefore point into strings that have already been destroyed.
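One allocation-free way around this is to build each view from the sub_match's iterators over the original conditionValue instead of calling .str(). The sketch below assumes C++17; to_view is a hypothetical helper, and returning std::optional<std::pair<...>> instead of the question's std::tuple is my own change so that a failed match is reported rather than ignored. Note that std::regex and std::match_results may still allocate internally; only the per-group std::string copies are avoided.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <regex>
#include <string_view>
#include <utility>

using sv_submatch = std::sub_match<std::string_view::const_iterator>;

// Hypothetical helper: a view of one capture group directly over `whole`,
// the same buffer passed to std::regex_match. No std::string is created,
// so the view is valid as long as the characters behind `whole` are.
std::string_view to_view(std::string_view whole, const sv_submatch& sm) {
    if (!sm.matched) return {};
    // Precondition: sm must come from a match performed on `whole`.
    assert(sm.first >= whole.cbegin() && sm.second <= whole.cend());
    const auto offset = static_cast<std::size_t>(sm.first - whole.cbegin());
    return whole.substr(offset, static_cast<std::size_t>(sm.length()));
}

// Sketch of the question's function without per-group string copies.
// Returns std::nullopt instead of reading match results after a failed match.
std::optional<std::pair<std::string_view, std::string_view>>
parse_condition_str(std::string_view conditionValue) {
    static const std::regex regex{
        R"((?:\'\$\(Configuration\)\s*\|\s*\$\(Platform\)\s*\'==\'\s*)(.+)\|(.+)')"};
    std::match_results<std::string_view::const_iterator> m;
    if (!std::regex_match(conditionValue.cbegin(), conditionValue.cend(), m, regex))
        return std::nullopt;
    return std::pair{to_view(conditionValue, m[1]), to_view(conditionValue, m[2])};
}
```

With the string literal from the question, the returned views read "Release" and "x64" and point into the literal itself, so they stay valid in main().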

Answer:

Constructing each std::string_view manually from a pointer (the start of the input plus the group's offset) and a length gives the desired result:

std::string_view config{conditionValue.data() + matchResults.position(1), static_cast<std::size_t>(matchResults.length(1))};
std::string_view platform{conditionValue.data() + matchResults.position(2), static_cast<std::size_t>(matchResults.length(2))};

https://godbolt.org/z/cGjs39Ehq

However, the question remains: why does the .str() method on a sub_match return a temporary, which leaves the views pointing at garbage?
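A plausible explanation (my reading, not an official rationale): std::sub_match is a template over any bidirectional iterator type, and <regex> (C++11) predates std::string_view (C++17), so str() is specified to return a string_type built from the [first, second) range. The sketch below (first_group is a hypothetical name) runs a match over a std::list<char>, whose characters are not contiguous; there, returning an owning string is the only general option, and that copy is exactly what makes the result safe to return.

```cpp
#include <list>
#include <regex>
#include <string>

// std::regex_match only requires bidirectional iterators, so it also
// accepts a non-contiguous container such as std::list<char>. A sub_match
// over list iterators cannot be viewed as a string_view, so str() must
// copy the characters in [first, second) into a new std::string.
std::string first_group(const std::list<char>& chars, const std::regex& re) {
    std::match_results<std::list<char>::const_iterator> m;
    if (!std::regex_match(chars.cbegin(), chars.cend(), m, re)) return {};
    // Owning copy: safe to return even after m goes out of scope.
    return m[1].str();
}
```

With a std::string_view input the iterators do refer to contiguous memory, so a view can be built from the offset and length as in the pointer-based approach instead of calling str().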
