The correct regular expression to search for multiple occurrences

Time:09-20

I have this source text:

{ff0000}Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.{8b00ff}Ut enim {FFFFFF}ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.{0000ff}Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

The task is to split this text into 4 matches of a regular expression, one per color tag.

The results should look like this:

  1. {ff0000}Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
  2. {8b00ff}Ut enim
  3. {FFFFFF}ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
  4. {0000ff}Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

I have written a regular expression that finds the HEX colors, but I can't make each match stop at the next occurrence.

Regular expression: \{[a-zA-Z0-9]{6}\}

I tried:
\{[a-zA-Z0-9]{6}\}.*
(\{[a-zA-Z0-9]{6}\}{1}).*
((\{[a-zA-Z0-9]{6}\}{1}).*){1}

In all three cases, as soon as .* is appended, the later HEX tags are no longer matched on their own: .* is greedy, so it consumes them together with the rest of the text. The question is how to make each match stop at the next HEX color.

User response:

To match the text after a tag, use a negated character class that matches any character except {, so each match stops right before the next tag:

\{[A-Za-z\d]{6}\}[^{]*

See this demo at regex101

User response:

Just use the positions of the matches to determine the relevant substrings. Every substring starts at one match of the pattern and ends either at the next match or at the end of the input string, whichever comes first.

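#include <cstddef>
#include <iostream>
#include <regex>
#include <string>
#include <string_view>
#include <vector>

// Splits the input into parts that each begin with a {XXXXXX} color tag.
std::vector<std::string> FindMatches(std::string_view const input)
{
    std::regex reg("\\{[a-zA-Z0-9]{6}\\}");
    constexpr std::size_t MatchLength = 8; // length of a tag such as "{ff0000}"

    std::vector<std::string> result;

    std::match_results<std::string_view::const_iterator> match;
    if (std::regex_search(input.begin(), input.end(), match, reg))
    {
        // The first part starts at the first tag; any text before it is ignored.
        auto partStart = input.begin() + match.position();
        // Look for the next tag after the current one; skipping MatchLength
        // characters prevents finding the same tag again.
        while (std::regex_search(partStart + MatchLength, input.end(), match, reg))
        {
            // match.position() is relative to partStart + MatchLength.
            auto partEnd = partStart + (MatchLength + match.position());
            result.emplace_back(partStart, partEnd);
            partStart = partEnd;
        }
        // The last part runs to the end of the input.
        result.emplace_back(partStart, input.end());
    }

    return result;
}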

int main()
{
    using namespace std::literals::string_view_literals;

    auto const haystack = "{ff0000}Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.{8b00ff}Ut enim {FFFFFF}ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.{0000ff}Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."sv;

    std::size_t index = 1;

    for (auto const& part : FindMatches(haystack))
    {
        std::cout << index << ". " << part << '\n';
        ++index;
    }
}