Home > Software engineering >  C find non-overlapping substring in string with position specifics
C find non-overlapping substring in string with position specifics

Time:11-04

I am trying to figure out how to find count of non-overlapping substrings in string, but only those at the same distances. I managed to get number of non-overlapping substrings, but this is just subset of the requirement.

Example: I have a string "aaaaaaaaaabaaaaaaba" and I want to count number of non-overlapping substrings "aa" but only those which are in distances multiplied by 2.

Standard non-overlapping code returns 8, which is correct (aa aa aa aa aa b aa aa aa ba - distinguishing the findings by spaces). It finds the substrings at positions 0,2,4,6,8,11,13,15.

While I try to implement the distance requirement, I have a problem. The desired count should be 7, omitting the 'a' at position 11 and continue at position 12 (aa aa aa aa aa ba aa aa ab a) ... again distinguishing the findings by spaces, the 'a' at the position 11 should form the substring 'ba' with 'b' at position 10

I tried to use modulo to identify the correct position and count only substrings which starts on position with modulo result 0, but by executing my code, processing just stops at the 5th occurrence, before 'b' in the middle, and ignore next ones.

wrong: aa aa aa aa aa b aa aa aa b a

good:  aa aa aa aa aa ba aa aa ab a

Similarly, it should work with other sizes e.g., finding "aaa", so the result would be 4 "aaa aaa aaa aba aaa aab a" instead of 5 "aaa aaa aaa ab aaa aaa ba"

string str = "aaaaaaaaaabaaaaaaba";
    string sub = "aa";
    int count = 0;

    for (size_t offset = str.find(sub); offset != std::string::npos;
            offset = str.find(sub, offset   sub.length()))
    {
        int pos = offset % sub.length();
        if (pos == 0)
        {
              count;
            cout << "offset: " << offset << " count: " << count << endl;
        }
    }
    cout << "count: " << count << endl;
    return 0;

CodePudding user response:

Your loop is doing a wrong thing: it tries to find substring starting from the last found offset 2 (the length of aa).

After the first aa is found at the odd position, all subsequent attempts will also find odd positions.

This works:

for (size_t offset = str.find(sub); offset != std::string::npos;
    offset = str.find(sub, offset   1))

CodePudding user response:

I suggest only using std::string::find to find the first substring and then just step sub.size() every iteration.

Here's an example using a std::string_view over the current substring in str:

#include <string>
#include <string_view>
#include <iostream>

int main() {
    std::string str = "aaaaaaaaaabaaaaaaba";
    //                 ! ! ! ! ! X ! ! X X
    //                 0 2 4 6 81012141618
    std::string sub = "aa";
    int count = 0;

    for(size_t offset = str.find(sub);
        offset < str.size() - sub.size();
        offset  = sub.size())             // just step sub.size() here
    {
        if(sub == std::string_view(str.c_str()   offset, sub.size()))
            std::cout << "offset: " << offset << " count: " <<   count << '\n';
    }

    std::cout << "count: " << count << '\n';
}

Output:

offset: 0 count: 1
offset: 2 count: 2
offset: 4 count: 3
offset: 6 count: 4
offset: 8 count: 5
offset: 12 count: 6
offset: 14 count: 7
count: 7

CodePudding user response:

I tried to find solution all day and, a few minutes after posting my question, I found the solution (Murphy's law).

The trick is to add an 'else' statement and shift the offset back by the modulo remainder:

if (pos == 0)
        {
              count;
            cout << "offset: " << offset << " count: " << count << endl;
        }
else
        {
            offset = offset - pos;
            cout << "pos: " << pos << endl;
        }
  • Related