Home > Back-end >  Count maximum consecutive occurrence of a substring in a string, using only basic string operations
Count maximum consecutive occurrence of a substring in a string, using only basic string operations

Time:06-29

int longest_str(std::string str, std::string dna) {
    int longest_count = 0;
    int current_count = 0;
    std::string temp;
    for (int i = 0; i <= (dna.size() - str.size());   i) {
      if (dna.at(i) == str.at(0)) {
          
        for (int j = i; j < i   str.size();   j) {
          temp.push_back(dna.at(j));
        }
        if (temp.compare(str) == 0) {
          current_count  ;
          temp.clear();
          i  = str.size();

      
        } else if (temp.compare(str) != 0) {
          if (current_count >= longest_count) {
            longest_count = current_count;
          }
          temp.clear();
          current_count = 0;
  
        } 
      } else if (dna.at(i) != str.at(0)) {
        if (current_count >= longest_count) {
          longest_count = current_count;
        }
        continue;
      }
        
  }
    
  return longest_count;
}

Basically what I'm trying to do here is to loop through the given DNA strand, and when I encounter the first letter of the STR, say at index i, I will append the next i str.size() letters from the DNA strand to a temporary variable. Then, I will compare that temporary variable with the STR, and if they equal, increase the current count by 1, clear the temporary string, increase the index accordingly, and repeat the process. If the temporary string does not equal the STR, I will clear the string, update longest count up to that point, and reset current count to 0. If I encounter a character that does not equal to the first character of the STR, I will similarly update the longest count, reset current count to 0, and repeat the process.

So for example, if the given DNA is "GTATTAATTAATTAATTAGTA", and the STR is "ATTA", this function should return 4.

The function has been giving me seemingly arbitrary answers when I tested it against different inputs, so I am not really sure what's going wrong here. My speculation is that there's something wrong with the way I update the longest_count variable. Any suggestions?

CodePudding user response:

The issue is that you do:

i  = str.size()

but you forget that your loop itself is also incrementing i, so you are actually skipping an index.

Second issue is that you might exit the loop before

      if (current_count >= longest_count) {
        longest_count = current_count;
      }

is executed, if your last match was less than the lenght of the string back (or if your last match ends exactly at the end of the string).

Here is a dirty fix of your code:

int longest_str(std::string str, std::string dna) {
    int longest_count = 0;
    int current_count = 0;
    std::string temp;
    for (int i = 0; i <= (dna.size() - str.size());   i) {

      if (dna.at(i) == str.at(0)) {
          
        for (int j = i; j < i   str.size();   j) {
          temp.push_back(dna.at(j));
        }
        
        if (temp.compare(str) == 0) {
          current_count  ;
          temp.clear();
          i  = (str.size()-1);
        } else if (temp.compare(str) != 0) {
          temp.clear();
          current_count = 0;
        }
        if (current_count >= longest_count) {
          longest_count = current_count;
        }
      } else if (dna.at(i) != str.at(0)) {
        if (current_count >= longest_count) {
          longest_count = current_count;
        }
        continue;
      }
        
  }
    
  return longest_count;
}

Other solutions might be cleaner, but hopefully you see the problems now.

Another issue that you might have missed is what happens when your string contains overlapping matches. for example:

"ATTATTAATTA"

With you code you will find the first "ATTA", but you won't find the overlapping "ATTAATTA", so you actually need to rethink your logic anyways

CodePudding user response:

I wrote a cleaner implementation for your code:

#include <iostream>

size_t longestStr(std::string dna, std::string str)
{
    size_t temp_ans = 0, final_ans;

    for (size_t i = 0; i < dna.size(); i  )
    {
        if (dna[i] == str[0])
        {
            final_ans = 0;
            while (dna.substr(i, str.size()) == str)
            {
                if (  final_ans > ans)
                {
                    ans  ;
                }
                i  = str.size();
            }
            i -= (str.size() * final_ans);
        }
    }

    return ans;
}

int main()
{
    std::cout << longestStr("GTATTAATTAATTAATTAGTAATTAATTAATTAATTAATTAATTAGTA", "ATTA");
}

Output:

6

No need of temp.
Also you should preferably use size_t instead of int because max size of a string > int.

  • Related