Home > database >  Effeciently remove single letter substrings from a string
Effeciently remove single letter substrings from a string

Time:12-19

So I've been trying to attack this problem for a while but have no idea how to do it efficiently.

I'm given a substring of N (N >= 3) characters, and the substring contains solely of the characters 'A' and 'B'. I have to efficiently find a way to count all the substrings possible, which have only one A or only one B, with the same order given.

For example ABABA: For three letters, the substrings would be: ABA, BAB, ABA. For this all three count because all three of them contain only one B or only one A. For four letters, the substrings would be: ABAB, BABA. None of these count because they both don't have only one A or B. For five letters: ABABA. This doesn't count because it doesn't have only one A or B. If the string was bigger, then all substring combinations would be checked.

I need to implement this is O(n^2) or even O(nlogn) time, but the best I've been able to do was O(n^3) time, where I loop from 3 to the string's length for the length of the substrings, use a nested for loop to check each substring, then use indexOf and lastIndexOf and seeing for each substring if they match and don't equal -1 (meaning that there is only 1 of the character), for both A and B.

Any ideas how to implement O(n^2) or O(nlogn) time? Thanks!

CodePudding user response:

Effeciently remove single letter substrings from a string

This is completely impossible. Removing a letter is O(n) time already. The right answer is to not remove anything anywhere. You don't need to.

The actual answer is to stop removing letters and making substrings. If you call substring you messed up.

Any ideas how to implement O(n^2) or O(nlogn) time? Thanks!

I have no clue. Also seems kinda silly. But, there's some good news: There's an O(n) algorithm available, why mess about with pointlessly inefficient algorithms?

charAt(i) is efficient. We can use that.

Here's your algorithm, in pseudocode because if I just write it for you, you wouldn't learn much:

First do the setup. It's a little bit complicated:

  • Maintain counters for # of times A and B occurs.
  • Maintain the position of the start of the current substring you're on. This starts at 0, obviously.
  • Start off the proceedings by looping from 0 to x (x = substring length), and update your A/B counters. So, if x is 3, and input is ABABA, you want to end with aCount = 2 and bCount = 1.

With that prepwork completed, let's run the algorithm:

  • Check for your current substring (that's the substring that starts at 0) if it 'works'. You do not need to run substring or do any string manipulation at all to know this. Just check your aCount and bCount variables. Is one of them precisely 1? Then this substring works. If not, it doesn't. Increment your answer counter by 1 if it works, don't do that if it doesn't.
  • Next, move to the next substring. To calculate this, first get the character at your current position (0). Then substract 1 from aCount or bCount depending on what's there. Then, fetch the char at 'the end' (.charAt(pos x)) and add 1 to aCount or bCount depending on what's there. Your aCount and bCount vars now represent how many As respectively Bs are in the substring that starts at pos 1. And it only took 2 constant steps to update these vars.
  • ... and loop. Keep looping until the end (pos x) is at the end of the string.

This is O(n): Given, say, an input string of 1000 chars, and a substring check of 10, then the setup costs 10, and the central loop costs 990 loops. O(n) to the dot. .charAt is O(1), and you need two of them on every loop. Constant factors don't change big-O number.

  • Related