Discovering consecutive repeating patterns, constants, or unique elements in a string-CodePudding

This is my first question I hope to be concise.

I am looking for a way to find each consecutive sub-pattern, constant or single character in a string. I came across this problem trying to "simplify" a challenge, incredibly trying to simplify was more complex than solving the original exercise.

Since I'm stuck and I'm really curious to see a resolution to my problem, I attach examples and the expected results for each one.

The output must be in the same order as the supplied string.

example_input = 'XXXXXXXXXXZZZZZZZZZZZZZZZZZZZZ'
expected_result = [['X',10],['Z',20]]

example_input = 'ABCABCDDDE'
expected_result = [['ABC',2],['D',3],['E',1]]

example_input = 'ABCZZZZ XYXY'
expected_result = [['A',1],['B',1],['C',1],['Z',4],[' ',1],['XY',2]]

example_input = 'ABCDABCDDCBADCBAABCDABCD'
expected_result = [['ABCD',2],['DCBA',2],['ABCD',2]]
    
example_input = 'ZZZZZZZZZZABCDABCDABCD  DCBADCBADCBADCBADCBAXYZXYZ'
expected_result = [['Z',10],['ABCD',3],[' ',2],['DCBA',5],['XYZ',2]]

RE EDITED

I found a valid solution for all the example cases, however I'm still using regex, I'd really appreciate it if someone manages to fix it without using regex.

(Any language is valid, not just Python).

import re

def get_patterns(str) -> list:
  regex_find = re.findall(r'(. ?)\1{1,}|([A-Z ]{1})', str)
  matches = [i[0] if i[0] else i[1] for i in regex_find]
  expected_result = []
  while len(str) > 0 and matches:
    current = matches[0]
    count = 0
    
    while current in str[:len(current)]:
      count  = 1
      str = str[len(current):]
  
    expected_result  = [current, count],
    matches.pop(0)
  return expected_result

print(get_patterns('XXXXXXXXXXZZZZZZZZZZZZZZZZZZZZ') == [['X',10],['Z',20]])
print(get_patterns('ABCABCDDDE') == [['ABC',2],['D',3],['E',1]])
print(get_patterns('ABCZZZZ XYXY') == [['A',1],['B',1],['C',1],['Z',4],[' ',1],['XY',2]])
print(get_patterns('ABCDABCDDCBADCBAABCDABCD') == [['ABCD',2],['DCBA',2],['ABCD',2]])
print(get_patterns('ZZZZZZZZZZABCDABCDABCD  DCBADCBADCBADCBADCBAXYZXYZ') == [['Z',10],['ABCD',3],[' ',2],['DCBA',5],['XYZ',2]])

CodePudding user response：

The following is a C solution that passes all test cases. I first identify all true repeats, leaving single chars be. Then I lay out the found repeats and fill in single chars from the input sequence. It looks more complex in C than in Python, but at least it does not use regex.

#include <string>
#include <iostream>
#include <utility>

void repeats() {
    std::string example_input1 = "XXXXXXXXXXZZZZZZZZZZZZZZZZZZZZ";
    std::string example_input2 = "ABCABCDDDE";
    std::string example_input3 = "ABCZZZZ XYXY";
    std::string example_input4 = "ABCDABCDDCBADCBAABCDABCD";
    std::string example_input5 = "ZZZZZZZZZZABCDABCDABCD  DCBADCBADCBADCBADCBAXYZXYZ";

    std::string input = example_input5; // select input string

        // find repeats

    const int L = int(input.length());
    std::vector<std::string> reps(L,"");
    for (int i=0; i<L;   i) {
        for (int j=i 1; j<L;   j) {
            const int l = j-i;
            if (input.compare(i,l,input,j,l) == 0) {
                reps[i] = input.substr(i,l);
                break;  // repeat found
            }
        }
    }

        // print repeats

/*
    for (int i=0; i<L;   i) {
        std::cout << i << ":" << reps[i] << ";" << std::endl;
    }
*/
        // layout of repeats and single chars

    using Pair = std::pair<std::string, int>;
    std::vector<Pair> res;
    for (int i=0; i<L;) {
        const std::string s = reps[i];
        const int l = int(s.length());
        if (l > 0) { // a repeat
            int r = 1;
            for (int j=i; j<L && s==reps[j]; j =l)   r;
            res.push_back(Pair(s,r));
            i  = r*l; // skip whole repeated section
        } else { // a single char
            std::string t(1, input[i]);
            res.push_back(Pair(t,1));
            i  = 1; // skip one char
        }
    }

        // print result

    std::string delim = "";
    std::cout << "[";
    for (Pair p : res) {    
        std::cout << delim << "['" << p.first << "'," << p.second << "]";
        delim = ",";
    }
    std::cout << "]" << std::endl;
}