This is my first question I hope to be concise.
I am looking for a way to find each consecutive sub-pattern, constant or single character in a string. I came across this problem trying to "simplify" a challenge, incredibly trying to simplify was more complex than solving the original exercise.
Since I'm stuck and I'm really curious to see a resolution to my problem, I attach examples and the expected results for each one.
The output must be in the same order as the supplied string.
example_input = 'XXXXXXXXXXZZZZZZZZZZZZZZZZZZZZ'
expected_result = [['X',10],['Z',20]]
example_input = 'ABCABCDDDE'
expected_result = [['ABC',2],['D',3],['E',1]]
example_input = 'ABCZZZZ XYXY'
expected_result = [['A',1],['B',1],['C',1],['Z',4],[' ',1],['XY',2]]
example_input = 'ABCDABCDDCBADCBAABCDABCD'
expected_result = [['ABCD',2],['DCBA',2],['ABCD',2]]
example_input = 'ZZZZZZZZZZABCDABCDABCD DCBADCBADCBADCBADCBAXYZXYZ'
expected_result = [['Z',10],['ABCD',3],[' ',2],['DCBA',5],['XYZ',2]]
RE EDITED
I found a valid solution for all the example cases, however I'm still using regex, I'd really appreciate it if someone manages to fix it without using regex.
(Any language is valid, not just Python).
import re
def get_patterns(str) -> list:
regex_find = re.findall(r'(. ?)\1{1,}|([A-Z ]{1})', str)
matches = [i[0] if i[0] else i[1] for i in regex_find]
expected_result = []
while len(str) > 0 and matches:
current = matches[0]
count = 0
while current in str[:len(current)]:
count = 1
str = str[len(current):]
expected_result = [current, count],
matches.pop(0)
return expected_result
print(get_patterns('XXXXXXXXXXZZZZZZZZZZZZZZZZZZZZ') == [['X',10],['Z',20]])
print(get_patterns('ABCABCDDDE') == [['ABC',2],['D',3],['E',1]])
print(get_patterns('ABCZZZZ XYXY') == [['A',1],['B',1],['C',1],['Z',4],[' ',1],['XY',2]])
print(get_patterns('ABCDABCDDCBADCBAABCDABCD') == [['ABCD',2],['DCBA',2],['ABCD',2]])
print(get_patterns('ZZZZZZZZZZABCDABCDABCD DCBADCBADCBADCBADCBAXYZXYZ') == [['Z',10],['ABCD',3],[' ',2],['DCBA',5],['XYZ',2]])
CodePudding user response:
The following is a C solution that passes all test cases. I first identify all true repeats, leaving single chars be. Then I lay out the found repeats and fill in single chars from the input sequence. It looks more complex in C than in Python, but at least it does not use regex.
#include <string>
#include <iostream>
#include <utility>
void repeats() {
std::string example_input1 = "XXXXXXXXXXZZZZZZZZZZZZZZZZZZZZ";
std::string example_input2 = "ABCABCDDDE";
std::string example_input3 = "ABCZZZZ XYXY";
std::string example_input4 = "ABCDABCDDCBADCBAABCDABCD";
std::string example_input5 = "ZZZZZZZZZZABCDABCDABCD DCBADCBADCBADCBADCBAXYZXYZ";
std::string input = example_input5; // select input string
// find repeats
const int L = int(input.length());
std::vector<std::string> reps(L,"");
for (int i=0; i<L; i) {
for (int j=i 1; j<L; j) {
const int l = j-i;
if (input.compare(i,l,input,j,l) == 0) {
reps[i] = input.substr(i,l);
break; // repeat found
}
}
}
// print repeats
/*
for (int i=0; i<L; i) {
std::cout << i << ":" << reps[i] << ";" << std::endl;
}
*/
// layout of repeats and single chars
using Pair = std::pair<std::string, int>;
std::vector<Pair> res;
for (int i=0; i<L;) {
const std::string s = reps[i];
const int l = int(s.length());
if (l > 0) { // a repeat
int r = 1;
for (int j=i; j<L && s==reps[j]; j =l) r;
res.push_back(Pair(s,r));
i = r*l; // skip whole repeated section
} else { // a single char
std::string t(1, input[i]);
res.push_back(Pair(t,1));
i = 1; // skip one char
}
}
// print result
std::string delim = "";
std::cout << "[";
for (Pair p : res) {
std::cout << delim << "['" << p.first << "'," << p.second << "]";
delim = ",";
}
std::cout << "]" << std::endl;
}