How to write a Python regular expression to find any two letters that appear at least twice

How can I write a Python regular expression to find any two letters that appear at least twice in a string without overlapping? It should match strings like xyxy (xy), aabcdefgaa (aa), aaaa (aa), and abcadb (ab), but not aaa (aa, but the occurrences overlap).

I have already tried the following patterns, but none of them handles all the cases in the statement above.

r"(.)(?!\1)(.)\1" cannot pass "abdba"
r"(.)(?<!\1)(.)\1" cannot pass "xyxy"
r"(.)(?<!\1)(.)(?!\2)" cannot pass "xyxy"

I use Python's re module to run the above patterns.

CodePudding user response：

We can use a plain loop instead of a regular expression:

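x = input()
y = ""  # characters seen so far that are still waiting for a second occurrence
s = 0   # set to 1 when the current character completes a pair
h = ""  # letters that have formed a pair
for i in x:
    if i in y:
        s = s + 1
    y = y + i
    if s == 1:
        h = h + i
        s = 0
        # Drop the paired letter so the next pair needs two fresh occurrences.
        y = y.replace(i, "")
print(h)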

CodePudding user response：

What you're trying to do cannot practically be written as a true regular expression (by the formal definition of regular languages). Over a finite alphabet it is possible in principle, but only by enumerating every combination of letters, which makes the expression ridiculously long.
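To give a rough sense of "ridiculously long", here is a minimal sketch that builds such a backreference-free pattern. It assumes the alphabet is restricted to lowercase ASCII letters, and `build_plain_regex` is a hypothetical helper name used only for illustration. It enumerates the three possible orderings for every ordered pair of letters, which gives 26 × 26 × 3 = 2,028 alternatives.

```python
import re
import string
from itertools import product

def build_plain_regex(alphabet):
    """Enumerate AABB, ABAB and ABBA for every ordered pair of letters."""
    parts = []
    for a, b in product(alphabet, repeat=2):
        a, b = re.escape(a), re.escape(b)
        parts.append(f"{a}.*{a}.*{b}.*{b}")  # AABB
        parts.append(f"{a}.*{b}.*{a}.*{b}")  # ABAB
        parts.append(f"{a}.*{b}.*{b}.*{a}")  # ABBA
    return "|".join(parts)

# Assumption: only lowercase ASCII letters are of interest.
plain_regex = build_plain_regex(string.ascii_lowercase)
print(plain_regex.count("|") + 1)   # number of alternatives
print(len(plain_regex))             # pattern length in characters
print(bool(re.search(plain_regex, "abdba")))
print(bool(re.search(plain_regex, "aaa")))
```

The pattern uses only literals, `.*` and `|`, so it is regular in the formal sense, but it grows quadratically with the alphabet size.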

Python's regex implementation adds some features that are technically not regular, such as backreferences, which make it possible to solve this particular task, if awkwardly. It will still require some tinkering.

Basically, you have to write a regex for each possible arrangement of the two letters: AABB, ABAB and ABBA; then glue them together with the | operator.

import re

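# Groups are numbered left to right across the whole pattern, so the third
# alternative has to refer to groups 5 and 6 (an unset group never matches).
regex = r'(.).*\1.*(.).*\2|(.).*(.).*\3.*\4|(.).*(.).*\6.*\5'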

string = input('type your string:\n')
match = re.search(regex, string)
if match:
    result = [x for x in match.groups() if x is not None]
    print(result)
else:
    print(None)
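As a quick check, the same idea can be wrapped in a function and run against the examples from the question. This is only a sketch: `PAIR_REGEX` and `find_two_pairs` are names made up here, and the pattern numbers its groups 1 to 6 from left to right across the three alternatives.

```python
import re

# Adjacent string literals are concatenated into one pattern.
PAIR_REGEX = re.compile(
    r'(.).*\1.*(.).*\2'     # AABB, groups 1 and 2
    r'|(.).*(.).*\3.*\4'    # ABAB, groups 3 and 4
    r'|(.).*(.).*\6.*\5'    # ABBA, groups 5 and 6
)

def find_two_pairs(text):
    """Return the two repeated characters as a list, or None if nothing matches."""
    match = PAIR_REGEX.search(text)
    if match is None:
        return None
    # Only the groups of the alternative that matched are set.
    return [g for g in match.groups() if g is not None]

for example in ["xyxy", "aabcdefgaa", "aaaa", "abcadb", "abdba", "aaa"]:
    print(example, find_two_pairs(example))
```

Note that `.` matches any character except a newline, not only letters, so a stricter character class may be needed depending on the input.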

However, a plain for loop over the sorted letters of the string is, IMO, a much better solution.

string = input('type your string:\n')
previous_letter = None
result = []
for letter in sorted(string):
    if letter == previous_letter:
        result.append(letter)
        previous_letter = None
    else:
        previous_letter = letter
print(result)

It should run faster, and it is not limited to two letters.
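The same counting idea can also be sketched with `collections.Counter`, which avoids the sort-and-compare bookkeeping. The function names and the `needed` parameter below are hypothetical, chosen only for this illustration; each letter contributes one entry per disjoint pair, that is, `count // 2` entries.

```python
from collections import Counter

def repeated_letters(text):
    """Return one entry per disjoint pair of equal characters, in sorted order."""
    pairs = []
    for letter, count in sorted(Counter(text).items()):
        pairs.extend([letter] * (count // 2))
    return pairs

def has_pairs(text, needed=2):
    """True if the text contains at least `needed` non-overlapping pairs."""
    return len(repeated_letters(text)) >= needed

for example in ["xyxy", "aabcdefgaa", "aaaa", "abcadb", "aaa"]:
    print(example, repeated_letters(example), has_pairs(example))
```

Raising `needed` extends the check to three or more pairs without touching the counting logic.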