Is there a regular expression that matches a string of ANY different letters (in bash/sed/python)?

Inspired by today's Advent of Code task, I was wondering whether it is possible to use a regex to find, within a string, a substring of a given length whose letters are all different. For example:

Given length 4, the only substring of asbshbsbuhb whose letters are all different is sbuh. Every other 4-letter substring has a repeated letter.

Is there any (not over-complicated) way to use a regex to find such substrings in bash, sed, or Python?

Answer:

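With negative lookaheads you can, though it's arguably rather tortured. Each captured character is followed by a lookahead that forbids the next character from repeating any character captured so far. Note that . matches any character, so this finds four distinct characters, not only letters: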

(.)(?!\1)(.)(?!\1|\2)(.)(?!\1|\2|\3).

Bash's =~ operator and sed use POSIX regular expressions, which have no lookaheads, but Python's re module supports them.
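The pattern above is written out for length 4. As a rough sketch, the same construction can be generated for any length n; distinct_pattern is a hypothetical helper name, not part of any library:

```python
import re


def distinct_pattern(n: int) -> str:
    """Build a regex matching n characters that are all different.

    Generalizes the hand-written length-4 pattern: each captured
    character is followed by a negative lookahead rejecting any
    character captured so far. Python's re backreferences only reach
    group 99, so this sketch is meant for small n.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    parts = []
    for i in range(1, n):
        parts.append("(.)")
        # Forbid the next character from equalling groups 1..i.
        refs = "|".join(f"\\{k}" for k in range(1, i + 1))
        parts.append(f"(?!{refs})")
    parts.append(".")  # the last character needs no capture
    return "".join(parts)


print(distinct_pattern(4))  # (.)(?!\1)(.)(?!\1|\2)(.)(?!\1|\2|\3).
```

The pattern grows quadratically with n, which is part of why this approach gets unwieldy for longer windows.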

Demo: https://ideone.com/MlAYC5

(The demo renumbers the groups so that the entire match is group 1, because when a pattern contains groups, Python's re.findall returns the captured groups rather than the whole match.)
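A minimal sketch of that idea (not necessarily the code in the linked demo): the whole pattern is wrapped in a group, which shifts the backreferences to \2, \3, \4. The extra zero-width (?=...) wrapper is an addition here, assumed so that overlapping candidates are found as well:

```python
import re

text = "asbshbsbuhb"

# Outer group 1 is the whole 4-character candidate; the inner
# captures are now groups 2-4, so the backreferences shift by one.
# The surrounding (?=...) consumes nothing, so a match can start at
# every position and overlapping candidates are not skipped.
pattern = r"(?=((.)(?!\2)(.)(?!\2|\3)(.)(?!\2|\3|\4).))"

# With several groups, re.findall yields one tuple per match;
# element 0 is the outer group, i.e. the candidate substring.
matches = [groups[0] for groups in re.findall(pattern, text)]
print(matches)  # ['sbuh']
```

Without the lookahead wrapper, findall only returns non-overlapping matches, so in abcde it would report abcd but miss bcde.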

Answer:

This is a partial regex solution: it uses a regex only to collect every 4-letter substring (overlapping ones included), then uses some Python code to keep those in which no letter repeats.

Code:

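import re

text = 'asbshbsbuhb'

# The zero-width lookahead captures every (overlapping) run of 4 letters.
m = re.findall(r'(?=([A-Za-z]{4}))', text)

# Keep only the windows in which no letter repeats.
for word in m:
    if len(set(word)) == len(word):
        print(word)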

Output:

sbuh
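The code above hard-codes the length 4. A sketch of the same approach for an arbitrary length n (distinct_windows is a hypothetical name, not from the original answer):

```python
import re
from typing import List


def distinct_windows(text: str, n: int) -> List[str]:
    """Return every length-n run of letters in text with no repeated letter.

    A zero-width lookahead captures each overlapping window of n
    letters, then a set filters out windows containing a repeat.
    """
    # Doubled braces in the f-string yield a literal {n} quantifier.
    windows = re.findall(rf"(?=([A-Za-z]{{{n}}}))", text)
    return [w for w in windows if len(set(w)) == n]


print(distinct_windows("asbshbsbuhb", 4))  # ['sbuh']
```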