Inspired by today's Advent of Code task, I was wondering whether it is possible to use a regex to find, in a string, a substring of a given length that consists entirely of different letters. For example, given length 4, the only such substring of
asbshbsbuhb
is sbuh. Every other 4-letter substring has duplicate letters.
Is there any way (not an over-complicated one) to use a regex to find such substrings with bash, sed, or Python?
CodePudding user response:
With negative lookaheads, you can, though it's arguably rather tortured.
(.)(?!\1)(.)(?!\1|\2)(.)(?!\1|\2|\3).
Bash and sed lack this facility (their POSIX-style regexes have no lookaheads), but Python supports it.
Demo: https://ideone.com/MlAYC5
(The demo uses different numbering so that the entire match is group 1, to support the conventions of Python's re.findall.)
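For lengths other than 4, the same kind of pattern can be generated instead of typed by hand. The following is only a sketch, not the code behind the demo link: the names distinct_window_pattern and find_distinct_windows are made up for illustration, and I am assuming you want overlapping windows reported as well, so the whole window sits inside a lookahead as group 1 and the individual characters become groups 2, 3, and so on.

```python
import re


def distinct_window_pattern(length):
    """Build a regex for `length` consecutive, pairwise-different characters.

    Hypothetical helper: group 1 is the whole window (inside a lookahead so
    that overlapping windows are found); the characters are groups 2..length+1.
    """
    parts = []
    for i in range(length):
        if i:
            # Before consuming the next character, forbid every character
            # captured so far (groups 2 .. i+1).
            forbidden = "|".join("\\" + str(g) for g in range(2, i + 2))
            parts.append("(?!" + forbidden + ")")
        parts.append("(.)")
    return "(?=(" + "".join(parts) + "))"


def find_distinct_windows(text, length):
    """Return every window of `length` distinct characters, overlaps included."""
    pattern = re.compile(distinct_window_pattern(length))
    return [m.group(1) for m in pattern.finditer(text)]


print(distinct_window_pattern(4))
print(find_distinct_windows("asbshbsbuhb", 4))  # ['sbuh']
```

The number of backreferences grows quadratically with the length, so this is really only sensible for short windows; Python's re also limits numeric backreferences to the first 99 groups.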
CodePudding user response:
This is a partial regex solution: the regex only collects every (overlapping) run of 4 letters, and some Python code then checks whether all the characters in each run are different.
Code:
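import re

text = 'asbshbsbuhb'
# The lookahead makes each match zero-width, so overlapping 4-letter runs are all captured.
m = re.findall(r'(?=([A-Za-z]{4}))', text)
for word in m:
    # A set drops duplicates, so equal lengths mean every letter is different.
    if len(set(word)) == len(word):
        print(word)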
Output:
sbuh
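If the regex is not a hard requirement, the same set check works on plain slices for any window length. This is a rough sketch along the same lines; distinct_windows is a name I made up, and unlike the regex above it does not restrict the window to letters.

```python
def distinct_windows(text, length):
    """Return every substring of `length` characters that has no repeats."""
    # Slide a window of the requested size across the string.
    windows = (text[i:i + length] for i in range(len(text) - length + 1))
    # Keep a window only if removing duplicates does not shrink it.
    return [w for w in windows if len(set(w)) == length]


print(distinct_windows('asbshbsbuhb', 4))  # ['sbuh']
```

A string shorter than the requested length simply gives an empty list.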