Regex to find the position of the first four consecutive unique values

I've solved 2022 Advent of Code day 6, but was wondering if there is a regex way to find the first occurrence of 4 non-repeating characters:

From the question:

bvwbjplbgvbhsrlpgdmjqwftvncz

[bvwb]jplbgvbhsrlpgdmjqwftvncz

# discard, as the letter b repeats within these four characters

b[vwbj]plbgvbhsrlpgdmjqwftvncz

# match the 5th character, which marks the end of the first four-character block with no repeating characters

In R I've tried:

txt <- "bvwbjplbgvbhsrlpgdmjqwftvncz"
str_match("(.*)\1", txt)

But I'm having no luck.

Answer:

You can use

stringr::str_extract(txt, "(.)(?!\\1)(.)(?!\\1|\\2)(.)(?!\\1|\\2|\\3)(.)")

Here, each (.) captures any single character into a consecutively numbered group, and each (?!...) negative lookahead makes sure that the next . does not match any of the characters already captured.
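If it helps to see how the pattern grows, the same lookahead idea can be generated for any block length. This is only a sketch: distinct_run_pattern is a hypothetical helper name, not part of stringr, and it has only been checked against the four-character case.

```r
library(stringr)

# Hypothetical helper: build a pattern that matches n consecutive distinct characters.
distinct_run_pattern <- function(n) {
  stopifnot(n >= 1)
  parts <- character(n)
  parts[1] <- "(.)"
  for (i in seq_len(n)[-1]) {
    # Backreferences to every earlier group, e.g. \1|\2|\3 before the fourth group
    refs <- paste0("\\", seq_len(i - 1), collapse = "|")
    parts[i] <- paste0("(?!", refs, ")(.)")
  }
  paste0(parts, collapse = "")
}

# For n = 4 this rebuilds the pattern used in this answer.
pattern4 <- distinct_run_pattern(4)
str_extract("bvwbjplbgvbhsrlpgdmjqwftvncz", pattern4)
```

Writing the pattern out by hand is fine for four characters; generating it mainly saves typing for longer blocks.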

See the R demo:

library(stringr)
txt <- "bvwbjplbgvbhsrlpgdmjqwftvncz"
str_extract(txt, "(.)(?!\\1)(.)(?!\\1|\\2)(.)(?!\\1|\\2|\\3)(.)")
## => [1] "vwbj"
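
Since the question asks for the position rather than the matched text, str_locate() with the same pattern is one way to get it. A small sketch, assuming the end of the match is the position you want (loc and marker_end are just illustrative names):

```r
library(stringr)

txt <- "bvwbjplbgvbhsrlpgdmjqwftvncz"
pattern <- "(.)(?!\\1)(.)(?!\\1|\\2)(.)(?!\\1|\\2|\\3)(.)"

# str_locate() returns an integer matrix with "start" and "end" columns
# for the first match in each input string.
loc <- str_locate(txt, pattern)

# The end column is the index of the last character of the four-character block.
marker_end <- unname(loc[1, "end"])
marker_end
## => [1] 5
```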

Note that stringr::str_match (like stringr::str_extract) takes the input string as the first argument and the regex as the second argument, so the call in the question has its arguments in the wrong order.
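
For comparison, here is a sketch of str_match called with the arguments in that order (input first, pattern second). It returns a character matrix with the full match in the first column and the captured groups in the remaining columns; m is just an illustrative name.

```r
library(stringr)

txt <- "bvwbjplbgvbhsrlpgdmjqwftvncz"
pattern <- "(.)(?!\\1)(.)(?!\\1|\\2)(.)(?!\\1|\\2|\\3)(.)"

# Input string first, regex second
m <- str_match(txt, pattern)

m[1, 1]    # the full match, "vwbj"
m[1, -1]   # the four captured characters: "v", "w", "b", "j"
```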