Home > OS >  extract repeated character, but not in words
extract repeated character, but not in words

Time:09-21

SELECT regexp_replace(' aaa bbb 888 bbb ccc ddd fff YESS', '(.)\1 ', '\1', 'g');

Return: a b 8 b c d f YES
Expect: a b 8 b c d f YESS

The pattern : if there are 3 repeated characters (before and after both have more than 0 white spaces), only extract one character.

Related Question: Remove Consecutive Repeated Characters with PostgreSQL regexp_replace function

CodePudding user response:

Try this pattern:

SELECT regexp_replace(' aaa bbb 888 bbb ccc ddd fff YESS', '\m(.)\1 \M', '\1', 'g');
   regexp_replace    
═════════════════════
  a b 8 b c d f YESS
(1 row)

\m matches at a word beginning and \M at a word end.

Note that that will match repetitions like aaa.bbb too, because the period is a word beginning and end. If you want to only match repetitions betwee spaces, as stated in your question, you can use lookahead and lookbehind matches:

SELECT regexp_replace('aaa bbb 888 bbb ccc.ddd fff YESS', '(?<= )(.)\1 (?= )', '\1', 'g');
      regexp_replace      
══════════════════════════
 aaa b 8 b ccc.ddd f YESS
(1 row)
  • Related