Regex for finding only single alphabets in a string and ignore consecutive double

Time:08-13

I have searched a lot but cannot find a regex that selects only single letters and doubles them, while letters that are already doubled remain untouched.

I tried

String str = "yahoo";
str = str.replaceAll("(\\w)\\1+", "$0$0");

But since (\\w)\\1+ selects only the characters that are already doubled, my output becomes yahoooo. I tried to negate it with !(\\w)\\1+ but that didn't work: the output is the same as the input. I have also tried

str.replaceAll(".", "$0$0");

But that doubles every character, including those that are already doubled.
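
To make the two attempts above concrete, here is a minimal sketch of what each one produces for "yahoo". The class and method names are made up for illustration; the patterns are the ones quoted above.

```java
class Attempts {
    // First attempt: matches only runs that are already repeated, then doubles them.
    static String doubleRuns(String s) {
        return s.replaceAll("(\\w)\\1+", "$0$0");
    }

    // Second attempt: doubles every character, repeated or not.
    static String doubleAll(String s) {
        return s.replaceAll(".", "$0$0");
    }

    public static void main(String[] args) {
        System.out.println(doubleRuns("yahoo")); // yahoooo
        System.out.println(doubleAll("yahoo"));  // yyaahhoooo
    }
}
```

Neither output is the wanted yyaahhoo: the first pattern never sees the single letters, and the second cannot tell single letters from doubled ones.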

Please help me write a regex that replaces every single character with a double one while leaving characters that are already doubled untouched.

Example

abc -> aabbcc
yahoo -> yyaahhoo (o should remain untouched)
opinion -> ooppiinniioonn
aaaaaabc -> aaaaaabbcc

CodePudding user response:

You can match using this regex:

(?:((.)\2)|(.))

And replace it with:

$1$3$3

RegEx Explanation:

  • (?:: Start a non-capture group
    • ((.)\2): Match a character and capture it in group #2, then use \2 right after it to make sure the same character repeats. Capture the pair of identical characters in group #1
    • |: OR
    • (.): Match a character and capture in group #3
  • ): End non-capture group
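
As a rough illustration of the explanation above, this sketch walks the matcher over a string and reports which alternative matched at each step. The class and method names are assumptions made for the example; only the pattern comes from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupWalkthrough {
    // Labels each match as a pair (group 1 set) or a single character (group 3 set).
    static String describe(String s) {
        Matcher m = Pattern.compile("(?:((.)\\2)|(.))").matcher(s);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            if (m.group(1) != null) {
                // First alternative matched: two identical characters
                sb.append("pair:").append(m.group(1));
            } else {
                // Second alternative matched: one lone character
                sb.append("single:").append(m.group(3));
            }
            sb.append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(describe("yahoo")); // single:y single:a single:h pair:oo
    }
}
```

Only one of groups #1 and #3 takes part in any given match, and Java substitutes nothing for a group that did not take part. That is why the replacement $1$3$3 keeps a pair as is and doubles a lone character.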

Code Demo:

import java.util.List;

class Ideone {

    public static void main(String[] args) {
        List<String> input = List.of("abc", "yahoo",
                "opinion", "aaaaaabc");

        for (String s: input) {
            // $1 keeps an existing pair; $3$3 doubles a lone character
            System.out.println( s + " => " +
                  s.replaceAll("(?:((.)\\2)|(.))", "$1$3$3") );
        }
    }
}

Output:

abc => aabbcc
yahoo => yyaahhoo
opinion => ooppiinniioonn
aaaaaabc => aaaaaabbcc

CodePudding user response:

The solution by @anubhava, if viable in Java, is probably the best way to go. For a more brute-force approach, we can iterate over the matches of the following pattern:

(\\w)\\1+|\\w

This greedily matches a run of two or more identical word characters or, failing that, a single word character. For each match, we leave a multi-character run as it is and double any single character. Here is some short Java code that does this:

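import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Main {
    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("abc", "yahoo", "opinion", "aaaaaabc");
        // Either a run of two or more identical word characters, or a single word character
        String pattern = "(\\w)\\1+|\\w";
        Pattern r = Pattern.compile(pattern);

        for (String input : inputs) {
            Matcher m = r.matcher(input);
            StringBuffer buffer = new StringBuffer();
            while (m.find()) {
                if (m.group().matches("(\\w)\\1+")) {
                    // Already repeated: copy the run unchanged
                    m.appendReplacement(buffer, m.group());
                }
                else {
                    // Lone character: emit it twice
                    m.appendReplacement(buffer, m.group() + m.group());
                }
            }
            m.appendTail(buffer);
            System.out.println(input + " => " + buffer.toString());
        }
    }
}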

This prints:

abc => aabbcc
yahoo => yyaahhoo
opinion => ooppiinniioonn
aaaaaabc => aaaaaabbcc
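
If Java 9 or later is available, the same iteration can probably be written more compactly with Matcher.replaceAll taking a function. This is only a sketch of that variant, not part of the code above; the class and method names are made up for the example.

```java
import java.util.regex.Pattern;

class LambdaReplace {
    // Same pattern as above: a run of identical word characters, or one word character.
    private static final Pattern RUN_OR_SINGLE = Pattern.compile("(\\w)\\1+|\\w");

    static String doubleSingles(String s) {
        // The returned text is used as a replacement string, where $ and \ are special.
        // That is safe here because \w never matches either of them.
        return RUN_OR_SINGLE.matcher(s).replaceAll(mr -> {
            String g = mr.group();
            return g.length() > 1 ? g : g + g;
        });
    }

    public static void main(String[] args) {
        System.out.println(doubleSingles("yahoo"));    // yyaahhoo
        System.out.println(doubleSingles("aaaaaabc")); // aaaaaabbcc
    }
}
```

Checking the match length replaces the second regex test, since any match longer than one character must have come from the run alternative.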

CodePudding user response:

I've got two different understandings of the question.

  1. If the goal is to get an even number of each word character:
    Search for (\w)\1? and replace with $1$1.

  2. If only lone characters should be doubled and longer runs left untouched:
    Search for (\w)\1?(\1*) and replace with $1$1$2.

Both patterns capture a word character \w in $1 and optionally match the same character once more. The second variant also captures any further repeats of that character in $2 so they can be reattached in the replacement.

FYI: If using the pattern as a Java string, remember to escape it. E.g. \1 -> \\1, \w -> \\w, ...
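
The two readings only differ on runs of odd length. A small sketch with both patterns escaped for Java shows this on "aaabc"; the class and method names are hypothetical, chosen only for this example.

```java
class TwoReadings {
    // Variant 1: every run ends up with an even number of characters.
    static String evenCount(String s) {
        return s.replaceAll("(\\w)\\1?", "$1$1");
    }

    // Variant 2: only lone characters are doubled; longer runs stay as they are.
    static String doubleLoneOnly(String s) {
        return s.replaceAll("(\\w)\\1?(\\1*)", "$1$1$2");
    }

    public static void main(String[] args) {
        System.out.println(evenCount("aaabc"));      // aaaabbcc
        System.out.println(doubleLoneOnly("aaabc")); // aaabbcc
    }
}
```

For the four examples in the question both variants give the same result, because none of them contains a run of three or another odd length above one.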
