How do I replace a certain char in between 2 strings using regex

Time:08-22

I'm new to regex and have been trying to work this out on my own, but I can't get it working. My input contains start and end flags, and I want to replace a certain character, but only when it's between the flags.

For example, the start flag is START, the end flag is END, and the character I'm trying to replace is ", which I want to replace with \".

I would then call input.replaceAll(regex, "\\\\\"");

I tried writing a regex that matches only the relevant " characters, but so far I have only managed to match all characters between the flags, not just the " characters: (?<=START)(.*)(?=END)
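To see why this pattern alone does not help, the sketch below prints its first match on the first example line. It is only an illustration; LookaroundDemo and firstMatch are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // Returns the first text matched by the pattern from the question, or null if none.
    static String firstMatch(String input) {
        Matcher m = Pattern.compile("(?<=START)(.*)(?=END)").matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "This \" is START an \" example input END string \"\"";
        // The match is the whole span between the flags, not just the quote,
        // so replaceAll would overwrite all of it.
        System.out.println("[" + firstMatch(line) + "]"); // prints [ an " example input ]
    }
}
```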

Example input:

This " is START an " example input END string ""
START This is a "" second example END
This" is "a START third example END " "

Expected output:

This " is START an \" example input END string ""
START This is a \"\" second example END
This" is "a START third example END " "

Answer:

Find all characters between START and END, and for those characters replace " with \".

To achieve this, apply a replacer function to all matches of characters between START and END:

string = Pattern.compile("(?<=START).*?(?=END)").matcher(string)
    .replaceAll(mr -> mr.group().replace("\"", "\\\\\""));

which produces your expected output.
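For reference, here is the answer's snippet wrapped in a self-contained program and run on the example input from the question. It assumes Java 9 or later, because the Matcher.replaceAll overload that takes a Function<MatchResult, String> was added in Java 9. EscapeBetweenFlags and escapeQuotes are hypothetical names.

```java
import java.util.regex.Pattern;

class EscapeBetweenFlags {
    // Reluctant match of the text strictly between START and the next END.
    // '.' does not match line breaks, so each line is handled on its own.
    static final Pattern BETWEEN = Pattern.compile("(?<=START).*?(?=END)");

    static String escapeQuotes(String input) {
        // For each match, String.replace turns every " into \\" (replacement
        // syntax), which Matcher then emits as the literal text \"
        return BETWEEN.matcher(input)
                .replaceAll(mr -> mr.group().replace("\"", "\\\\\""));
    }

    public static void main(String[] args) {
        String input = "This \" is START an \" example input END string \"\"\n"
                + "START This is a \"\" second example END\n"
                + "This\" is \"a START third example END \" \"";
        System.out.println(escapeQuotes(input));
    }
}
```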

Some notes on how this works.

The first step is to match all characters between START and END, which uses lookarounds with a reluctant quantifier:

(?<=START).*?(?=END)

The ? after the .* changes the match from greedy (as many chars as possible while still matching) to reluctant (as few chars as possible while still matching). This prevents the middle quote in the following input from being altered:

START a"b END c"d START e"f END

A greedy quantifier will match from the first START all the way past the next END to the last END, incorrectly including c"d.
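The difference is easy to check by listing every match of both versions on that input. This is a small sketch; GreedyVsReluctant and matches are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Collects every match of the given pattern in the input, in order.
    static List<String> matches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "START a\"b END c\"d START e\"f END";
        // Greedy: a single match running to the LAST END, so c"d is included.
        System.out.println(matches("(?<=START).*(?=END)", input));  // [ a"b END c"d START e"f ]
        // Reluctant: each match stops at the first END after its START.
        System.out.println(matches("(?<=START).*?(?=END)", input)); // [ a"b ,  e"f ]
    }
}
```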

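The next step is, for each match, to replace " with \". The full match is group 0, or simply MatchResult#group(), and no regex is needed for this replacement: a plain String.replace is enough (and yes, replace() replaces all occurrences). The replacer's return value is still parsed as a replacement string, though, so the backslash has to be doubled there: "\\\\\"" is the text \\", which Matcher turns into \".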
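Because the replacer's result is parsed as replacement syntax, a $ or \ in the text between the flags can make the doubled-backslash version throw or produce the wrong text. A more defensive variant, sketched here assuming Java 9+, builds the literal text first and wraps it in Matcher.quoteReplacement. This is a swapped-in alternative, not part of the answer; ReplacerEscaping, unquoted and quoted are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplacerEscaping {
    static final Pattern BETWEEN = Pattern.compile("(?<=START).*?(?=END)");

    // As in the answer: the replacer returns replacement syntax, so \\" means \"
    static String unquoted(String input) {
        return BETWEEN.matcher(input)
                .replaceAll(mr -> mr.group().replace("\"", "\\\\\""));
    }

    // Builds the literal text first (" becomes \"), then quoteReplacement escapes
    // every '\' and '$' so Matcher copies the text through unchanged.
    static String quoted(String input) {
        return BETWEEN.matcher(input)
                .replaceAll(mr -> Matcher.quoteReplacement(mr.group().replace("\"", "\\\"")));
    }

    public static void main(String[] args) {
        String input = "START costs $5 \"now\" END";
        System.out.println(quoted(input)); // START costs $5 \"now\" END
        try {
            unquoted(input);
        } catch (RuntimeException e) {
            // "$5" is read as a reference to capture group 5, which does not exist.
            System.out.println("unquoted version failed: " + e);
        }
    }
}
```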

Answer:

For now I've been able to solve it by using capture groups and repeatedly replacing one match at a time until there are no matches left. I even had to insert a placeholder ("replace identifier"), because replacing with \" directly would leave the " character in place and create an infinite loop. Once there are no more matches, I replace the placeholder, and I now get the expected result.

I still feel like there has to be a much cleaner way to do this with a single replace statement...

Code that worked for me:

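class Playground {
    public static void main(String[] args) {
        String input = "\"ThSTARTis is a\" te\"\"stEND \" !!!";

        // Group 1 is greedy, so it ends just before the last " that has
        // START somewhere before it and END somewhere after it.
        String regex = "(.*START.*)\"(.*END.*)";

        // Replace one " per pass with a placeholder. Writing \" directly would
        // leave the " in the string, so the regex would keep matching forever.
        while (input.matches(regex)) {
            input = input.replaceAll(regex, "$1---replace---$2");
        }

        // Swap each placeholder for the literal text \"
        String result = input.replace("---replace---", "\\\"");

        System.out.println(result);
    }
}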

Output:

"ThSTARTis is a\" te\"\"stEND " !!!

I would love any suggestions as to how I could solve this in a better/cleaner way.

Answer:

Another option is to use the \G anchor with 2 capture groups. In the replacement, use the 2 capture groups followed by \".

(?:(START)(?=.*END)|\G(?!^))((?:(?!START|END)(?>\\+\"|[^\r\n\"]))*)\"

Explanation

  • (?: Non capture group
    • (START)(?=.*END) Capture group 1, match START and assert there is END to the right
    • | Or
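    • \G(?!^) Assert the position is at the end of the previous match; (?!^) rules out the start of a line, including the start of the string, where \G would otherwise match before the first match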
  • ) Close non capture group
  • ( Capture group 2
    • (?: Non capture group
      • (?!START|END) Negative lookahead, assert not START or END directly to the right
      • (?>\\+\"|[^\r\n\"]) Atomic group: match 1+ times \ followed by ", or match any char except " or a newline
    • )* Close the non capture group and optionally repeat it
  • ) Close group 2
  • \" Match "


For example:

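import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String regex = "(?:(START)(?=.*END)|\\G(?!^))((?:(?!START|END)(?>\\\\+\\\"|[^\\r\\n\\\"]))*)\\\"";
        String string = "This \" is START an \" example input END string \"\"\n"
            + "START This is a \"\" second example END\n"
            + "This\" is \"a START third example END \" \"";
        // $1 = START (empty when continuing via \G), $2 = text before the quote,
        // followed by a literal backslash and quote
        String subst = "$1$2\\\\\"";

        Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        Matcher matcher = pattern.matcher(string);

        String result = matcher.replaceAll(subst);

        System.out.println(result);
    }
}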

Output

This " is START an \" example input END string ""
START This is a \"\" second example END
This" is "a START third example END " "