I want to replace "&" with the word "$d" in a given sentence. Can we replace it only in words that start with & and are followed by a single character and a space?
Example:-
Input:-
Two literals are &a and &b and also check &abc and &bac here.
Output:-
Two literals are $da and $db and also check &abc and &bac here.
In the input above, the only words that should be changed are &a and &b, because they start with & and are followed by a single character and a space. Only the '&' in these two words should be replaced, not the complete word.
When I use the replaceAll() function with a regex, it replaces the entire match instead:-
String str="Two literals are &a and &b and also check &abc and &bac here.";
str = str.replaceAll("\\&[a-zA-Z]{1}\\s", "\\$d");
System.out.println(str);
// actual output:   Two literals are $dand $dand also check &abc and &bac here.
// expected output: Two literals are $da and $db and also check &abc and &bac here.
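A quick way to see why this happens is to print what the pattern actually matches, since replaceAll() swaps out the whole match. The sketch below uses Pattern and Matcher.find() to list every match; the class name ShowMatches and the helper matches are hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ShowMatches {
    // Collects every substring that the given regex matches in the input.
    static List<String> matches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            // group() is the whole match, which is exactly what replaceAll() replaces
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String str = "Two literals are &a and &b and also check &abc and &bac here.";
        // The pattern from the question: '&', one letter, one whitespace character.
        for (String match : matches("\\&[a-zA-Z]{1}\\s", str)) {
            System.out.println("[" + match + "]");
        }
    }
}
```

It prints [&a ] and [&b ]: the letter and the trailing space are part of each match, so both disappear along with the '&'.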
Answer:
The correct code for this would be:
str = str.replaceAll("&([a-zA-Z]\\s)", "\\$d$1");
This is an example of backreferencing captured groups in regex: the replacement string can refer to whatever a group in the pattern captured.
Essentially, the part inside the parentheses, ([a-zA-Z]\\s), matches a single letter followed by a whitespace character. Since it is capturing group 1, the value it matched can be referenced in the replacement as $1.
So we replace &(a ) with $d(a ) (the brackets here just show what is captured). Credit to u/rzwitserloot for reminding me that OP wants $ not &.
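A self-contained version of the capturing-group approach, as a minimal sketch; the class GroupReplace and the method replaceSingleLetterLiterals are hypothetical names.

```java
public class GroupReplace {
    // Replaces the '&' in "&x " (one letter followed by whitespace) with "$d",
    // keeping the letter and the whitespace by writing capturing group 1 back.
    static String replaceSingleLetterLiterals(String input) {
        // "\\$d" is a literal "$d"; "$1" re-inserts whatever group 1 captured.
        return input.replaceAll("&([a-zA-Z]\\s)", "\\$d$1");
    }

    public static void main(String[] args) {
        String str = "Two literals are &a and &b and also check &abc and &bac here.";
        System.out.println(replaceSingleLetterLiterals(str));
        // prints: Two literals are $da and $db and also check &abc and &bac here.
    }
}
```

Because the pattern requires whitespace after the letter, an &a at the very end of the input or directly before punctuation (such as "&a.") is left unchanged.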
Answer:
You presumably want a concept called look-ahead: it lets you match on things being there without 'consuming' them. You can even match on things NOT being there. That's what you want here: match &, but only if it is followed by a letter, and looking past that letter we do NOT see another letter:
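// List is java.util.List; List.of needs Java 9 or later.
for (String test : List.of("Two literals are &a and &bcd", "A literal is &a", "How about &a?")) {
    // Only the '&' is consumed; the lookaheads just check what follows it.
    System.out.println(test.replaceAll("&(?=[a-zA-Z](?![a-zA-Z]))", "\\$d"));
}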
Perhaps instead you want the single letter to be followed by any word break (e.g. &z00 should NOT turn into $dz00, even though there is no letter after the z). Then I suggest:
"&(?=[a-zA-Z]\\b)"
That's a lot simpler to read!
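To compare the two lookahead patterns side by side, here is a minimal runnable sketch; the class LookaheadCompare and its constant names are hypothetical, and List.of needs Java 9 or later.

```java
import java.util.List;

public class LookaheadCompare {
    // '&' followed by one letter that is NOT followed by another letter.
    static final String NOT_FOLLOWED_BY_LETTER = "&(?=[a-zA-Z](?![a-zA-Z]))";
    // '&' followed by one letter that sits right before a word break.
    static final String LETTER_THEN_WORD_BREAK = "&(?=[a-zA-Z]\\b)";

    public static void main(String[] args) {
        List<String> inputs = List.of(
                "Two literals are &a and &bcd", // both: Two literals are $da and &bcd
                "How about &a?",                // both: How about $da?
                "What about &z00");             // first: $dz00, second: &z00
        for (String input : inputs) {
            // Only the '&' is part of either match, so only it gets replaced.
            System.out.println(input.replaceAll(NOT_FOLLOWED_BY_LETTER, "\\$d")
                    + "  |  " + input.replaceAll(LETTER_THEN_WORD_BREAK, "\\$d"));
        }
    }
}
```

The two patterns agree on ordinary words and differ only on &z00: the first sees no letter after the z and replaces the '&', while the \\b version finds no word break between z and 0 and leaves it alone.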
A few notes:
- (?=x) is 'positive lookahead'. It doesn't itself match anything, but it makes the match fail if x does not immediately follow the match.
- (?!x) is 'negative lookahead'. It doesn't itself match anything, but it makes the match fail if x does immediately follow the match.
- $ has special meaning in the replacement part (it refers to a group), so we need to escape it as \\$.
- \\b is regexpese for 'word break': it doesn't match any characters, but it fails if we aren't on a word break. Spaces, dots, end-of-input, end-of-line, a dash, an ampersand: many things are word breaks.
- We use lookahead instead of matching the letters themselves, because anything that is matched gets replaced.
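Escaping $ by hand works, but if the replacement word comes from a variable, java.util.regex.Matcher.quoteReplacement can do the escaping for you. A minimal sketch, assuming the word-break pattern from the second answer; the class ReplacementEscaping and the method replaceWith are hypothetical names.

```java
import java.util.regex.Matcher;

public class ReplacementEscaping {
    // '&' followed by one letter that sits right before a word break.
    static final String LITERAL = "&(?=[a-zA-Z]\\b)";

    // Replaces the matched '&' with 'word', escaping any '$' or '\' in it first.
    static String replaceWith(String input, String word) {
        return input.replaceAll(LITERAL, Matcher.quoteReplacement(word));
    }

    public static void main(String[] args) {
        String str = "Two literals are &a and &b and also check &abc and &bac here.";
        System.out.println(Matcher.quoteReplacement("$d")); // prints \$d
        System.out.println(replaceWith(str, "$d"));

        try {
            // Unescaped, "$d" is read as a group reference, and 'd' is not a group number.
            str.replaceAll(LITERAL, "$d");
        } catch (IllegalArgumentException e) {
            System.out.println("Unescaped \"$d\" rejected: " + e.getMessage());
        }
    }
}
```

quoteReplacement only protects the replacement string; the pattern itself still needs its own escaping (Pattern.quote exists for that).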