I'm using the sd tool, which uses Rust regular expressions, and I'm trying to use it with a named capture group. However, if the named capture group is immediately followed by literal text in the replacement, that text is treated as part of the capture group's name, causing unexpected behaviour.
Here is a contrived example to illustrate the issue:
echo 'abc' | sd -p '(?P<cg>b)' '$cgB'
# outputs: ac
# desired output: abBc
echo 'abc' | sd -p '(?P<cg>b)' '$cg B'
# outputs as expected: ab Bc
# however, places a space there
I've tried $<cg>B, $cg(B), and $cg\0B, but none of them give abBc.
I've also checked the Rust regex docs, but the x flag and the other techniques I found seem to apply only to the search pattern, not to the replacement string.
CodePudding user response:
We don't need the sd tool to reproduce this behavior. Here it is in pure Rust:
let re = regex::Regex::new(r"(?P<n>b)").unwrap();
let before = "abc";
assert_eq!(re.replace_all(before, "$nB"), "ac");
assert_eq!(re.replace_all(before, "${n}B"), "abBc");
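In the first assertion, $nB is read as a reference to a group named nB. No such group exists, so it expands to the empty string, which is why the result is ac. The brace syntax used in the second assertion isn't described on the crate's front page but in the documentation of the replace method: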
The longest possible name is used. e.g., $1a looks up the capture group named 1a and not the capture group at index 1. To exert more precise control over the name, use braces, e.g., ${1}a.
In short, unless the group name is immediately followed in the replacement string by a character that can't be part of a name, you should always put the group name between braces.
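To make the "longest possible name" rule concrete, here is a minimal std-only sketch of how a capture reference at the start of a replacement string could be scanned. The `capture_name` function is hypothetical and written for illustration; it is not the regex crate's actual implementation, and it assumes that unbraced names consist of ASCII letters, digits and underscores.

```rust
/// Hypothetical helper: returns the group name that a replacement string
/// beginning with `$` would refer to, or `None` if it holds no reference.
fn capture_name(replacement: &str) -> Option<&str> {
    let rest = replacement.strip_prefix('$')?;

    // Braced form `${name}`: the name ends at the closing brace.
    if let Some(braced) = rest.strip_prefix('{') {
        let end = braced.find('}')?;
        return Some(&braced[..end]);
    }

    // Unbraced form `$name`: take the longest run of name characters
    // (assumed here to be ASCII letters, digits and underscores).
    let end = rest
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
        .unwrap_or(rest.len());

    if end == 0 {
        None
    } else {
        Some(&rest[..end])
    }
}

fn main() {
    // The trailing `B` is swallowed into the name.
    println!("{:?}", capture_name("$cgB")); // prints Some("cgB")
    // Braces end the name explicitly.
    println!("{:?}", capture_name("${cg}B")); // prints Some("cg")
    // A space cannot be part of a name, so it ends the name too.
    println!("{:?}", capture_name("$cg B")); // prints Some("cg")
}
```

Under this reading, $cgB asks for a group called cgB, which is why the braces are needed whenever a letter, digit or underscore follows the reference.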
CodePudding user response:
Applied to the sd command from the question (Rust regex via sd in bash), the braces give the desired output:
echo 'abc' | sd -p '(?P<cg>b)' '${cg}B'