I'm cleaning some text directly in my query, and rather than using nested replace functions, I found this bit of code that uses perl to perform multiple replacements at once: multi-replace with perl
CREATE FUNCTION pg_temp.multi_replace(string text, orig text[], repl text[])
RETURNS text
AS $BODY$
my ($string, $orig, $repl) = @_;
my %subs;
if (@$orig != @$repl) {
elog(ERROR, "array sizes mismatch");
}
if (ref @$orig[0] eq 'ARRAY' || ref @$repl[0] eq 'ARRAY') {
elog(ERROR, "array dimensions mismatch");
}
@subs{@$orig} = @$repl;
my $re = join "|",
sort { (length($b) <=> length($a)) } keys %subs;
$re = qr/($re)/;
$string =~ s/$re/$subs{$1}/g;
return $string;
$BODY$ language plperl strict immutable;
Example query:
select
name as original_name,
multi_replace(name, '{-,&,LLC$}', '{_,and,business}') as cleaned_name
from some_table
The function finds the pattern LLC at the end of the name string but removes it instead of replacing it with "business."
How can I make this work as intended?
CodePudding user response:
While the regexp tests for LLC$
with the special meaning of the $
, what gets captured into $1
is just the string LLC
and so it doesn't find the look-up value to replace.
If the only thing you care about is $
, then you could fix it by changing the map-building lines to:
@subs{map {my $t=$_; $t=~s/\$$//; $t} @$orig} = @$repl;
my $re = join "|",
sort { (length($b) <=> length($a)) } @$orig;
But it will be very hard to make it work more generally for every possible feature of regex.
CodePudding user response:
When the strings in @$orig
are to be matched literally, I'd actually use this:
my ($string, $orig, $repl) = @_;
# Argument checks here.
my %subs; @subs{ @$orig } = @$repl;
my $pat =
join "|",
map quotemeta,
sort { length($b) <=> length($a) }
@$orig;
return $string =~ s/$re/$subs{$&}/gr;
In particular, map quotemeta,
was missing.
(By the way, the sort
line isn't needed if you ensure that xy
comes before x
in @$orig
when you want to replace both x(?!y)
and xy
.)
But you want the strings in @$orig
to be treated as regex patterns. For that, you can use the following:
# IMPORTANT! Only provide strings from trusted sources in
# `@$orig` as it allows execution of arbitrary Perl code.
my ($string, $orig, $repl) = @_;
# Argument checks here.
my $re =
join "|",
map "(?:$orig->[$_])(?{ $_ })",
0..$#$orig;
{
use re qw( eval );
$re = qr/$re/;
}
return $string =~ s/$re/$repl->[$^R]/gr;
I'm not sure if use re qw( eval );
and (?{ })
are available to you, so this may be an unviable solution for you.