Home > Back-end >  multi replace in postgresql query using perl
multi replace in postgresql query using perl

Time:05-11

I'm cleaning some text directly in my query, and rather than using nested replace functions, I found this bit of code that uses perl to perform multiple replacements at once: multi-replace with perl

CREATE FUNCTION pg_temp.multi_replace(string text, orig text[], repl text[]) 
RETURNS text 
AS $BODY$ 
  my ($string, $orig, $repl) = @_;
  my %subs;

  if (@$orig != @$repl) {
     elog(ERROR, "array sizes mismatch");
  } 
  if (ref @$orig[0] eq 'ARRAY' || ref @$repl[0] eq 'ARRAY') {
     elog(ERROR, "array dimensions mismatch");
  } 

  @subs{@$orig} = @$repl;
  
  my $re = join "|",
     sort { (length($b) <=> length($a)) } keys %subs;
  $re = qr/($re)/;

  $string =~ s/$re/$subs{$1}/g;

  return $string;
$BODY$ language plperl strict immutable;

Example query:

select

name as original_name, 
multi_replace(name, '{-,&,LLC$}', '{_,and,business}') as cleaned_name

from some_table

The function finds the pattern LLC at the end of the name string but removes it instead of replacing it with "business."

How can I make this work as intended?

CodePudding user response:

While the regexp tests for LLC$ with the special meaning of the $, what gets captured into $1 is just the string LLC and so it doesn't find the look-up value to replace.

If the only thing you care about is $, then you could fix it by changing the map-building lines to:

 @subs{map {my $t=$_; $t=~s/\$$//; $t} @$orig} = @$repl;

 my $re = join "|",
    sort { (length($b) <=> length($a)) } @$orig;

But it will be very hard to make it work more generally for every possible feature of regex.

CodePudding user response:

When the strings in @$orig are to be matched literally, I'd actually use this:

my ($string, $orig, $repl) = @_;

# Argument checks here.

my %subs; @subs{ @$orig } = @$repl;

my $pat =
   join "|",
      map quotemeta,
         sort { length($b) <=> length($a) }
            @$orig;

return $string =~ s/$re/$subs{$&}/gr;

In particular, map quotemeta, was missing.

(By the way, the sort line isn't needed if you ensure that xy comes before x in @$orig when you want to replace both x(?!y) and xy.)


But you want the strings in @$orig to be treated as regex patterns. For that, you can use the following:

# IMPORTANT! Only provide strings from trusted sources in
# `@$orig` as it allows execution of arbitrary Perl code.

my ($string, $orig, $repl) = @_;

# Argument checks here.

my $re =
   join "|",
      map "(?:$orig->[$_])(?{ $_ })",
         0..$#$orig;

{
   use re qw( eval );
   $re = qr/$re/;
}

return $string =~ s/$re/$repl->[$^R]/gr;

I'm not sure if use re qw( eval ); and (?{ }) are available to you, so this may be an unviable solution for you.

  • Related