postgreSQL: Remove last character in a VARCHAR until match is found-CodePudding

I am looking for an efficient way to query a postgreSQL database by removing the the right-most character in a string until a match is found. For example, if my dialing number is 442079285200 then it should strip characters from the end of the sequence, eventually matching to UNITED KINGDOM-LONDON 44207.

442079285200 -> No match
44207928520 -> No match
4420792852 -> No match
442079285 -> No match
44207928 -> No match
4420792 -> No match
442079 -> No match
44207 -> Matches UNITED KINGDOM-LONDON

v_destination_rates

destination	dialing_code	current_rate	rounding_rule
INMARSAT	870	10.8239	1-1-1
INTERNATIONAL NETWORKS	882	10.8239	1-1-1
INTERNATIONAL NETWORKS	883	10.8239	1-1-1
IRIDIUM	521844207	5.1167	1-1-1
UNITED KINGDOM-LONDON	44207	0.0056	1-1-1

I know one way of doing this is to loop over the number of characters in the dialing number (n) and do a select query for the left-most n characters. I haven't successfully ran my query, but I believe it would look something similar to:

$do$
DECLARE
   m   varchar := '442079285200';
BEGIN
   FOR counter IN LENGTH(m)..1 loop
      select destination from v_destination_rates where v_destination_rates.dialing_code = left(dialing_number, counter);
   END LOOP;
END
$do$

I'm wondering if there is a more efficient way of performing this query, perhaps with the LIKE wildcard operator? We have thousands of dialing numbers to match to approximately 20 000 dialing_codes so a less expensive operation would be preferred.

CodePudding user response：

You haven't said whether dialing_number is coming from a table / sanitized user input / something else.

For simplicity I'll assume it comes from a table contacts and that you want to return everything in contacts and every column in v_destination_rates joined as you describe.

Without using pl/pgSQL:

SELECT
  *
FROM contacts c
LEFT JOIN v_destination_rates vdr
  ON vdr.dialing_code::TEXT LIKE c.dialing_number::TEXT || '%'

I've tested this on a table of 9,000 records, which I assume is as about as large or larger than the lookup table v_destination_rates, and matched 16 sample integers in less than a tenth of a second.

You could possibly get even better performance if the dialing code is already type TEXT, and indexed lexicographically since that's how you're searching here.

CodePudding user response：

I generally avoid regular expressions and like/similar searches whenever possible; they are slow. In this case they are completely avoidable instead you can use substring and length and do a equal match. (yea, 2 subroutines vs regex it is a toss up). The following does just that and when multiple matches occur it selects the longest match. (see demo)

with dialing_number (dn) as 
     ( values ('442079285200') ) 
select dr.* 
  from dialing_number
  join v_destination_rates dr
    on dr.dialing_code = substring(dn,1, length(dr.dialing_code)) 
 order by length(dr.dialing_code) desc
 limit 1 ;

Concerning performance, a search set of 20000 is a very small number of entries to search. For curiosity I generated random dialing_code values until the above query took more than 1sec. That occurred at 4755006 rows searched in 1.07 sec.

CodePudding user response：

I'm wondering if there is a more efficient way of performing this query

Yes, parse the number to get the country and dialing code. There are any number of existing libraries to do this. Then concatenate them and search.

For example, 442079285200 is the country code 44 and the dialing code 20 (207 is obsolete). Then you'd search for '4420'.

Note: 870, 882, and 883 are not dialing codes, they are country codes. And Iridium is 881. Mixing up country codes with dialing codes will probably cause more problems down the road, you may be better off separating them in your table.