I am looking for an efficient way to query a postgreSQL database by removing the the right-most character in a string until a match is found. For example, if my dialing number is 442079285200
then it should strip characters from the end of the sequence, eventually matching to UNITED KINGDOM-LONDON 44207
.
442079285200
-> No match44207928520
-> No match4420792852
-> No match442079285
-> No match44207928
-> No match4420792
-> No match442079
-> No match44207
-> Matches UNITED KINGDOM-LONDON
v_destination_rates
destination | dialing_code | current_rate | rounding_rule |
---|---|---|---|
INMARSAT | 870 | 10.8239 | 1-1-1 |
INTERNATIONAL NETWORKS | 882 | 10.8239 | 1-1-1 |
INTERNATIONAL NETWORKS | 883 | 10.8239 | 1-1-1 |
IRIDIUM | 521844207 | 5.1167 | 1-1-1 |
UNITED KINGDOM-LONDON | 44207 | 0.0056 | 1-1-1 |
I know one way of doing this is to loop over the number of characters in the dialing number (n) and do a select query for the left-most n characters. I haven't successfully ran my query, but I believe it would look something similar to:
$do$
DECLARE
m varchar := '442079285200';
BEGIN
FOR counter IN LENGTH(m)..1 loop
select destination from v_destination_rates where v_destination_rates.dialing_code = left(dialing_number, counter);
END LOOP;
END
$do$
I'm wondering if there is a more efficient way of performing this query, perhaps with the LIKE wildcard operator? We have thousands of dialing numbers to match to approximately 20 000 dialing_codes so a less expensive operation would be preferred.
CodePudding user response:
You haven't said whether dialing_number
is coming from a table / sanitized user input / something else.
For simplicity I'll assume it comes from a table contacts
and that you want to return everything in contacts
and every column in v_destination_rates
joined as you describe.
Without using pl/pgSQL:
SELECT
*
FROM contacts c
LEFT JOIN v_destination_rates vdr
ON vdr.dialing_code::TEXT LIKE c.dialing_number::TEXT || '%'
I've tested this on a table of 9,000 records, which I assume is as about as large or larger than the lookup table v_destination_rates
, and matched 16 sample integers in less than a tenth of a second.
You could possibly get even better performance if the dialing code is already type TEXT
, and indexed lexicographically since that's how you're searching here.
CodePudding user response:
I generally avoid regular expressions and like/similar searches whenever possible; they are slow. In this case they are completely avoidable instead you can use substring
and length
and do a equal match. (yea, 2 subroutines vs regex it is a toss up). The following does just that and when multiple matches occur it selects the longest match. (see demo)
with dialing_number (dn) as
( values ('442079285200') )
select dr.*
from dialing_number
join v_destination_rates dr
on dr.dialing_code = substring(dn,1, length(dr.dialing_code))
order by length(dr.dialing_code) desc
limit 1 ;
Concerning performance, a search set of 20000 is a very small number of entries to search. For curiosity I generated random dialing_code
values until the above query took more than 1sec. That occurred at 4755006 rows searched in 1.07 sec.
CodePudding user response:
I'm wondering if there is a more efficient way of performing this query
Yes, parse the number to get the country and dialing code. There are any number of existing libraries to do this. Then concatenate them and search.
For example, 442079285200
is the country code 44 and the dialing code 20 (207 is obsolete). Then you'd search for '4420'.
Note: 870, 882, and 883 are not dialing codes, they are country codes. And Iridium is 881. Mixing up country codes with dialing codes will probably cause more problems down the road, you may be better off separating them in your table.