given an input number, for example 70025
, I need to check with a query if it is contained in this string:
70014-70015;70000;70001;70002-70004
The numbers with -
has to be treated as a range, so if 70025
in 70014-70015
or 70002-70004
, while the ones between or after the ;
as single and specific value, so 70025
is 70000
or 70001
Thanks in advice
CodePudding user response:
As Tim already commented, the data processing is not very easy with such a string. If you can't find another way, try a solution like this (works in postgresql only and expects the data to be in correct format):
with tmp as (
select unnest(string_to_array('70014-70015;70000;70001;70002-70004', ';')) x
)
select
generate_series(
split_part(tmp.x,'-',1) :: bigint,
coalesce(NULLIF(split_part(tmp.x,'-',2), '') ,
split_part(tmp.x,'-',1)):: bigint)
from tmp
CodePudding user response:
I would write a function for this:
create or replace function is_contained(p_value integer, p_list text)
returns boolean
as
$$
select exists (
select *
from (
select string_to_array(x.item, '-')::int[] as elements
from unnest(string_to_array(p_list, ';')) as x(item)
) t
where case
when cardinality(elements) = 1 then elements[1] = p_value
else p_value between elements[1] and elements[2]
end
);
$$
language sql
immutable;
So the main part:
select *
from (
select string_to_array(x.element, '-')::int[] as elements
from unnest(string_to_array('70014-70015;70000;70001;70002-70004', ';')) as x(element)
) t
returns an array with one or two elements. This can then be used to write a CASE expression that checks if the input value matches at least one of the elements, e.g. this:
select elements,
case
when cardinality(elements) = 1 then elements[1] = 70015
else 70015 between elements[1] and elements[2]
end
from (
select string_to_array(x.item, '-')::int[] as elements
from unnest(string_to_array('70014-70015;70000;70001;70002-70004', ';')) as x(item)
) t
returns:
elements | case
-------------- ------
{70014,70015} | true
{70000} | false
{70001} | false
{70002,70004} | false
The CASE expression returns a boolean, so by moving it into the WHERE clause it only returns rows where the comparison value matches (where the CASE yields true).
As the SQL function should only return a single boolean, we use exists
to test if there is at least one row based on the WHERE condition.
You can use it e.g. like this:
select *
from badly_designed_table
where is_contained(70014, the_column);
With Postgres 14 or later, another option is to create a function that converts the string value into a multirange which has the advantage that this can be used to create an index.
create function make_multirange(p_list text)
returns int4multirange
as
$$
select range_agg(int4range(elements[1], coalesce(elements[2], elements[1]), '[]'))
from (
select string_to_array(x.element, '-')::int[] as elements
from unnest(string_to_array(p_list, ';')) as x(element)
) t
$$
language sql
immutable;
Note that the ranges are defined as inclusive on both ends, so 70014-70015
includes both values in the range (which matches the BETWEEN operator I used in the first function). This will fail however if any of the input strings to to follow the format you have shown.
With that function you can query the table like this:
select *
from badly_designed_table
where make_multirange(the_column) @> 70014
The expression can be used to create an index:
create index on badly_designed_table
using gist ((make_multirange(the_column));
The correct solution however, is to change your data model