I have a problem using a subquery in PostgreSQL to compute a calculated column value; it reports:
[21000] ERROR: more than one row returned by a subquery used as an expression
Situation
I have two tables:
- accounts
- postal_code_to_state
Accounts table (subset of columns)
name | postal_code | state
---|---|---
Cust One | 00020 | NULL
Cust Two | 63076 | CD
Postal Code to State
pc_from | pc_to | state
---|---|---
10 | 30 | AB
63000 | 63100 | CD
The accounts table has rows where the state value may be unknown, but there is a postal code value. The postal code field is char (but that is incidental).
The postal_code_to_state table has integer from and to columns: the low and high ends of the postal code range for a state.
There is no common field to join on. To get the state from the postal_code_to_state table, the char field is cast to INT and the BETWEEN operator is used, e.g.
SELECT state
FROM postal_code_to_state
WHERE CAST('00020' AS INT) BETWEEN pc_from AND pc_to
This works OK; there is also a unique index on pc_from and pc_to.
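For reference, the unique index mentioned above might be defined something like this (the index name is an assumption; the question only says such an index exists):

```sql
-- Assumed definition: a composite unique index on the range bounds.
-- Note this only prevents exact duplicate (pc_from, pc_to) pairs,
-- not overlapping ranges.
CREATE UNIQUE INDEX postal_code_to_state_pc_from_pc_to_idx
    ON postal_code_to_state (pc_from, pc_to);
```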
But I need to run a query that selects from the accounts table and populates the state column from the state column in the postal_code_to_state table, using the postal_code from the accounts table to select the appropriate row.
I can't figure out why PostgreSQL is complaining about the subquery returning multiple rows. This is the query I am currently using:
SELECT id,
name,
postal_code,
state,
(SELECT state
FROM postal_code_to_state
WHERE CAST(accounts.postal_code AS INT) BETWEEN pc_from AND pc_to) AS new_state
FROM accounts
WHERE postal_code IS NOT NULL ;
If I use LIMIT 1 in the subquery it works and returns the correct state value from postal_code_to_state, but I would like it to work without needing that.
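For what it's worth, the LIMIT 1 workaround can also be expressed as a lateral join, which keeps the lookup in the FROM clause instead of a scalar subquery (a sketch, untested):

```sql
SELECT a.id,
       a.name,
       a.postal_code,
       a.state,
       pcs.state AS new_state
FROM accounts a
LEFT JOIN LATERAL (
    SELECT state
    FROM postal_code_to_state
    WHERE CAST(a.postal_code AS INT) BETWEEN pc_from AND pc_to
    LIMIT 1  -- still needed until overlapping ranges are ruled out
) pcs ON true
WHERE a.postal_code IS NOT NULL;
```

Unlike the scalar subquery, this never errors on multiple matches; it just picks an arbitrary one, so it only papers over the underlying overlap problem.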
CodePudding user response:
Try this query to find overlapping ranges:
select
    a_tbl.state, a_tbl.pc_from, a_tbl.pc_to
from
    postal_code_to_state as a_tbl
    join postal_code_to_state as b_tbl
        on a_tbl.ctid <> b_tbl.ctid  -- don't compare a row with itself
where
    int4range(a_tbl.pc_from, a_tbl.pc_to, '[]') && int4range(b_tbl.pc_from, b_tbl.pc_to, '[]');
If there are overlaps, then after clearing them you can do:
alter table postal_code_to_state
    add constraint exclude_test
        EXCLUDE USING GIST (int4range(pc_from, pc_to, '[]') WITH &&);
This will set up an exclusion constraint to prevent overlapping ranges.
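If the table were instead being created from scratch, the same constraint could be declared inline (column types are assumed from the question, not stated in it):

```sql
CREATE TABLE postal_code_to_state (
    pc_from integer NOT NULL,
    pc_to   integer NOT NULL,
    state   text    NOT NULL,
    -- reject any row whose inclusive range overlaps an existing row's range
    CONSTRAINT exclude_test
        EXCLUDE USING GIST (int4range(pc_from, pc_to, '[]') WITH &&)
);
```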
So:
insert into postal_code_to_state values (10, 30, 'AB'), (63000, 63100, 'CD');
insert into postal_code_to_state values (25, 40, 'SK');
ERROR: conflicting key value violates exclusion constraint "exclude_test"
insert into postal_code_to_state values (31, 40, 'SK');
INSERT 0 1
select * from postal_code_to_state ;
pc_from | pc_to | state
---------+-------+-------
10 | 30 | AB
63000 | 63100 | CD
31 | 40 | SK
CodePudding user response:
The combined unique index would not protect you against overlapping postal code ranges, only exact duplicates. First, I'd write the query like this:
SELECT id, name, postal_code, coalesce(accounts.state, postal_code_to_state.state) AS state
FROM accounts
LEFT JOIN postal_code_to_state ON accounts.postal_code::integer BETWEEN pc_from AND pc_to
WHERE accounts.state IS NOT NULL OR postal_code_to_state.state IS NOT NULL;
You could modify it to tell you which accounts match more than one postal code range, i.e. which ranges overlap:
SELECT id
FROM accounts
JOIN postal_code_to_state ON accounts.postal_code::integer BETWEEN pc_from AND pc_to
GROUP BY id
HAVING count(*) > 1;
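Once any overlaps are resolved, the same join condition could be used to actually persist the looked-up states back into accounts (a sketch, assuming you only want to fill in the missing values):

```sql
-- Fill in accounts.state from the matching postal code range,
-- touching only rows where state is currently unknown.
UPDATE accounts
SET state = pcs.state
FROM postal_code_to_state AS pcs
WHERE accounts.state IS NULL
  AND accounts.postal_code IS NOT NULL
  AND accounts.postal_code::integer BETWEEN pcs.pc_from AND pcs.pc_to;
```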
I haven't tested any of this.