I have a problem using a subquery in PostgreSQL to compute a calculated column value; it reports:
[21000] ERROR: more than one row returned by a subquery used as an expression
Situation
I have two tables:
- accounts
- postal_code_to_state
Accounts table (subset of columns)
name | postal_code | state
---|---|---
Cust One | 00020 | NULL
Cust Two | 63076 | CD
Postal Code to State
pc_from | pc_to | state
---|---|---
10 | 30 | AB
63000 | 63100 | CD
The accounts table has rows where the state value may be unknown, but there is a postal code value. The postal code field is char (but that is incidental).
The postal_code_to_state table has integer from and to columns: the low and high ends of the postal code range for a state.
There is no common field to join on. To get the state from the postal_code_to_state table, the char field is cast to INT and the BETWEEN operator is used, e.g.
SELECT state
FROM postal_code_to_state
WHERE CAST('00020' AS INT) BETWEEN pc_from AND pc_to
This works OK; there is also a unique index on pc_from and pc_to.
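For reference, the unique index mentioned above might be defined something like this (the index name is an assumption; the question only says such an index exists):

```sql
-- Assumed definition: a composite unique index on the range bounds.
-- Note this only prevents exact duplicate (pc_from, pc_to) pairs,
-- not overlapping ranges.
CREATE UNIQUE INDEX postal_code_to_state_pc_from_pc_to_idx
    ON postal_code_to_state (pc_from, pc_to);
```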
But I need to run a query that selects from the accounts table and populates the state column from the state column in the postal_code_to_state table, using the postal_code from the accounts table to select the appropriate row.
I can't figure out why PostgreSQL is complaining about the subquery returning multiple rows. This is the query I am currently using:
SELECT id,
name,
postal_code,
state,
(SELECT state
FROM postal_code_to_state
WHERE CAST(accounts.postal_code AS INT) BETWEEN pc_from AND pc_to) AS new_state
FROM accounts
WHERE postal_code IS NOT NULL ;
If I use LIMIT 1 in the subquery it works and returns the correct state value from postal_code_to_state, but I would like it to work without needing that.
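For what it's worth, the LIMIT 1 workaround can also be expressed as a lateral join, which keeps the lookup in the FROM clause instead of a scalar subquery (a sketch, untested):

```sql
SELECT a.id,
       a.name,
       a.postal_code,
       a.state,
       pcs.state AS new_state
FROM accounts a
LEFT JOIN LATERAL (
    SELECT state
    FROM postal_code_to_state
    WHERE CAST(a.postal_code AS INT) BETWEEN pc_from AND pc_to
    LIMIT 1  -- still needed until overlapping ranges are ruled out
) pcs ON true
WHERE a.postal_code IS NOT NULL;
```

Unlike the scalar subquery, this never errors on multiple matches; it just picks an arbitrary one, so it only papers over the underlying overlap problem.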
CodePudding user response:
Try this query to find overlapping ranges:
select
    a_tbl.state, a_tbl.pc_from, a_tbl.pc_to
from
    postal_code_to_state as a_tbl
    join postal_code_to_state as b_tbl
        on a_tbl.ctid <> b_tbl.ctid  -- don't compare a row with itself
where
    int4range(a_tbl.pc_from, a_tbl.pc_to, '[]') && int4range(b_tbl.pc_from, b_tbl.pc_to, '[]');
If there are overlaps, then after clearing them you can do:
alter table postal_code_to_state
    add constraint exclude_test
        EXCLUDE USING GIST (int4range(pc_from, pc_to, '[]') WITH &&);
This will set up an exclusion constraint to prevent overlapping ranges.
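If the table were instead being created from scratch, the same constraint could be declared inline (column types are assumed from the question, not stated in it):

```sql
CREATE TABLE postal_code_to_state (
    pc_from integer NOT NULL,
    pc_to   integer NOT NULL,
    state   text    NOT NULL,
    -- reject any row whose inclusive range overlaps an existing row's range
    CONSTRAINT exclude_test
        EXCLUDE USING GIST (int4range(pc_from, pc_to, '[]') WITH &&)
);
```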
So:
insert into postal_code_to_state values (10, 30, 'AB'), (63000, 63100, 'CD');
insert into postal_code_to_state values (25, 40, 'SK');
ERROR: conflicting key value violates exclusion constraint "exclude_test"
insert into postal_code_to_state values (31, 40, 'SK');
INSERT 0 1
select * from postal_code_to_state ;
pc_from | pc_to | state
---------+-------+-------
10 | 30 | AB
63000 | 63100 | CD
31 | 40 | SK
CodePudding user response:
The combined unique index would not protect you against overlapping postal code ranges, only exact duplicates. First, I'd write the query like this:
SELECT id, name, postal_code, coalesce(accounts.state, postal_code_to_state.state) AS state
FROM accounts
LEFT JOIN postal_code_to_state ON accounts.postal_code::integer BETWEEN pc_from AND pc_to
WHERE accounts.state IS NOT NULL OR postal_code_to_state.state IS NOT NULL;
You could modify it to tell you which accounts match more than one postal code range, i.e. which ranges overlap:
SELECT id
FROM accounts
JOIN postal_code_to_state ON accounts.postal_code::integer BETWEEN pc_from AND pc_to
GROUP BY id
HAVING count(*) > 1;
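Once any overlaps are resolved, the same join condition could be used to actually persist the looked-up states back into accounts (a sketch, assuming you only want to fill in the missing values):

```sql
-- Fill in accounts.state from the matching postal code range,
-- touching only rows where state is currently unknown.
UPDATE accounts
SET state = pcs.state
FROM postal_code_to_state AS pcs
WHERE accounts.state IS NULL
  AND accounts.postal_code IS NOT NULL
  AND accounts.postal_code::integer BETWEEN pcs.pc_from AND pcs.pc_to;
```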
I haven't tested any of this.