I am using Postgres 13 and have the following tables (tables column are truncated for clarity) essentially a join table (location) between person and address with extra columns and it's own primary key (this I cannot change).
Person
table
person_id
-----------
1
2
Location
table
location_id | status | address_id | person_id
------------- -------- ------------ -----------
1 | 1 | 1 | 1
2 | 1 | 2 | 1
3 | 1 | 3 | 2
4 | 1 | 4 | 2
Address
table
address_id | postcode
------------ ----------
1 | NR1
2 | NR2
3 | NR3
4 | NR3
What I would like to achieve is a query that will fetch all person ids that have addresses with more than one postcode and a location status of 1. The person table has over 10 million records and each person can have multiple addresses so I would like to use limit and offset so I can get the data via multiple queries.
The result of the sample data in the tables would just return person_id 1 as it has two different postcodes (NR1 & NR2), whereas person_id 2 has two identical postocdes (NR3 & NR3)
My first attempt without paging is this which seems to be working, but I am sure there are better ways of doing it
SELECT sq.person_id
FROM
(
SELECT
p.person_id,
COUNT(DISTINCT a.postcode) AS count_postcode
FROM person p
INNER JOIN location l
ON p.person_id = l.person_id
INNER JOIN address a
ON a.address_id = l.address_id
WHERE l.status = 1
GROUP BY p.person_id
) sq
WHERE sq.count_postcode > 1;
CodePudding user response:
You can join your tables and do a distinct count on the Postcode column, grouped per person:
select l.person_id
from location l
join address a on a.address_id = l.address_Id
group by l.person_id
having Count(distinct a.postcode) > 1;