Given the following table:
CREATE TABLE people (
name TEXT PRIMARY KEY,
age INT NOT NULL
);
INSERT INTO people VALUES
('Lisa', 30),
('Marta', 27),
('John', 32),
('Sam', 41),
('Alex', 12),
('Aristides',43),
('Cindi', 1)
;
I am using a self join to compare each value of a specific column with all the other values of the same column. My query looks something like this:
SELECT DISTINCT A.name as child
FROM people A, people B
WHERE A.age 16 < B.age;
This query aims to spot potential sons/daughters based on age difference. More specifically, my goal is to identify the set of people that may have stayed in the same house as one of their parents (ordered by name), assuming that there must be an age difference of at least 16 years between a child and their parents.
Now I would like to combine this kind of logic with the information that is in another table.
The other table looks something like that:
CREATE TABLE houses (
house_name TEXT NOT NULL,
house_member TEXT NOT NULL REFERENCES people(name)
);
INSERT INTO houses VALUES
('house Smith', 'Lisa'),
('house Smith', 'Marta'),
('house Smith', 'John'),
('house Doe', 'Lisa'),
('house Doe', 'Marta'),
('house Doe', 'Alex'),
('house Doe', 'Sam'),
('house McKenny', 'Aristides'),
('house McKenny', 'John'),
('house McKenny', 'Cindi')
;
The two tables can be joined ON houses.house_member = people.name
.
More specifically I would like to spot the children only within the same house. It does not make sense to compare the age of each person with the age of all the others, but instead it would be more efficient to compare the age of each person with all the other people in the same house.
My idea is to perform the self join from above but only within a PARTITION BY household_name
. However, I don't think this is a good idea since I do not have an aggregate function. Same applies for GROUP BY
statements as well. What could I do here?
The expected output should be the following, ordered by house_member
:
house_member |
---|
Alex |
Cindi |
For simplicity I have created a fiddle.
CodePudding user response:
At first join two tables to build one table that has all three bits of info: house_name
, house_member
, age
.
And then join it with itself just as you did originally and add one extra filter to look only at the same households.
WITH
CTE_All
AS
(
SELECT
houses.house_name
,houses.house_member
,people.age
FROM
houses
INNER JOIN people ON people.name = houses.house_member
)
SELECT DISTINCT
Children.house_name
,Children.house_member AS child_name
FROM
CTE_All AS Children
INNER JOIN CTE_All AS Parents
ON Children.age 16 < Parents.age
-- this is our age difference
AND Children.house_name = Parents.house_name
-- within the same house
;
All this is one single query. You don't have to use CTE, you can inline it as a subquery, but it is more readable with CTE.
Result
house_name | child_name :------------ | :--------- house Doe | Alex house McKenny | Cindi