I have a list of students and parents and would like to group them into families using the student id's. Parents who share common student id's can be considered to be a family while also students who share common parent id's can be considered to be a family. This is a sample table:
p_id | parent_name | s_id | student_name |
------------------------------------------|
1 | John Doe | 100 | Mike Doe |
3 | Jane Doe | 100 | Mike Doe |
3 | Jane Doe | 105 | Lisa Doe |
5 | Will Willy | 108 | William Son |
I'd like to end up with something like:
parents | students |
-------------------|------------------------|
John Doe, Jane Doe | Mike Doe, Lisa Doe |
Will Willy | William Son |
To achieve this I'm currently using:
SELECT array_agg(parents) AS parents FROM (
SELECT array_agg(p_id) AS par_ids, array_agg(parent_name) AS parents, student_name, s_id
FROM (
/* sub query */
)b
GROUP BY s_id, student_name
ORDER BY parents ASC
)c
GROUP BY unnest(par_ids)
ORDER BY parents ASC
But I get an error: ERROR: cannot accumulate arrays of different dimensionality
. SQL state: 2202E
How can I attain the desired results? The inner query from the above statement returns:
| par_ids | parents | student_name | s_id |
--------------------------------|------------------------|
| {1,3} | {John Doe, Jane Doe}| Mike Doe | 100 |
| {3} | {Jane Doe} | Lisa Doe | 105 |
| {5} | {Will Willy} | William Son | 108 |
Grouping these students now to the parents is where I'm stuck.
CodePudding user response:
I did something similar (but a bit more complex) already here: https://stackoverflow.com/a/53129510/3984221
SELECT
array_agg(parent_name) as parents, -- 4
array_agg(student_name) as students
FROM (
SELECT DISTINCT ON (t.s_id) -- 3
*
FROM (
SELECT
s_id,
array_agg(p_id) as parents -- 1
FROM mytable
GROUP BY s_id
) s JOIN mytable t ON t.p_id = ANY(s.parents) -- 2
ORDER BY t.s_id, CARDINALITY(parents) DESC -- 3
) s
GROUP BY parents
Aggregate the
p_id
values into an array:s_id parents 108 {5} 105 {3} 100 {1,3} Self-join the original table on this array:
s_id parents p_id parent_name s_id student_name 100 {1,3} 1 John Doe 100 Mike Doe 105 {3} 3 Jane Doe 100 Mike Doe 100 {1,3} 3 Jane Doe 100 Mike Doe 105 {3} 3 Jane Doe 105 Lisa Doe 100 {1,3} 3 Jane Doe 105 Lisa Doe 108 {5} 5 Will Willy 108 William Son Remove all duplicate student records. The remaining ones should be the records with the most complete
p_id
array. This can be done usingDISTINCT ON(s_id)
on a descending order by the array length:s_id parents p_id parent_name s_id student_name 100 {1,3} 1 John Doe 100 Mike Doe 100 {1,3} 3 Jane Doe 105 Lisa Doe 108 {5} 5 Will Willy 108 William Son Finally you can group by the
p_id
array and aggregate the twoname
columns:parents students {"John Doe","Jane Doe"} {"Mike Doe","Lisa Doe"} {"Will Willy"} {"William Son"}
If you don't want to get an array, but a string list, you can use string_agg(name_colum, ',')
instead of array_agg(name_column)