Home > Software design >  Postgres GROUP BY an array column
Postgres GROUP BY an array column

Time:08-12

I have a list of students and parents and would like to group them into families using the student id's. Parents who share common student id's can be considered to be a family while also students who share common parent id's can be considered to be a family. This is a sample table:

 p_id | parent_name | s_id | student_name |
------------------------------------------|
 1    | John Doe    | 100 |  Mike Doe     |
 3    | Jane Doe    | 100 |  Mike Doe     |
 3    | Jane Doe    | 105 |  Lisa Doe     |
 5    | Will Willy  | 108 |  William Son  |

I'd like to end up with something like:

     parents       |       students         |
-------------------|------------------------|
John Doe, Jane Doe |  Mike Doe, Lisa Doe    |
Will Willy         |  William Son           |

To achieve this I'm currently using:

SELECT array_agg(parents) AS parents FROM (
    SELECT array_agg(p_id) AS par_ids, array_agg(parent_name) AS parents, student_name, s_id 
    FROM (
        /* sub query */
    )b
    GROUP BY s_id, student_name
    ORDER BY parents ASC
)c
GROUP BY unnest(par_ids)
ORDER BY parents ASC

But I get an error: ERROR: cannot accumulate arrays of different dimensionality. SQL state: 2202E

How can I attain the desired results? The inner query from the above statement returns:

| par_ids |       parents       |  student_name   | s_id |
--------------------------------|------------------------|
| {1,3}   | {John Doe, Jane Doe}|  Mike Doe       | 100  |
| {3}     | {Jane Doe}          |  Lisa Doe       | 105  |
| {5}     | {Will Willy}        |  William Son    | 108  |

Grouping these students now to the parents is where I'm stuck.

CodePudding user response:

I did something similar (but a bit more complex) already here: https://stackoverflow.com/a/53129510/3984221


step-by-step demo:db<>fiddle

SELECT
    array_agg(parent_name) as parents,                -- 4
    array_agg(student_name) as students
FROM (
    SELECT DISTINCT ON (t.s_id)                       -- 3
        *
    FROM (
        SELECT
            s_id,
            array_agg(p_id) as parents                -- 1
        FROM mytable
        GROUP BY s_id
    ) s JOIN mytable t ON t.p_id = ANY(s.parents)     -- 2
    ORDER BY t.s_id, CARDINALITY(parents) DESC        -- 3
) s
GROUP BY parents
  1. Aggregate the p_id values into an array:

    s_id parents
    108 {5}
    105 {3}
    100 {1,3}
  2. Self-join the original table on this array:

    s_id parents p_id parent_name s_id student_name
    100 {1,3} 1 John Doe 100 Mike Doe
    105 {3} 3 Jane Doe 100 Mike Doe
    100 {1,3} 3 Jane Doe 100 Mike Doe
    105 {3} 3 Jane Doe 105 Lisa Doe
    100 {1,3} 3 Jane Doe 105 Lisa Doe
    108 {5} 5 Will Willy 108 William Son
  3. Remove all duplicate student records. The remaining ones should be the records with the most complete p_id array. This can be done using DISTINCT ON(s_id) on a descending order by the array length:

    s_id parents p_id parent_name s_id student_name
    100 {1,3} 1 John Doe 100 Mike Doe
    100 {1,3} 3 Jane Doe 105 Lisa Doe
    108 {5} 5 Will Willy 108 William Son
  4. Finally you can group by the p_id array and aggregate the two name columns:

    parents students
    {"John Doe","Jane Doe"} {"Mike Doe","Lisa Doe"}
    {"Will Willy"} {"William Son"}

If you don't want to get an array, but a string list, you can use string_agg(name_colum, ',') instead of array_agg(name_column)

  • Related