I want to create a wide table from a complex relational database. I've created a SQL query that generates a table like this:
INCIDENT_ID | INCIDENT_DATE | INCIDENT_POSTCODE | PARTY_ID |
---|---|---|---|
1 | 20/07/18 | NG2 6HF | 1 |
1 | 20/07/18 | NG2 6HF | 2 |
....
Where for every incident_id incident_date & incident_postcode are identical, but because of the fact that multiple parties can be in an incident (up to 4) I have this problem of up to 4 rows per incident_id. Is it possible to combine these rows into 1 row? Something like this:
INCIDENT_ID | INCIDENT_DATE | INCIDENT_POSTCODE | PARTY_ID | PARTY_ID2 |
---|---|---|---|---|
1 | 20/07/18 | NG2 6HF | 1 | 2 |
My SQL query to create the original table is (more complex in real life but this is minimum reproducible code):
select I.INCIDENT_ID,
I.INCIDENT_DATE,
I.INCIDENT_POSTCODE,
PA.PARTY_ID
from INCIDENT I
inner join PARTY PA
on PA.INCIDENT_ID = I.INCIDENT_ID
CodePudding user response:
Here's one approach: aggregate all the party ids into an array and then put each array element into a column. This works if the maximum number of party ids is a known value e.g. 4
CREATE OR REPLACE TEMPORARY TABLE ntest
(
INCIDENT_id int,
incident_date date,
incident_postcode varchar,
party_id int
)
INSERT INTO ntest
VALUES
(1, '2018-07-20', 'NG2 6HF', 1),
(1, '2018-07-20', 'NG2 6HF', 2),
(1, '2018-07-20', 'NG2 6HF', 3),
(2, '2018-07-25', 'ABC DEF', 1)
WITH base AS (
SELECT
INCIDENT_ID
, INCIDENT_DATE
, INCIDENT_POSTCODE
, ARRAY_AGG(PARTY_ID) AS PARTY_ID_ARRAY
FROM ntest
GROUP BY
INCIDENT_ID
, INCIDENT_DATE
, INCIDENT_POSTCODE
)
SELECT
INCIDENT_ID
, INCIDENT_DATE
, INCIDENT_POSTCODE
, PARTY_ID_ARRAY[0] AS PARTY_ID
, PARTY_ID_ARRAY[1] AS PARTY_ID2
, PARTY_ID_ARRAY[2] AS PARTY_ID3
, PARTY_ID_ARRAY[3] AS PARTY_ID4
FROM base
Result:
INCIDENT_ID | INCIDENT_DATE | INCIDENT_POSTCODE | PARTY_ID | PARTY_ID2 | PARTY_ID3 | PARTY_ID4 |
---|---|---|---|---|---|---|
1 | 2018-07-20 | NG2 6HF | 1 | 2 | 3 | |
2 | 2018-07-25 | ABC DEF | 1 |