To SELECT multiple CASE-WHEN expressions into a single row per ID I have used aggregation MAX
and GROUPBY
.
SELECT table1.IDvar,
MAX(CASE WHEN table2.var1 = 'foo' THEN table2.var2 END) AS condition1,
MAX(CASE WHEN table2.var1 = 'bar' THEN table2.var2 END) AS condition2
FROM table1
FULL JOIN table2 ON table1.IDvar = table2.table1_IDvar
GROUP BY table1.IDvar
However, I have observed that empirical criteria such as foo
entered in the CASE-WHEN-THEN-END expression may occur multiple times, that is, in multiple rows each of which corresponds to different values on columns of interest (THEN column-of-interest END
) in the db schema. This implies that taking the MAX
or MIN
drops data that may be of interest. It is not known in advance how many rows there are for each value in criteria_col and thus in the cols_of_interest.
Sample data e.g.:
IDvar_foreign_key | criteria_col | col_of_interest1 | col_of_interest2 |
---|---|---|---|
x1 | foo | 01-01-2021 | 100 |
x1 | foo | 01-06-2021 | 2000 |
x1 | foo | 01-08-2021 | 0 |
x1 | bar | 01-08-2021 | 300 |
Note: the actual table does contain a unique identifier or primary key.
Q: Are there ways to pivot certain columns/tables in a db schema without possibly dropping some values?
An ouput something like this:
IDvar_foreign_key | foo_1_col_of_interest1 | foo_1_col_of_interest2 | foo_2_col_of_interest1 | foo_2_col_of_interest2 | foo_3_col_of_interest1 | foo_3_col_of_interest2 |
---|---|---|---|---|---|---|
x1 | 01-01-2021 | 100 | 01-06-2021 | 2000 | 01-08-2021 | 0 |
Edit @lemon and @MTO suggests dynamic queries are necesarry, otherwise I was considering whether not using aggregation would do
Pivot rows to columns without aggregate
TSQL Pivot without aggregate function
CodePudding user response:
You can use the MIN
and MAX
aggregation functions and to get the correlated minimums and maximums for col_of_interest2
you can use KEEP (DENSE_RANK ...)
:
SELECT t1.IDvar,
MIN(CASE WHEN t2.criteria_col = 'foo' THEN t2.col_of_interest1 END)
AS foo_1_col_of_interest1,
MIN(col_of_interest2) KEEP (
DENSE_RANK FIRST ORDER BY
CASE WHEN t2.criteria_col = 'foo' THEN t2.col_of_interest1 END
ASC NULLS LAST
) AS foo_1_col_of_interest2,
MAX(CASE WHEN t2.criteria_col = 'foo' THEN t2.col_of_interest1 END)
AS foo_2_col_of_interest1,
MAX(col_of_interest2) KEEP (
DENSE_RANK FIRST ORDER BY
CASE WHEN t2.criteria_col = 'foo' THEN t2.col_of_interest1 END
DESC NULLS LAST
) AS foo_2_col_of_interest2
FROM table1 t1
FULL JOIN table2 t2
ON t1.IDvar = t2.table1_IDvar
GROUP BY t1.IDvar
Which, for the sample data:
CREATE TABLE table1 ( idvar ) AS
SELECT 1 FROM DUAL UNION ALL
SELECT 2 FROM DUAL UNION ALL
SELECT 3 FROM DUAL;
CREATE TABLE table2 ( table1_idvar, criteria_col, col_of_interest1, col_of_interest2 ) AS
SELECT 1, 'foo', DATE '2021-01-01', 100 FROM DUAL UNION ALL
SELECT 1, 'foo', DATE '2021-03-01', 500 FROM DUAL UNION ALL
SELECT 1, 'foo', DATE '2021-06-01', 2000 FROM DUAL UNION ALL
SELECT 1, 'bar', DATE '2021-06-01', 2000 FROM DUAL UNION ALL
SELECT 2, 'foo', DATE '2021-01-02', 200 FROM DUAL UNION ALL
SELECT 2, 'foo', DATE '2021-03-02', 300 FROM DUAL UNION ALL
SELECT 2, 'bar', DATE '2021-06-02', 400 FROM DUAL UNION ALL
SELECT 3, 'foo', DATE '2021-01-03', 700 FROM DUAL;
Outputs:
IDVAR | FOO_1_COL_OF_INTEREST1 | FOO_1_COL_OF_INTEREST2 | FOO_2_COL_OF_INTEREST1 | FOO_2_COL_OF_INTEREST2 |
---|---|---|---|---|
1 | 2021-01-01 00:00:00 | 100 | 2021-06-01 00:00:00 | 2000 |
2 | 2021-01-02 00:00:00 | 200 | 2021-03-02 00:00:00 | 300 |
3 | 2021-01-03 00:00:00 | 700 | 2021-01-03 00:00:00 | 700 |
SQL (not just Oracle) requires each query to have a known, fixed number of columns; if you want a dynamic number of columns then you should perform the pivot in whatever third-party application (Java, C#, PHP, etc.) that you are using to talk to the database.
If you want to pivot a fixed maximum number of columns then you can use the ROW_NUMBER
analytic function. For example, if you want the 3 minimum values for col_of_interest1
then you can use:
SELECT idvar,
MAX(CASE WHEN criteria_col = 'foo' AND rn = 1 THEN col_of_interest1 END)
AS foo_1_col_of_interest1,
MAX(CASE WHEN criteria_col = 'foo' AND rn = 1 THEN col_of_interest2 END)
AS foo_1_col_of_interest2,
MAX(CASE WHEN criteria_col = 'foo' AND rn = 2 THEN col_of_interest1 END)
AS foo_2_col_of_interest1,
MAX(CASE WHEN criteria_col = 'foo' AND rn = 2 THEN col_of_interest2 END)
AS foo_2_col_of_interest2,
MAX(CASE WHEN criteria_col = 'foo' AND rn = 3 THEN col_of_interest1 END)
AS foo_3_col_of_interest1,
MAX(CASE WHEN criteria_col = 'foo' AND rn = 3 THEN col_of_interest2 END)
AS foo_3_col_of_interest2
FROM (
SELECT t1.IDvar,
criteria_col,
col_of_interest1,
col_of_interest2,
ROW_NUMBER() OVER (
PARTITION BY t1.IDvar, criteria_col
ORDER BY col_of_interest1, col_of_interest2
) AS rn
FROM table1 t1
FULL JOIN table2 t2
ON t1.IDvar = t2.table1_IDvar
WHERE criteria_col IN ('foo' /*, 'bar', 'etc'*/)
)
GROUP BY idvar
Which outputs:
IDVAR | FOO_1_COL_OF_INTEREST1 | FOO_1_COL_OF_INTEREST2 | FOO_2_COL_OF_INTEREST1 | FOO_2_COL_OF_INTEREST2 | FOO_3_COL_OF_INTEREST1 | FOO_3_COL_OF_INTEREST2 |
---|---|---|---|---|---|---|
1 | 2021-01-01 00:00:00 | 100 | 2021-03-01 00:00:00 | 500 | 2021-06-01 00:00:00 | 2000 |
2 | 2021-01-02 00:00:00 | 200 | 2021-03-02 00:00:00 | 300 | null | null |
3 | 2021-01-03 00:00:00 | 700 | null | null | null | null |
CodePudding user response:
Unknown number of columns is an issue that you can't ignore. The question is - are there any possible expected limits. I question who would get any meaningfull insight from a resulting dataset with hundreds of columns. It doesnt make sense. If you could setup the limit to 10 or 20 or whatever like that then you could build a datagrid structure using pivot where the number of columns would be the same and the data within could be placed as in your question.
Just as an example how - here is the code that does it with up to 6 pairs of your data of interest (COL_DATE and COL_VALUE) - it could be 20 or 30 or ...
First your sample data and some preparation for pivoting (CTE named grid):
WITH -- S a m p l e d a t a
tbl AS
(
SELECT 1 "ID", 'foo' "CRITERIA", DATE '2021-01-01' "INTEREST_1", 100 "INTEREST_2" FROM DUAL UNION ALL
SELECT 1, 'foo', DATE '2021-03-01', 500 FROM DUAL UNION ALL
SELECT 1, 'foo', DATE '2021-06-01', 2000 FROM DUAL UNION ALL
SELECT 1, 'bar', DATE '2021-06-01', 2000 FROM DUAL UNION ALL
SELECT 2, 'foo', DATE '2021-01-02', 200 FROM DUAL UNION ALL
SELECT 2, 'foo', DATE '2021-03-02', 300 FROM DUAL UNION ALL
SELECT 2, 'bar', DATE '2021-06-02', 400 FROM DUAL UNION ALL
SELECT 3, 'foo', DATE '2021-01-03', 700 FROM DUAL
),
grid AS
(SELECT * FROM
( Select ID "ID", CRITERIA "GRP", INTEREST_1 "COL_DATE", INTEREST_2 "COL_VALUE",
Count(*) OVER(Partition By ID, CRITERIA) "ROWS_TOT",
ROW_NUMBER() OVER(Partition By ID, CRITERIA Order By ID, CRITERIA) "RN_GRP_ID",
ROW_NUMBER() OVER(Partition By ID, CRITERIA Order By ID, CRITERIA) "RN_GRP_ID_2"
From tbl t )
ORDER BY ID ASC, GRP DESC, ROWS_TOT DESC
),
Result (grid)
ID GRP COL_DATE COL_VALUE ROWS_TOT RN_GRP_ID RN_GRP_ID_2
---------- --- --------- ---------- ---------- ---------- -----------
1 foo 01-JAN-21 100 3 3 3
1 foo 01-JUN-21 2000 3 2 2
1 foo 01-MAR-21 500 3 1 1
1 bar 01-JUN-21 2000 1 1 1
2 foo 02-MAR-21 300 2 2 2
2 foo 02-JAN-21 200 2 1 1
2 bar 02-JUN-21 400 1 1 1
3 foo 03-JAN-21 700 1 1 1
... next is pivoting (another CTE named grid_pivot) and designing another grid that will be populated with your data of interest...
grid_pivot AS
( SELECT
ID, GRP, ROWS_TOT,
MAX(GRP_1_LINK) "GRP_1_LINK", CAST(Null as DATE) "GRP_1_DATE", CAST(Null as NUMBER) "GRP_1_VALUE",
MAX(GRP_2_LINK) "GRP_2_LINK", CAST(Null as DATE) "GRP_2_DATE", CAST(Null as NUMBER) "GRP_2_VALUE",
MAX(GRP_3_LINK) "GRP_3_LINK", CAST(Null as DATE) "GRP_3_DATE", CAST(Null as NUMBER) "GRP_3_VALUE",
MAX(GRP_4_LINK) "GRP_4_LINK", CAST(Null as DATE) "GRP_4_DATE", CAST(Null as NUMBER) "GRP_4_VALUE",
MAX(GRP_5_LINK) "GRP_5_LINK", CAST(Null as DATE) "GRP_5_DATE", CAST(Null as NUMBER) "GRP_5_VALUE",
MAX(GRP_6_LINK) "GRP_6_LINK", CAST(Null as DATE) "GRP_6_DATE", CAST(Null as NUMBER) "GRP_6_VALUE"
-- ... ... ... ...
FROM
( Select *
From ( Select * From grid )
PIVOT ( Max(RN_GRP_ID) "LINK" --Min(RN_GRP_ID) "GRP_FROM",
FOR RN_GRP_ID_2 IN(1 "GRP_1", 2 "GRP_2", 3 "GRP_3", 4 "GRP_4", 5 "GRP_5", 6 "GRP_6" ) ) -- ... ...
Order By ROWS_TOT DESC, GRP DESC, ID ASC
)
GROUP BY GRP, ROWS_TOT, ID
ORDER BY ROWS_TOT DESC, GRP DESC, ID ASC
)
Result (grid_pivot)
ID GRP ROWS_TOT GRP_1_LINK GRP_1_DATE GRP_1_VALUE GRP_2_LINK GRP_2_DATE GRP_2_VALUE GRP_3_LINK GRP_3_DATE GRP_3_VALUE GRP_4_LINK GRP_4_DATE GRP_4_VALUE GRP_5_LINK GRP_5_DATE GRP_5_VALUE GRP_6_LINK GRP_6_DATE GRP_6_VALUE
---------- --- ---------- ---------- ---------- ----------- ---------- ---------- ----------- ---------- ---------- ----------- ---------- ---------- ----------- ---------- ---------- ----------- ---------- ---------- -----------
1 foo 3 1 2 3
2 foo 2 1 2
3 foo 1 1
1 bar 1 1
2 bar 1 1
... and, finaly, mixing grid_pivot data with grid data using 6 left joins to fit 6 pairs of your data of interest into the grid.
SELECT gp.ID, gp.GRP,
g1.COL_DATE "GRP_1_DATE", g1.COL_VALUE "GRP_1_VALUE",
g2.COL_DATE "GRP_2_DATE", g2.COL_VALUE "GRP_2_VALUE",
g3.COL_DATE "GRP_3_DATE", g3.COL_VALUE "GRP_3_VALUE",
g4.COL_DATE "GRP_1_DATE", g4.COL_VALUE "GRP_4_VALUE",
g5.COL_DATE "GRP_2_DATE", g5.COL_VALUE "GRP_5_VALUE",
g6.COL_DATE "GRP_3_DATE", g6.COL_VALUE "GRP_6_VALUE"
-- ... ... ... ...
FROM grid_pivot gp
LEFT JOIN grid g1 ON(g1.ID = gp.ID And g1.GRP = gp.GRP And g1.RN_GRP_ID = gp.GRP_1_LINK)
LEFT JOIN grid g2 ON(g2.ID = gp.ID And g2.GRP = gp.GRP And g2.RN_GRP_ID = gp.GRP_2_LINK)
LEFT JOIN grid g3 ON(g3.ID = gp.ID And g3.GRP = gp.GRP And g3.RN_GRP_ID = gp.GRP_3_LINK)
LEFT JOIN grid g4 ON(g4.ID = gp.ID And g4.GRP = gp.GRP And g4.RN_GRP_ID = gp.GRP_4_LINK)
LEFT JOIN grid g5 ON(g5.ID = gp.ID And g5.GRP = gp.GRP And g5.RN_GRP_ID = gp.GRP_5_LINK)
LEFT JOIN grid g6 ON(g6.ID = gp.ID And g6.GRP = gp.GRP And g6.RN_GRP_ID = gp.GRP_6_LINK)
-- ... ... ... ...
ORDER BY gp.ROWS_TOT DESC, gp.GRP DESC, gp.ID ASC
R e s u l t :
ID GRP GRP_1_DATE GRP_1_VALUE GRP_2_DATE GRP_2_VALUE GRP_3_DATE GRP_3_VALUE GRP_1_DATE GRP_4_VALUE GRP_2_DATE GRP_5_VALUE GRP_3_DATE GRP_6_VALUE
---------- --- ---------- ----------- ---------- ----------- ---------- ----------- ---------- ----------- ---------- ----------- ---------- -----------
1 foo 01-MAR-21 500 01-JUN-21 2000 01-JAN-21 100
2 foo 02-JAN-21 200 02-MAR-21 300
3 foo 03-JAN-21 700
1 bar 01-JUN-21 2000
2 bar 02-JUN-21 400
Anyway you will probably need dynamic solution so, this could be interesting for something else, who knows what, when and where...