sql pivot multiple and partially similar row values into multiple unknown number of cols-CodePudding

To SELECT multiple CASE-WHEN expressions into a single row per ID I have used aggregation MAX and GROUPBY.

SELECT table1.IDvar,
       MAX(CASE WHEN table2.var1 = 'foo' THEN table2.var2 END) AS condition1,
       MAX(CASE WHEN table2.var1 = 'bar' THEN table2.var2 END) AS condition2
FROM table1
FULL JOIN table2 ON table1.IDvar = table2.table1_IDvar
GROUP BY table1.IDvar

However, I have observed that empirical criteria such as foo entered in the CASE-WHEN-THEN-END expression may occur multiple times, that is, in multiple rows each of which corresponds to different values on columns of interest (THEN column-of-interest END) in the db schema. This implies that taking the MAX or MIN drops data that may be of interest. It is not known in advance how many rows there are for each value in criteria_col and thus in the cols_of_interest.

Sample data e.g.:

IDvar_foreign_key	criteria_col	col_of_interest1	col_of_interest2
x1	foo	01-01-2021	100
x1	foo	01-06-2021	2000
x1	foo	01-08-2021	0
x1	bar	01-08-2021	300

Note: the actual table does contain a unique identifier or primary key.

Q: Are there ways to pivot certain columns/tables in a db schema without possibly dropping some values?

An ouput something like this:

IDvar_foreign_key	foo_1_col_of_interest1	foo_1_col_of_interest2	foo_2_col_of_interest1	foo_2_col_of_interest2	foo_3_col_of_interest1	foo_3_col_of_interest2
x1	01-01-2021	100	01-06-2021	2000	01-08-2021	0

Edit @lemon and @MTO suggests dynamic queries are necesarry, otherwise I was considering whether not using aggregation would do

Dynamic Pivot in Oracle's SQL

Pivot rows to columns without aggregate

TSQL Pivot without aggregate function

CodePudding user response：

You can use the MIN and MAX aggregation functions and to get the correlated minimums and maximums for col_of_interest2 you can use KEEP (DENSE_RANK ...):

SELECT t1.IDvar,
       MIN(CASE WHEN t2.criteria_col = 'foo' THEN t2.col_of_interest1 END)
         AS foo_1_col_of_interest1,
       MIN(col_of_interest2) KEEP (
         DENSE_RANK FIRST ORDER BY
         CASE WHEN t2.criteria_col = 'foo' THEN t2.col_of_interest1 END
         ASC NULLS LAST
       ) AS foo_1_col_of_interest2,
       MAX(CASE WHEN t2.criteria_col = 'foo' THEN t2.col_of_interest1 END)
         AS foo_2_col_of_interest1,
       MAX(col_of_interest2) KEEP (
         DENSE_RANK FIRST ORDER BY
         CASE WHEN t2.criteria_col = 'foo' THEN t2.col_of_interest1 END
         DESC NULLS LAST
       ) AS foo_2_col_of_interest2
FROM   table1 t1
       FULL JOIN table2 t2
       ON t1.IDvar = t2.table1_IDvar
GROUP BY t1.IDvar

Which, for the sample data:

CREATE TABLE table1 ( idvar ) AS
SELECT 1 FROM DUAL UNION ALL
SELECT 2 FROM DUAL UNION ALL
SELECT 3 FROM DUAL;

CREATE TABLE table2 ( table1_idvar, criteria_col, col_of_interest1, col_of_interest2 ) AS
SELECT 1, 'foo', DATE '2021-01-01',  100 FROM DUAL UNION ALL
SELECT 1, 'foo', DATE '2021-03-01',  500 FROM DUAL UNION ALL
SELECT 1, 'foo', DATE '2021-06-01', 2000 FROM DUAL UNION ALL
SELECT 1, 'bar', DATE '2021-06-01', 2000 FROM DUAL UNION ALL
SELECT 2, 'foo', DATE '2021-01-02',  200 FROM DUAL UNION ALL
SELECT 2, 'foo', DATE '2021-03-02',  300 FROM DUAL UNION ALL
SELECT 2, 'bar', DATE '2021-06-02',  400 FROM DUAL UNION ALL
SELECT 3, 'foo', DATE '2021-01-03',  700 FROM DUAL;

Outputs:

IDVAR	FOO_1_COL_OF_INTEREST1	FOO_1_COL_OF_INTEREST2	FOO_2_COL_OF_INTEREST1	FOO_2_COL_OF_INTEREST2
1	2021-01-01 00:00:00	100	2021-06-01 00:00:00	2000
2	2021-01-02 00:00:00	200	2021-03-02 00:00:00	300
3	2021-01-03 00:00:00	700	2021-01-03 00:00:00	700

SQL (not just Oracle) requires each query to have a known, fixed number of columns; if you want a dynamic number of columns then you should perform the pivot in whatever third-party application (Java, C#, PHP, etc.) that you are using to talk to the database.

If you want to pivot a fixed maximum number of columns then you can use the ROW_NUMBER analytic function. For example, if you want the 3 minimum values for col_of_interest1 then you can use:

SELECT idvar,
       MAX(CASE WHEN criteria_col = 'foo' AND rn = 1 THEN col_of_interest1 END)
         AS foo_1_col_of_interest1,
       MAX(CASE WHEN criteria_col = 'foo' AND rn = 1 THEN col_of_interest2 END)
         AS foo_1_col_of_interest2,
       MAX(CASE WHEN criteria_col = 'foo' AND rn = 2 THEN col_of_interest1 END)
         AS foo_2_col_of_interest1,
       MAX(CASE WHEN criteria_col = 'foo' AND rn = 2 THEN col_of_interest2 END)
         AS foo_2_col_of_interest2,
       MAX(CASE WHEN criteria_col = 'foo' AND rn = 3 THEN col_of_interest1 END)
         AS foo_3_col_of_interest1,
       MAX(CASE WHEN criteria_col = 'foo' AND rn = 3 THEN col_of_interest2 END)
         AS foo_3_col_of_interest2
FROM   (
  SELECT t1.IDvar,
         criteria_col,
         col_of_interest1,
         col_of_interest2,
         ROW_NUMBER() OVER (
           PARTITION BY t1.IDvar, criteria_col
           ORDER BY col_of_interest1, col_of_interest2
         ) AS rn
  FROM   table1 t1
         FULL JOIN table2 t2
         ON t1.IDvar = t2.table1_IDvar
  WHERE  criteria_col IN ('foo' /*, 'bar', 'etc'*/)
)
GROUP BY idvar

Which outputs:

IDVAR	FOO_1_COL_OF_INTEREST1	FOO_1_COL_OF_INTEREST2	FOO_2_COL_OF_INTEREST1	FOO_2_COL_OF_INTEREST2	FOO_3_COL_OF_INTEREST1	FOO_3_COL_OF_INTEREST2
1	2021-01-01 00:00:00	100	2021-03-01 00:00:00	500	2021-06-01 00:00:00	2000
2	2021-01-02 00:00:00	200	2021-03-02 00:00:00	300	null	null
3	2021-01-03 00:00:00	700	null	null	null	null

fiddle

CodePudding user response：

Unknown number of columns is an issue that you can't ignore. The question is - are there any possible expected limits. I question who would get any meaningfull insight from a resulting dataset with hundreds of columns. It doesnt make sense. If you could setup the limit to 10 or 20 or whatever like that then you could build a datagrid structure using pivot where the number of columns would be the same and the data within could be placed as in your question.
Just as an example how - here is the code that does it with up to 6 pairs of your data of interest (COL_DATE and COL_VALUE) - it could be 20 or 30 or ...

First your sample data and some preparation for pivoting (CTE named grid):

WITH    -- S a m p l e    d a t a
    tbl AS
        (
            SELECT 1 "ID", 'foo' "CRITERIA", DATE '2021-01-01' "INTEREST_1",  100 "INTEREST_2" FROM DUAL UNION ALL
            SELECT 1, 'foo', DATE '2021-03-01',  500 FROM DUAL UNION ALL
            SELECT 1, 'foo', DATE '2021-06-01', 2000 FROM DUAL UNION ALL
            SELECT 1, 'bar', DATE '2021-06-01', 2000 FROM DUAL UNION ALL
            SELECT 2, 'foo', DATE '2021-01-02',  200 FROM DUAL UNION ALL
            SELECT 2, 'foo', DATE '2021-03-02',  300 FROM DUAL UNION ALL
            SELECT 2, 'bar', DATE '2021-06-02',  400 FROM DUAL UNION ALL
            SELECT 3, 'foo', DATE '2021-01-03',  700 FROM DUAL 
        ),
    grid AS
        (SELECT * FROM
            (  Select    ID "ID",  CRITERIA "GRP", INTEREST_1 "COL_DATE", INTEREST_2 "COL_VALUE", 
                        Count(*) OVER(Partition By ID, CRITERIA) "ROWS_TOT", 
                        ROW_NUMBER() OVER(Partition By ID, CRITERIA Order By ID, CRITERIA) "RN_GRP_ID",
                        ROW_NUMBER() OVER(Partition By ID, CRITERIA Order By ID, CRITERIA) "RN_GRP_ID_2"
              From      tbl t )
         ORDER BY  ID ASC, GRP DESC, ROWS_TOT DESC
        ),
        
Result (grid)
        ID GRP COL_DATE   COL_VALUE   ROWS_TOT  RN_GRP_ID RN_GRP_ID_2
---------- --- --------- ---------- ---------- ---------- -----------
         1 foo 01-JAN-21        100          3          3           3 
         1 foo 01-JUN-21       2000          3          2           2 
         1 foo 01-MAR-21        500          3          1           1 
         1 bar 01-JUN-21       2000          1          1           1 
         2 foo 02-MAR-21        300          2          2           2 
         2 foo 02-JAN-21        200          2          1           1 
         2 bar 02-JUN-21        400          1          1           1 
         3 foo 03-JAN-21        700          1          1           1

... next is pivoting (another CTE named grid_pivot) and designing another grid that will be populated with your data of interest...

    grid_pivot AS
        (   SELECT
                ID, GRP, ROWS_TOT, 
                MAX(GRP_1_LINK) "GRP_1_LINK", CAST(Null as DATE) "GRP_1_DATE", CAST(Null as NUMBER) "GRP_1_VALUE",
                MAX(GRP_2_LINK) "GRP_2_LINK", CAST(Null as DATE) "GRP_2_DATE", CAST(Null as NUMBER) "GRP_2_VALUE",
                MAX(GRP_3_LINK) "GRP_3_LINK", CAST(Null as DATE) "GRP_3_DATE", CAST(Null as NUMBER) "GRP_3_VALUE",
                MAX(GRP_4_LINK) "GRP_4_LINK", CAST(Null as DATE) "GRP_4_DATE", CAST(Null as NUMBER) "GRP_4_VALUE",
                MAX(GRP_5_LINK) "GRP_5_LINK", CAST(Null as DATE) "GRP_5_DATE", CAST(Null as NUMBER) "GRP_5_VALUE",
                MAX(GRP_6_LINK) "GRP_6_LINK", CAST(Null as DATE) "GRP_6_DATE", CAST(Null as NUMBER) "GRP_6_VALUE"
              --  ...  ... ... ...
            FROM
                (   Select  * 
                      From    ( Select * From grid )
                          PIVOT ( Max(RN_GRP_ID) "LINK"  --Min(RN_GRP_ID) "GRP_FROM", 
                                  FOR RN_GRP_ID_2 IN(1 "GRP_1", 2 "GRP_2", 3 "GRP_3", 4 "GRP_4", 5 "GRP_5", 6 "GRP_6" ) ) -- ...  ...
                      Order By ROWS_TOT DESC, GRP DESC, ID ASC
                )
            GROUP BY GRP, ROWS_TOT, ID
            ORDER BY ROWS_TOT DESC, GRP DESC, ID ASC
        )

Result (grid_pivot)
        ID GRP   ROWS_TOT GRP_1_LINK GRP_1_DATE GRP_1_VALUE GRP_2_LINK GRP_2_DATE GRP_2_VALUE GRP_3_LINK GRP_3_DATE GRP_3_VALUE GRP_4_LINK GRP_4_DATE GRP_4_VALUE GRP_5_LINK GRP_5_DATE GRP_5_VALUE GRP_6_LINK GRP_6_DATE GRP_6_VALUE
---------- --- ---------- ---------- ---------- ----------- ---------- ---------- ----------- ---------- ---------- ----------- ---------- ---------- ----------- ---------- ---------- ----------- ---------- ---------- -----------
         1 foo          3          1                                 2                                 3                                                                                                                              
         2 foo          2          1                                 2                                                                                                                                                                
         3 foo          1          1                                                                                                                                                                                                  
         1 bar          1          1                                                                                                                                                                                                  
         2 bar          1          1

... and, finaly, mixing grid_pivot data with grid data using 6 left joins to fit 6 pairs of your data of interest into the grid.

SELECT    gp.ID, gp.GRP, 
          g1.COL_DATE "GRP_1_DATE", g1.COL_VALUE "GRP_1_VALUE",
          g2.COL_DATE "GRP_2_DATE", g2.COL_VALUE "GRP_2_VALUE",
          g3.COL_DATE "GRP_3_DATE", g3.COL_VALUE "GRP_3_VALUE",
          g4.COL_DATE "GRP_1_DATE", g4.COL_VALUE "GRP_4_VALUE",
          g5.COL_DATE "GRP_2_DATE", g5.COL_VALUE "GRP_5_VALUE",
          g6.COL_DATE "GRP_3_DATE", g6.COL_VALUE "GRP_6_VALUE"
        -- ...  ... ... ...
FROM      grid_pivot gp
LEFT JOIN grid g1 ON(g1.ID = gp.ID And g1.GRP = gp.GRP And g1.RN_GRP_ID = gp.GRP_1_LINK)
LEFT JOIN grid g2 ON(g2.ID = gp.ID And g2.GRP = gp.GRP And g2.RN_GRP_ID = gp.GRP_2_LINK)
LEFT JOIN grid g3 ON(g3.ID = gp.ID And g3.GRP = gp.GRP And g3.RN_GRP_ID = gp.GRP_3_LINK)
LEFT JOIN grid g4 ON(g4.ID = gp.ID And g4.GRP = gp.GRP And g4.RN_GRP_ID = gp.GRP_4_LINK)
LEFT JOIN grid g5 ON(g5.ID = gp.ID And g5.GRP = gp.GRP And g5.RN_GRP_ID = gp.GRP_5_LINK)
LEFT JOIN grid g6 ON(g6.ID = gp.ID And g6.GRP = gp.GRP And g6.RN_GRP_ID = gp.GRP_6_LINK)
--  ... ... ... ...
ORDER BY  gp.ROWS_TOT DESC, gp.GRP DESC, gp.ID ASC

R e s u l t :
        ID GRP GRP_1_DATE GRP_1_VALUE GRP_2_DATE GRP_2_VALUE GRP_3_DATE GRP_3_VALUE GRP_1_DATE GRP_4_VALUE GRP_2_DATE GRP_5_VALUE GRP_3_DATE GRP_6_VALUE
---------- --- ---------- ----------- ---------- ----------- ---------- ----------- ---------- ----------- ---------- ----------- ---------- -----------
         1 foo 01-MAR-21          500 01-JUN-21         2000 01-JAN-21          100                                                                      
         2 foo 02-JAN-21          200 02-MAR-21          300                                                                                             
         3 foo 03-JAN-21          700                                                                                                                    
         1 bar 01-JUN-21         2000                                                                                                                    
         2 bar 02-JUN-21          400

Anyway you will probably need dynamic solution so, this could be interesting for something else, who knows what, when and where...