How to use a LISTAGG in a pivot in Snowflake?-CodePudding

Have a table (database.schema.table1) with the following data:

transaction_id |  state | transaction_type | date_timestamp
        1      |   CA   |     Payment      |   12/1/2022 01:00:00
        1      |   CA   |     Payment      |   12/1/2022 02:00:00
        1      |   MA   |     Payment      |   12/1/2022 01:00:00
        2      |   MA   |     Refund       |   12/1/2022 01:00:00
        3      |   NY   |     Payment      |   12/1/2022 01:00:00
        4      |   MA   |     Payment      |   12/1/2022 03:00:00

I want my result set to look kind of like this:

   transaction_id | transaction_type | CA                                     | NY                 | MA
         1        |   Payment        | 12/1/2022 01:00:00, 12/1/2022 02:00:00 |                    | 12/1/2022 01:00:00
         2        |   Refund         |                                        | 12/1/2022 01:00:00 | 
         3        |   Payment        |                                        | 12/1/2022 01:00:00 |
         4        |   Payment        | 12/1/2022 03:00                        |                    |

I have tried the following query but it doesn't seem to work in Snowflake for some reason (and have tried some variations of what I tried googling online):

   select *
    from database.schema.table1 t1
   pivot (listagg(t1.time, '|') for t1.state in ('CA', 'MA', 'NY')) as p;

Is there any way I can try to use a listagg in this pivot? Thanks in advance!

CodePudding user response：

Try with an explicit pivot operation:

SELECT transaction_id, 
       transaction_type,
       LISTAGG(CASE WHEN state = 'CA' THEN date_timestamp END, ', ') AS CA,
       LISTAGG(CASE WHEN state = 'NY' THEn date_timestamp END, ', ') AS NY,
       LISTAGG(CASE WHEN state = 'MA' THEn date_timestamp END, ', ') AS MA
FROM tab
GROUP BY transaction_id,
         transaction_type

CodePudding user response：

One option is to run the listagg() before the pivot - then the pivot will run as desired:

with data as (
    select transaction_id, state, transaction_type, date_timestamp
    from (
        select split(value, '|') x, trim(x[0])  transaction_id, trim(x[1]) state, trim(x[2]) transaction_type, x[3]::string::timestamp date_timestamp
        from table(split_to_table(
        $$      1      |   CA   |     Payment      |   12/1/2022 01:00:00
                1      |   CA   |     Payment      |   12/1/2022 02:00:00
                1      |   MA   |     Payment      |   12/1/2022 01:00:00
                2      |   MA   |     Refund       |   12/1/2022 01:00:00
                3      |   NY   |     Payment      |   12/1/2022 01:00:00
                4      |   MA   |     Payment      |   12/1/2022 03:00:00$$, '\n'))
    )
)

select *
from (
    select transaction_id, transaction_type, state, listagg(date_timestamp, ', ') arr
    from data
    group by 1, 2, 3
) t1
pivot(min(arr) for t1.state in ('CA', 'MA', 'NY')) as p
order by 1;

;

CodePudding user response：

An alternative approach using the cool function

SELECT   
  TRANSACTION_ID
, TRANSACTION_TYPE
, ARRAY_TO_STRING("'CA'",',') CA
, ARRAY_TO_STRING("'MA'",',') MA
, ARRAY_TO_STRING("'NY'",',') NY
FROM   
    T1
PIVOT (ARRAY_AGG(T1.DATE_TIMESTAMP) FOR T1.STATE IN ('CA','MA','NY'));

where T1 was lifted from Felipe's excellent answer.

with t1 as (
select transaction_id, state, transaction_type, date_timestamp
from (
    select split(value, '|') x, trim(x[0])  transaction_id, trim(x[1]) state, trim(x[2]) transaction_type, x[3]::string::timestamp date_timestamp
    from table(split_to_table(
    $$      1      |   CA   |     Payment      |   12/1/2022 01:00:00
            1      |   CA   |     Payment      |   12/1/2022 02:00:00
            1      |   MA   |     Payment      |   12/1/2022 01:00:00
            2      |   MA   |     Refund       |   12/1/2022 01:00:00
            3      |   NY   |     Payment      |   12/1/2022 01:00:00
            4      |   MA   |     Payment      |   12/1/2022 03:00:00$$, '\n'))
)
)