Home > Software design >  Is there a way to flatten values in a table using rank function?
Is there a way to flatten values in a table using rank function?

Time:05-13

I have a table in SQL which looks like this -

ITEM ACTIVITY_ID ACTIVITY_TYPE ACTIVITY_DATE
Item 1 Activity A Call Jan - 1 - 2022
Item 1 Activity B Mail Jan - 10 - 2022
Item 1 Activity C Print Jan - 12 - 2022

Similarly, there are thousands of Items and each can have one or more activities (up to 5). I want to run a SQL query to flatten the data at the Item level for all the records and the output desired is something like this -

ITEM ACTIVITY 1 ACTIVITY 2 ACTIVITY 3 ACTIVITY 4 ACTIVITY 5 ACTIVITY 1 DATE ACTIVITY 2 DATE ACTIVITY 3 DATE ACTIVITY 4 DATE ACTIVITY 5 DATE
Item 1 Call Mail Print Jan - 1 - 2022 Jan - 10 - 2022 Jan - 12 - 2022

The activity columns (1-5) are populated based on the ascending order of activity date.

Is there a way to achieve this? Also, I can import the raw data in Python and can do the transformation there as well if there's an elegant way to do it using Pandas.

Please note that the column values still remain as column values and it is not identical to the unpivot operation in pandas. I saw the answer on unpivoting in pandas but was not able to solve this particular problem using the answers there

Thanks in advance,

CodePudding user response:

Pattern:

WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY item ORDER BY activity_date) rn
    FROM source_table
)
SELECT t1.item,
       t1.activity_type activity_1,
--     ...
       t5.activity_type activity_5,
       t1.activity_date date_1,
--     ...
       t5.activity_date date_5
FROM cte t1
LEFT JOIN cte t2 ON t1.item = t2.item AND t2.rn = 2
LEFT JOIN cte t3 ON t1.item = t3.item AND t3.rn = 3
LEFT JOIN cte t4 ON t1.item = t4.item AND t4.rn = 4
LEFT JOIN cte t5 ON t1.item = t5.item AND t5.rn = 5
WHERE t1.rn = 1

PS. The format of the data in activity_date column seems to be non-standard, and the convertion to DATE datatype may be required.

CodePudding user response:

You're looking for PIVOT, not UNPIVOT.

But in your case NPath works, too:

SELECT * 
FROM NPath
       ( ON (
               SELECT ITEM, ACTIVITY_TYPE, ACTIVITY_DATE
               FROM tab
             )
         PARTITION BY ITEM                   -- group by column
         ORDER BY ACTIVITY_DATE              -- order within list
         USING                                      
           MODE (NonOverlapping)             -- required syntax 
           Symbols (True AS T)               -- every row
           Pattern ('T*')                    --   is aggregated
           RESULT(First (item OF T) AS item  -- group by column
                 ,First (ACTIVITY_TYPE OF T) AS activity_1_type
                 ,NTH (ACTIVITY_TYPE,2 OF T) AS activity_2_type
                 ,NTH (ACTIVITY_TYPE,3 OF T) AS activity_3_type
                 ,NTH (ACTIVITY_TYPE,4 OF T) AS activity_4_type
                 ,NTH (ACTIVITY_TYPE,5 OF T) AS activity_5_type
                 ,First (ACTIVITY_DATE OF T) AS activity_1_date
                 ,NTH (ACTIVITY_DATE,2 OF T) AS activity_2_date
                 ,NTH (ACTIVITY_DATE,3 OF T) AS activity_3_date
                 ,NTH (ACTIVITY_DATE,4 OF T) AS activity_4_date
                 ,NTH (ACTIVITY_DATE,5 OF T) AS activity_5_date
                 ,Count(* OF T)
                 )
        )
;
  • Related