I have a table in SQL which looks like this -
ITEM | ACTIVITY_ID | ACTIVITY_TYPE | ACTIVITY_DATE |
---|---|---|---|
Item 1 | Activity A | Call | Jan - 1 - 2022 |
Item 1 | Activity B | Jan - 10 - 2022 | |
Item 1 | Activity C | Jan - 12 - 2022 |
Similarly, there are thousands of Items and each can have one or more activities (up to 5). I want to run a SQL query to flatten the data at the Item level for all the records and the output desired is something like this -
ITEM | ACTIVITY 1 | ACTIVITY 2 | ACTIVITY 3 | ACTIVITY 4 | ACTIVITY 5 | ACTIVITY 1 DATE | ACTIVITY 2 DATE | ACTIVITY 3 DATE | ACTIVITY 4 DATE | ACTIVITY 5 DATE |
---|---|---|---|---|---|---|---|---|---|---|
Item 1 | Call | Jan - 1 - 2022 | Jan - 10 - 2022 | Jan - 12 - 2022 |
The activity columns (1-5) are populated based on the ascending order of activity date.
Is there a way to achieve this? Also, I can import the raw data in Python and can do the transformation there as well if there's an elegant way to do it using Pandas.
Please note that the column values still remain as column values and it is not identical to the unpivot operation in pandas. I saw the answer on unpivoting in pandas but was not able to solve this particular problem using the answers there
Thanks in advance,
CodePudding user response:
Pattern:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY item ORDER BY activity_date) rn
FROM source_table
)
SELECT t1.item,
t1.activity_type activity_1,
-- ...
t5.activity_type activity_5,
t1.activity_date date_1,
-- ...
t5.activity_date date_5
FROM cte t1
LEFT JOIN cte t2 ON t1.item = t2.item AND t2.rn = 2
LEFT JOIN cte t3 ON t1.item = t3.item AND t3.rn = 3
LEFT JOIN cte t4 ON t1.item = t4.item AND t4.rn = 4
LEFT JOIN cte t5 ON t1.item = t5.item AND t5.rn = 5
WHERE t1.rn = 1
PS. The format of the data in activity_date
column seems to be non-standard, and the convertion to DATE datatype may be required.
CodePudding user response:
You're looking for PIVOT, not UNPIVOT.
But in your case NPath works, too:
SELECT *
FROM NPath
( ON (
SELECT ITEM, ACTIVITY_TYPE, ACTIVITY_DATE
FROM tab
)
PARTITION BY ITEM -- group by column
ORDER BY ACTIVITY_DATE -- order within list
USING
MODE (NonOverlapping) -- required syntax
Symbols (True AS T) -- every row
Pattern ('T*') -- is aggregated
RESULT(First (item OF T) AS item -- group by column
,First (ACTIVITY_TYPE OF T) AS activity_1_type
,NTH (ACTIVITY_TYPE,2 OF T) AS activity_2_type
,NTH (ACTIVITY_TYPE,3 OF T) AS activity_3_type
,NTH (ACTIVITY_TYPE,4 OF T) AS activity_4_type
,NTH (ACTIVITY_TYPE,5 OF T) AS activity_5_type
,First (ACTIVITY_DATE OF T) AS activity_1_date
,NTH (ACTIVITY_DATE,2 OF T) AS activity_2_date
,NTH (ACTIVITY_DATE,3 OF T) AS activity_3_date
,NTH (ACTIVITY_DATE,4 OF T) AS activity_4_date
,NTH (ACTIVITY_DATE,5 OF T) AS activity_5_date
,Count(* OF T)
)
)
;