I have raw data as below in source table. i need this transformed as shown below.
user_id | domain_id | id_value | id_status |
---|---|---|---|
48085640 | ID1 | 21885688845 | 5 |
48085640 | ID1 | 20544518912 | 5 |
48085640 | ID2 | 176652329 | 5 |
48085640 | ID2 | 121702229 | 5 |
48085640 | ID3 | 111844976 | 5 |
48085640 | ID3 | 111347117 | 5 |
48085640 | ID4 | 1234567 | 5 |
And i am trying to achieve the output as below. There are two values for ID1, ID2, ID3. if i use pivot i must use an aggregate function like max() or min() i am losing one of the rows.also i need to fetch the ID1, ID2 and ID3 only and filter rest of the IDs. Please help me achieve this in oracle sql and also in impala if possible
user_id | ID1 | ID2 | ID3 | id_status |
---|---|---|---|---|
48085640 | 218856888 | 1766523293 | 111844976 | 5 |
48085640 | 205445189 | 1217022297 | 111347117 | 5 |
CodePudding user response:
You can always manualy pivot this data and then join it to itsefl for each IDs.
Let's try:
CREATE TABLE ABC_123 (
user_id NUMBER, domain_id varchar2(3), id_value NUMBER, id_status number );
begin
INSERT INTO ABC_123 values (48085640, 'ID1', 21885688845, 5);
INSERT INTO ABC_123 values (48085640, 'ID1', 20544518912 ,5);
INSERT INTO ABC_123 values (48085640, 'ID2', 176652329 ,5);
INSERT INTO ABC_123 values (48085640, 'ID2', 121702229 ,5);
INSERT INTO ABC_123 values (48085640, 'ID3', 111844976 ,5);
INSERT INTO ABC_123 values (48085640, 'ID3', 111347117 ,5);
INSERT INTO ABC_123 values (48085640, 'ID4', 1234567 ,5);
END;
COMMIT;
WITH pivot_data as
( SELECT user_id
, CASE WHEN domain_id = 'ID1' THEN id_value ELSE NULL END AS ID1
, CASE WHEN domain_id = 'ID2' THEN id_value ELSE NULL END AS ID2
, CASE WHEN domain_id = 'ID3' THEN id_value ELSE NULL END AS ID3
, row_number() over(PARTITION BY domain_id ORDER BY 1 /* order by is not important, but is mandatory*/ ) AS rownum_for_domain_id
, id_status
FROM ABC_123 )
SELECT d1.user_id, d1.ID1, d2.ID2, d3.ID3, d1.id_status
FROM pivot_data d1 /* inner for domain_id = 'ID1' - it determines number of rows */
LEFT JOIN pivot_data d2
ON d1.user_id = d2.user_id
AND d2.ID2 IS NOT NULL
AND d2.rownum_for_domain_id = d1.rownum_for_domain_id
LEFT JOIN pivot_data d3
ON d1.user_id = d3.user_id
AND d3.ID3 IS NOT NULL
AND d3.rownum_for_domain_id = d1.rownum_for_domain_id
WHERE d1.ID1 IS NOT null ;
USER_ID | ID1 | ID2 | ID3 | ID_STATUS |
---|---|---|---|---|
48085640 | 21885688845 | 176652329 | 111844976 | 5 |
48085640 | 20544518912 | 121702229 | 111347117 | 5 |
CodePudding user response:
Use the ROW_NUMBER
analytic function to give each row a number based on the order within each user_id
/domain_id
partition and then use PIVOT
:
SELECT user_id,
id1,
id2,
id3,
id_status
FROM (
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY user_id, domain_id ORDER BY ROWNUM)
AS rn
FROM table_name t
)
PIVOT (
MAX(id_value)
FOR domain_id IN ('ID1' AS id1, 'ID2' AS id2, 'ID3' AS id3)
)
Which, for the sample data:
CREATE TABLE table_name (user_id, domain_id, id_value, id_status) AS
SELECT 48085640, 'ID1', 21885688845, 5 FROM DUAL UNION ALL
SELECT 48085640, 'ID1', 20544518912, 5 FROM DUAL UNION ALL
SELECT 48085640, 'ID2', 176652329, 5 FROM DUAL UNION ALL
SELECT 48085640, 'ID2', 121702229, 5 FROM DUAL UNION ALL
SELECT 48085640, 'ID3', 111844976, 5 FROM DUAL UNION ALL
SELECT 48085640, 'ID3', 111347117, 5 FROM DUAL UNION ALL
SELECT 48085640, 'ID4', 1234567, 5 FROM DUAL;
Outputs:
USER_ID ID1 ID2 ID3 ID_STATUS 48085640 21885688845 176652329 111844976 5 48085640 20544518912 121702229 111347117 5
db<>fiddle here