I have an interesting issue where I need to create a unique identifier based on match groups for a set of data. This is is based on multiple criteria, but generally what I need to happen is to take this input:
SOURCE_ID | MATCH_ID | PHONE |
---|---|---|
1 | 1 | (999)9999999 |
1 | 2 | (999)9999999 |
2 | 1 | (999)9999999 |
213710 | 707187 | (001)2548987 |
213710 | 759263 | (100)8348243 |
213705 | 2416730 | (156)6676200 |
213705 | 12116102 | (132)3453523 |
And it needs to look like this as the output:
SOURCE_ID | MATCH_ID | PHONE | GENERATED_ID |
---|---|---|---|
1 | 1 | (999)9999999 | 1 |
1 | 2 | (999)9999999 | 1 |
2 | 1 | (999)9999999 | 1 |
213710 | 707187 | (001)2548987 | 2 |
213710 | 759263 | (100)8348243 | 2 |
213705 | 2416730 | (156)6676200 | 3 |
213705 | 12116102 | (132)3453523 | 3 |
I've utilized the DENSE_RANK() function to create two separate IDs, one sorting on PHONE, the other SOURCEID column. The PHONE sort gives me the correct output for lines 1-3, but incorrect for 4-7, while the SOURCE_ID sort works on lines 4-6, but not 1-3.
How can I combine these in a way that gives the desired output above? I've tried combining the columns in every format possible, but no luck there either.
Output from testing, highlighted correct results. Each TEST## column is noted below
The SQL for reference:
SELECT SOURCE_ID,
MATCH_ID,
PHONE,
DENSE_RANK() OVER(ORDER BY PHONE) AS PHONE_SORT
DENSE_RANK() OVER(ORDER BY SOURCE_ID) AS SOURCE_ID_SORT
DENSE_RANK() OVER(ORDER BY MATCH_ID, INTERNAL_ROW_ID) AS TEST1,
DENSE_RANK() OVER(ORDER BY SOURCE_ID, MATCH_ID) AS TEST2,
DENSE_RANK() OVER(ORDER BY MATCH_ID) AS TEST3,
DENSE_RANK() OVER(ORDER BY MATCH_ID, SOURCE_ID, PHONE) AS TEST4,
DENSE_RANK() OVER(ORDER BY MATCH_ID, PHONE, SOURCE_ID) AS TEST5,
DENSE_RANK() OVER(ORDER BY SOURCE_ID, MATCH_ID, PHONE) AS TEST6,
DENSE_RANK() OVER(ORDER BY SOURCE_ID, PHONE, MATCH_ID) AS TEST7,
DENSE_RANK() OVER(ORDER BY PHONE, SOURCE_ID, MATCH_ID) AS TEST8,
DENSE_RANK() OVER(ORDER BY PHONE, MATCH_ID, SOURCE_ID) AS TEST9,
DENSE_RANK() OVER(ORDER BY PHONE, SOURCE_ID) AS TEST10,
DENSE_RANK() OVER(ORDER BY PHONE, MATCH_ID) AS TEST11,
DENSE_RANK() OVER(ORDER BY SOURCE_ID, PHONE) AS TEST12,
DENSE_RANK() OVER(ORDER BY MATCH_ID, PHONE) AS TEST13
FROM MY_TABLE;
TIA!
CodePudding user response:
that's some rather obtuse logic.
SELECT
column1
,column2
,column3
,dense_rank() over (order by rankable)
FROM (
SELECT *
,count(column1) over (partition by column1) c_c1
,count(column3) over (partition by column3) c_c3
,iff(c_c1> c_c3, column1::text, column3) as rankable
FROM VALUES
(1,1,'(999)9999999'),
(1, 2,'(999)9999999'),
(2, 1,'(999)9999999'),
(213710, 707187,'(001)2548987'),
(213710, 759263,'(100)8348243'),
(213705, 2416730,'(156)6676200'),
(213705, 12116102,'(132)3453523')
)
gives:
COLUMN1 | COLUMN2 | COLUMN3 | DENSE_RANK() OVER (ORDER BY RANKABLE) |
---|---|---|---|
1 | 1 | (999)9999999 | 1 |
1 | 2 | (999)9999999 | 1 |
2 | 1 | (999)9999999 | 1 |
213,705 | 2,416,730 | (156)6676200 | 2 |
213,705 | 12,116,102 | (132)3453523 | 2 |
213,710 | 707,187 | (001)2548987 | 3 |
213,710 | 759,263 | (100)8348243 | 3 |