Basically the current conditions of the query are
WHERE data_payload_uri BETWEEN
'/organization/team/folder/2021'
AND
'/organization/team/folder/2022'
And this gets all data for the year of 2021.
A sample of the data_payload_uri data looks like this:
/organization/team/folder/20210101/orig
/organization/team/folder/20210102/orig
/organization/team/folder/20210102/orig_v1
/organization/team/folder/20210103/orig
/organization/team/folder/20210104/orig
/organization/team/folder/20210105/orig
/organization/team/folder/20210105/orig_v1
/organization/team/folder/20210105/orig_v2
What I would like to do is only query the rows where up until the last forward-slash, the row is NOT unique.
What this means, is I want to NOT query the rows which ONLY have one orig
/organization/team/folder/20210101/orig
/organization/team/folder/20210103/orig
/organization/team/folder/20210104/orig
but I DO want to query all the other rows
/organization/team/folder/20210105/orig
/organization/team/folder/20210105/orig_v1
/organization/team/folder/20210105/orig_v2
/organization/team/folder/20210102/orig
/organization/team/folder/20210102/orig_v1
What is the best way to do this? Pls let me know if anything is unclear and thank you for any help
CodePudding user response:
You can use the analytic COUNT
function:
SELECT *
FROM (
SELECT t.*,
COUNT(DISTINCT data_payload_uri) OVER (
PARTITION BY SUBSTR(data_payload_uri, 1, INSTR(data_payload_uri, '/', -1))
) AS cnt
FROM table_name t
WHERE data_payload_uri >= '/organization/team/folder/2021'
AND data_payload_uri < '/organization/team/folder/2022'
)
WHERE cnt > 1
Which, for the sample data:
CREATE TABLE table_name (id, data_payload_uri) AS
SELECT 1, '/organization/team/folder/20210101/orig' FROM DUAL UNION ALL
SELECT 2, '/organization/team/folder/20210102/orig' FROM DUAL UNION ALL
SELECT 3, '/organization/team/folder/20210102/orig_v1' FROM DUAL UNION ALL
SELECT 4, '/organization/team/folder/20210103/orig' FROM DUAL UNION ALL
SELECT 5, '/organization/team/folder/20210104/orig' FROM DUAL UNION ALL
SELECT 6, '/organization/team/folder/20210105/orig' FROM DUAL UNION ALL
SELECT 7, '/organization/team/folder/20210105/orig_v1' FROM DUAL UNION ALL
SELECT 8, '/organization/team/folder/20210105/orig_v2' FROM DUAL;
Outputs:
ID DATA_PAYLOAD_URI CNT 2 /organization/team/folder/20210102/orig 2 3 /organization/team/folder/20210102/orig_v1 2 6 /organization/team/folder/20210105/orig 3 7 /organization/team/folder/20210105/orig_v1 3 8 /organization/team/folder/20210105/orig_v2 3
db<>fiddle here