Home > other >  How to query data which is not unique up to a certain point?
How to query data which is not unique up to a certain point?

Time:02-17

Basically the current conditions of the query are

WHERE data_payload_uri BETWEEN
'/organization/team/folder/2021'
AND
'/organization/team/folder/2022'

And this gets all data for the year of 2021.

A sample of the data_payload_uri data looks like this:

/organization/team/folder/20210101/orig
/organization/team/folder/20210102/orig
/organization/team/folder/20210102/orig_v1
/organization/team/folder/20210103/orig
/organization/team/folder/20210104/orig
/organization/team/folder/20210105/orig
/organization/team/folder/20210105/orig_v1
/organization/team/folder/20210105/orig_v2

What I would like to do is only query the rows where up until the last forward-slash, the row is NOT unique.

What this means, is I want to NOT query the rows which ONLY have one orig

/organization/team/folder/20210101/orig
/organization/team/folder/20210103/orig
/organization/team/folder/20210104/orig

but I DO want to query all the other rows

/organization/team/folder/20210105/orig
/organization/team/folder/20210105/orig_v1
/organization/team/folder/20210105/orig_v2
/organization/team/folder/20210102/orig
/organization/team/folder/20210102/orig_v1

What is the best way to do this? Pls let me know if anything is unclear and thank you for any help

CodePudding user response:

You can use the analytic COUNT function:

SELECT *
FROM   (
  SELECT t.*,
         COUNT(DISTINCT data_payload_uri) OVER (
           PARTITION BY SUBSTR(data_payload_uri, 1, INSTR(data_payload_uri, '/', -1))
         ) AS cnt
  FROM   table_name t
  WHERE  data_payload_uri >= '/organization/team/folder/2021'
  AND    data_payload_uri <  '/organization/team/folder/2022'
)
WHERE  cnt > 1

Which, for the sample data:

CREATE TABLE table_name (id, data_payload_uri) AS
SELECT 1, '/organization/team/folder/20210101/orig' FROM DUAL UNION ALL
SELECT 2, '/organization/team/folder/20210102/orig' FROM DUAL UNION ALL
SELECT 3, '/organization/team/folder/20210102/orig_v1' FROM DUAL UNION ALL
SELECT 4, '/organization/team/folder/20210103/orig' FROM DUAL UNION ALL
SELECT 5, '/organization/team/folder/20210104/orig' FROM DUAL UNION ALL
SELECT 6, '/organization/team/folder/20210105/orig' FROM DUAL UNION ALL
SELECT 7, '/organization/team/folder/20210105/orig_v1' FROM DUAL UNION ALL
SELECT 8, '/organization/team/folder/20210105/orig_v2' FROM DUAL;

Outputs:

ID DATA_PAYLOAD_URI CNT
2 /organization/team/folder/20210102/orig 2
3 /organization/team/folder/20210102/orig_v1 2
6 /organization/team/folder/20210105/orig 3
7 /organization/team/folder/20210105/orig_v1 3
8 /organization/team/folder/20210105/orig_v2 3

db<>fiddle here

  • Related