SELECT REGEXP_REPLACE(LISTAGG(A.ID, ',') WITHIN GROUP (ORDER BY A.ID), '([^,] )(,\1) ', '\1')
FROM TABLE A
I don't know what "\1" means in the above SQL. After creating a list by separating "A.ID" with commas through "LISTAGG", the purpose seems to be to remove the duplicate "A.ID", but I want to know the exact meaning.
For reference, "A.ID" is a NUMBER(4) column type. (e.g. 1111, 2222...)
CodePudding user response:
It means that you are attempting to replace duplicates.
REGEXP_REPLACE(
comma_separated_list,
'([^,] )(,\1) ',
'\1'
)
Then:
([^,] )
will match one-or-more non-comma characters and store the value in a capturing group.,\1
will match a comma and then the value from that first capturing group.(,\1)
matches a comma and then the value from that first capturing group and matches it all one-or-more times.
However, it does not work reliably. If you have the list 12,123,123,123,123,1234,4444
Then:
the first match will be:
12,123,123,123,123,1234,4444 ^^^^^
at the start and replace it with just
12
giving:123,123,123,123,1234,4444 ^^
It has already gone wrong as the match did not match a complete element and you have lost the first
id
value.the second match will start after the first match will skip the
3,
characters and match:123,123,123,123,1234,4444 ^^^^^^^^^^^^^^^
and replace it with
123
giving:123,1234,4444 ^^^
Again, the match was wrong as it did not match a complete element and it is only coincidence that the value output is correct.
and the final match will be:
123,1234,4444 ^^^
replacing it with just
4
and giving the output:123,1234444
Which is very wrong as you are now missing the
id
12
and have an incorrectid
of1234444
.
What you should probably be doing is filtering the duplicates before aggregating.
In newer Oracle versions it is simply:
SELECT LISTAGG(DISTINCT ID, ',') WITHIN GROUP (ORDER BY ID)
FROM TABLE
or in older versions:
SELECT LISTAGG(ID, ',') WITHIN GROUP (ORDER BY ID)
FROM (
SELECT DISTINCT id FROM TABLE
)
If you did want to use regular expressions (which will be a more inefficient than using DISTINCT
) then you can double up the delimiters and ensure you always match complete elements using (,[^,] ,)\1
but then you also need to remove the repeated delimiters after de-duplication (which makes an inefficient solution even more inefficient):
SELECT TRIM(
BOTH ',' FROM
REPLACE(
REGEXP_REPLACE(
LISTAGG(',' || ID || ',') WITHIN GROUP (ORDER BY ID),
'(,[^,] ,)\1 ',
'\1'
),
',,',
','
)
) AS no_duplicates
FROM TABLE_NAME
db<>fiddle here