Oracle SQL: LISTAGG vs. XMLAGG - different results when using CASE WHEN-CodePudding

I try to use listagg over xmlagg to bypass the length limitation of 4000 bytes of listagg.

My statement however includes a case whenstatement, causing different results:

Select date_column as "Some Date",
       sum(case when status = 'active' then 1 else 0 end) as "active id numbers",
       listagg(case when status = 'active' then id_number END, ',' on overflow truncate) within group (order by id_number) as "Listagg List of active id numbers"
       rtrim(xmlagg(xmlelement(e,case when status = 'active' then id_number END,',').extract('//text()') order by id_number).GetClobVal(),',') AS "Xmlagg List of active id numbers"
from   mytable
group by date_column;

Result:

Date	Active id numbers	Listagg List of active id numbers	Xmlagg List of active id numbers
01/01/2021	4	2334,12333,512333,611651	2334,,,,,12333,,,,512333,,611651,,
01/02/2021	6	4213,6481,7900,16515,22060	,,,4213,,,6481,,,,,,7900,,16515,22060,,,,,,
01/03/2021	2	416,754	,,,416,,,,,,,,754,,,

I face two issues with xmlagg over listagg:

apparently, the function of xmlagg would always trigger even if case when is not true, causing empty , eliminators
performance: it is much slower (~3 times the execution time)

Did I do some mistake with using case when within xmlagg?

Thanks in advance!

CodePudding user response：

As I said in my previous answer, you could alternatively filter in the WHERE clause rather than using a CASE expression:

Select date_column as "Some Date",
       COUNT(*) as "active id numbers",
       RTRIM(
         XMLAGG(
           XMLELEMENT(
             E,
             id_number,
             ','
           ).EXTRACT('//text()')
           ORDER BY id_number
         ).GetClobVal(),
         ','
       ) AS LIST
from   mytable
where  status = 'active'
group by date_column;

That way you will not include the non-active values that will be converted to null by the CASE expression.

db<>fiddle here

Or, you can use a custom aggregation function.

As for the difference, LISTAGG is documented to ignore NULL values:

The arguments to the function are subject to the following rules:

The measure_expr can be any expression. Null values in the measure column are ignored.

XMLAGG does the same:

XMLAgg is an aggregate function. It takes a collection of XML fragments and returns an aggregated XML document. Any arguments that return null are dropped from the result.

However, if you do:

SELECT DUMP(XMLELEMENT(E, NULL, ',').EXTRACT('//text()')) FROM DUAL;

You will find that the output has Typ=58 which is an XMLTYPE data type (and not a string) so the values being aggregated are not NULL and are not going to be ignored.

For example:

CREATE TABLE table_name (value) AS
SELECT 1 FROM DUAL UNION ALL
SELECT 2 FROM DUAL UNION ALL
SELECT NULL FROM DUAL UNION ALL
SELECT 4 FROM DUAL;

Select LISTAGG(value, ',') WITHIN GROUP (ORDER BY ROWNUM) AS listagg_list,
       RTRIM(
         XMLAGG(
           XMLELEMENT(
             E,
             value,
             ','
           ).EXTRACT('//text()')
           ORDER BY ROWNUM
         ).GetClobVal(),
         ','
       ) AS xmlagg_list
from   table_name;

Outputs:

LISTAGG_LIST XMLAGG_LIST

1,2,4 1,2,,4

LISTAGG_LIST	XMLAGG_LIST
1,2,4	1,2,,4

db<>fiddle here

And regarding performance, you are asking the XMLAGG solution to create XML elements and then extract the text from the XML element and then aggregate and convert the XMLType to a CLOB (rather than a smaller VARCHAR2) and then to strip trailing commas; it is unsurprising that, when it doing so much more work, that it takes longer.