Ordering within group and selecting non-null values-CodePudding

I've a table with data merged from several sources, and I would like to:

Group by a column x
Within each group, order by column z, preferring non-NULL values
For other columns within each group, take the first non-NULL value according to the order defined

Here's example input, call it table t:

X	Y	Z
a	a0	NULL
a	NULL	1
a	a2	2
b	b1	1
b	b2	2

And desired output:

X	Y	Z
a	a2	1
b	b1	1

`GROUP BY` and `ORDER BY`

My first thought was to use GROUP BY and ORDER BY:

select x, min(y), min(z)
from t
group by x
order by y

But I get

ORA-00979: not a GROUP BY expression

B/c, IIUC, the order of operations is GROUP BY, then SELECT the aggregates, then ORDER BY, so ORDER BY can only operate on what's in SELECT. That's not what I want.

FIRST_VALUE

I did manage to get the desired output with FIRST_VALUE:

SELECT DISTINCT
    x,
    FIRST_VALUE(y IGNORE NULLS)
    OVER(PARTITION BY x
        ORDER BY
            z
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) y,
    FIRST_VALUE(z IGNORE NULLS)
    OVER(PARTITION BY x
        ORDER BY
            z
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) z
FROM
    t;

but this just seems extremely verbose & repetitive. Is there a better solution where the window can be specified once rather than for each column?

CodePudding user response：

You may use match_recognize to capture rows with non-null values and reference them in a measures clause:

select x, y, z
from test_tab
match_recognize (
  partition by x
  order by z asc nulls last
  measures
    /*The first value of Z (nulls last)*/
    first(any_.z) as z,
    /*The first non-null value of Y:
        captured either by any_ pattern
        or by other that skips all nulls first
    */
    coalesce(first(any_.y), first(other.y)) as y
  /*Any row
    followed by zero or more Y with null
    followed by any number of any non-null Y rows for the same X
  */
  pattern (any_ y_null* other*)
  define
    y_null as y is null
)

Or from Oracle 21c you may reuse window specification by declaring it at the bottom of select statement and use your original approach:

select distinct
  x,
  first_value(y ignore nulls) over fv as y,
  first_value(z ignore nulls) over fv as z
from test_tab
window fv as (
  partition by x
  order by z asc nulls last
  rows between unbounded preceding and unbounded following
)

Given this sample data:

create table test_tab(X, Y, Z) as
  select 'a', 'a0', NULL from dual union all
  select 'a', NULL, 1 from dual union all
  select 'a', 'a2', 2 from dual union all
  select 'b', 'b1', 1 from dual union all
  select 'b', 'b2', 2 from dual

both the queries return this result:

X Y Z

a a2 1

b b1 1

db<>fiddle here

GROUP BY and ORDER BY

FIRST_VALUE

`GROUP BY` and `ORDER BY`