I've a table with data merged from several sources, and I would like to:
- Group by a column
x
- Within each group, order by column
z
, preferring non-NULL
values - For other columns within each group, take the first non-
NULL
value according to the order defined
Here's example input, call it table t
:
X | Y | Z |
---|---|---|
a | a0 | NULL |
a | NULL | 1 |
a | a2 | 2 |
b | b1 | 1 |
b | b2 | 2 |
And desired output:
X | Y | Z |
---|---|---|
a | a2 | 1 |
b | b1 | 1 |
GROUP BY
and ORDER BY
My first thought was to use GROUP BY
and ORDER BY
:
select x, min(y), min(z)
from t
group by x
order by y
But I get
ORA-00979: not a GROUP BY expression
B/c, IIUC, the order of operations is GROUP BY
, then SELECT
the aggregates, then ORDER BY
, so ORDER BY
can only operate on what's in SELECT
. That's not what I want.
FIRST_VALUE
I did manage to get the desired output with FIRST_VALUE
:
SELECT DISTINCT
x,
FIRST_VALUE(y IGNORE NULLS)
OVER(PARTITION BY x
ORDER BY
z
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) y,
FIRST_VALUE(z IGNORE NULLS)
OVER(PARTITION BY x
ORDER BY
z
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) z
FROM
t;
but this just seems extremely verbose & repetitive. Is there a better solution where the window can be specified once rather than for each column?
CodePudding user response:
You may use match_recognize
to capture rows with non-null values and reference them in a measures
clause:
select x, y, z from test_tab match_recognize ( partition by x order by z asc nulls last measures /*The first value of Z (nulls last)*/ first(any_.z) as z, /*The first non-null value of Y: captured either by any_ pattern or by other that skips all nulls first */ coalesce(first(any_.y), first(other.y)) as y /*Any row followed by zero or more Y with null followed by any number of any non-null Y rows for the same X */ pattern (any_ y_null* other*) define y_null as y is null )
Or from Oracle 21c you may reuse window specification by declaring it at the bottom of select
statement and use your original approach:
select distinct x, first_value(y ignore nulls) over fv as y, first_value(z ignore nulls) over fv as z from test_tab window fv as ( partition by x order by z asc nulls last rows between unbounded preceding and unbounded following )
Given this sample data:
create table test_tab(X, Y, Z) as select 'a', 'a0', NULL from dual union all select 'a', NULL, 1 from dual union all select 'a', 'a2', 2 from dual union all select 'b', 'b1', 1 from dual union all select 'b', 'b2', 2 from dual
both the queries return this result:
X Y Z a a2 1 b b1 1
db<>fiddle here