I have an oracle db where there are multiple rows with matching names. Now I need to sample on this db but don't want to alter the distribution by including rows with the same names multiple times. What I tried is this.
SELECT *
FROM (
SELECT names, min(a), min(b)
FROM table GROUP BY names) SAMPLE(5);
That only gives
ORA-00933 SQL command not properly ended
The inner SELECT by itself works just fine, just the outer SAMPLE clauce seems to be a problem. Is there a different way to put it? Thank you
CodePudding user response:
I don't have your tables so I'm using Scott's emp
.
This is what you have now:
SQL> select *
2 from (select job, min(sal), max(sal)
3 from emp
4 group by job
5 )
6 sample (8);
sample (8)
*
ERROR at line 6:
ORA-00933: SQL command not properly ended
OK, so - let's use a CTE instead:
SQL> with temp as
2 (select job, min(sal) minsal, max(sal) maxsal
3 from emp
4 group by job
5 )
6 select *
7 from temp
8 sample (8);
with temp as
*
ERROR at line 1:
ORA-01446: cannot select ROWID from, or sample, a view with DISTINCT, GROUP BY,
etc.
Aha.
There's no easy way out (I mean, writing a standalone query). But, if you materialize the source (by creating a table based on a subquery), then it works:
SQL> create table test as
2 select job, min(sal) minsal, max(sal) maxsal
3 from emp
4 group by job;
Table created.
SQL> select * from test;
JOB MINSAL MAXSAL
--------- ---------- ----------
CLERK 800 1300
SALESMAN 1250 1600
PRESIDENT 5000 5000
MANAGER 2450 2975
ANALYST 3000 3000
SQL> select *
2 from test
3 sample (8);
JOB MINSAL MAXSAL
--------- ---------- ----------
PRESIDENT 5000 5000
MANAGER 2450 2975
SQL>
CodePudding user response:
If you check the documentation, SAMPLE
is only valid for branch with physical object underneath ([schema.]...
).
To achieve similar result with a subquery you may use:
- For all versions: calculate
row_number() over(order by null) as rn, count(*) over() as cnt
in the subquery and then filter byrn <= ceil(cnt*N/100)
. - For 12c and above:
fetch first N percent rows only
, which will do all the calculations under the hood. Note, that it does exactly the same as above for you without extra typing.
create table t (names, a, b) as select lpad(mod(level, 100), 3, '0') , level , level from dual connect by level < 300
select /* gather_plan_statistics*/ * from ( select names, min(a), min(b) from t group by names ) fetch first 8 percent rows only
NAMES | MIN(A) | MIN(B) :---- | -----: | -----: 003 | 3 | 3 020 | 20 | 20 026 | 26 | 26 029 | 29 | 29 035 | 35 | 35 049 | 49 | 49 061 | 61 | 61 063 | 63 | 63
db<>fiddle here
But keep in mind, that SAMPLE
is physical operation, that's why it throws
ORA-01446: cannot select ROWID from, or sample, a view with DISTINCT, GROUP BY, etc.
for view and it is the reason I wrote similar, not the same. In the plan:
| 1 | TABLE ACCESS SAMPLE| T |