Home > Enterprise >  SAMPLE clause on GROUP BY
SAMPLE clause on GROUP BY

Time:11-22

I have an oracle db where there are multiple rows with matching names. Now I need to sample on this db but don't want to alter the distribution by including rows with the same names multiple times. What I tried is this.

SELECT *
FROM (
    SELECT names, min(a), min(b) 
    FROM table GROUP BY names) SAMPLE(5);

That only gives

ORA-00933 SQL command not properly ended

The inner SELECT by itself works just fine, just the outer SAMPLE clauce seems to be a problem. Is there a different way to put it? Thank you

CodePudding user response:

I don't have your tables so I'm using Scott's emp.

This is what you have now:

SQL> select *
  2  from (select job, min(sal), max(sal)
  3        from emp
  4        group by job
  5       )
  6  sample (8);
sample (8)
       *
ERROR at line 6:
ORA-00933: SQL command not properly ended

OK, so - let's use a CTE instead:

SQL> with temp as
  2    (select job, min(sal) minsal, max(sal) maxsal
  3     from emp
  4     group by job
  5    )
  6  select *
  7  from temp
  8  sample (8);
with temp as
*
ERROR at line 1:
ORA-01446: cannot select ROWID from, or sample, a view with DISTINCT, GROUP BY,
etc.

Aha.

There's no easy way out (I mean, writing a standalone query). But, if you materialize the source (by creating a table based on a subquery), then it works:

SQL> create table test as
  2    select job, min(sal) minsal, max(sal) maxsal
  3    from emp
  4    group by job;

Table created.

SQL> select * from test;

JOB           MINSAL     MAXSAL
--------- ---------- ----------
CLERK            800       1300
SALESMAN        1250       1600
PRESIDENT       5000       5000
MANAGER         2450       2975
ANALYST         3000       3000

SQL> select *
  2  from test
  3  sample (8);

JOB           MINSAL     MAXSAL
--------- ---------- ----------
PRESIDENT       5000       5000
MANAGER         2450       2975

SQL>

CodePudding user response:

If you check the documentation, SAMPLE is only valid for branch with physical object underneath ([schema.]...).

To achieve similar result with a subquery you may use:

  • For all versions: calculate row_number() over(order by null) as rn, count(*) over() as cnt in the subquery and then filter by rn <= ceil(cnt*N/100).
  • For 12c and above: fetch first N percent rows only, which will do all the calculations under the hood. Note, that it does exactly the same as above for you without extra typing.
create table t (names, a, b) as
select
  lpad(mod(level, 100), 3, '0')
  , level
  , level
from dual
connect by level < 300
select /* gather_plan_statistics*/ *
from (
  select names, min(a), min(b) 
  from t
  group by names
)
fetch first 8 percent rows only
NAMES | MIN(A) | MIN(B)
:---- | -----: | -----:
003   |      3 |      3
020   |     20 |     20
026   |     26 |     26
029   |     29 |     29
035   |     35 |     35
049   |     49 |     49
061   |     61 |     61
063   |     63 |     63

db<>fiddle here

But keep in mind, that SAMPLE is physical operation, that's why it throws

ORA-01446: cannot select ROWID from, or sample, a view with DISTINCT, GROUP BY, etc.

for view and it is the reason I wrote similar, not the same. In the plan:

|   1 |  TABLE ACCESS SAMPLE| T    |
  • Related