Optimizing monthly aggregated nested view in Oracle SQL-CodePudding

I am trying to aggregate some data (delay_minutes) for a given employee_number over a month. My source of information is a view (some_view) from which data is aggregated by day. Then it is aggregated by month. If I select data from some_view and filter it directly, the query is extremely fast. But since I have to query my view my_view somehow I'm not able to pass the filter attributes to the inner query in order to optimize it. This query takes several hours, since some_view is very large and apparently the interim result is done on all data of some_view. Is there some way to optimize this query, so the filtering will be done in the inner query instead and I can still query my_view with the given parameters? It would be immensely helpful to me!

SELECT * FROM my_view where day_from >= DATE '2022-12-01' AND day_until <= DATE '2022-12-31' and employee_number = 'a123'

CREATE VIEW my_view AS (
    SELECT
            day_from,
            day_until,
            employee_number,
            sum(s_delay_minutes),
            coun(1) occurences
        FROM (
            SELECT 
                trunc(working_day, 'MM') as day_from
                LAST_DAY(working_day) as day_until
                employee_number
                sum(delay_minutes) s_delay_minutes
            FROM some_view
            GROUP BY
                working_day,
                employee_number
        )
        GROUP BY
            day_from,
            day_until,
            employee_number
)

CodePudding user response：

Try this

with test as (

SELECT 
     trunc(working_day, 'MM') as day_from
   , LAST_DAY(working_day) as day_until
   , employee_number
   , sum(delay_minutes) s_delay_minutes 
FROM some_view
GROUP BY
     working_day
   , employee_number
)

select 
     day_from
   , day_until
   , employee_number
   , s_delay_minutes
from test

you don't need to group twice

CodePudding user response：

In order to get your predicates passed into the view, the predicates you are using must come from inside the some_view view. But day_from and day_until are derived values you are computing outside the view. It has to execute the entire view without filters in order to compute those functions before it can apply the predicates to them.

You could apply TRUNC(working_day,'MM') working_month inside of some_view, as close to the table it comes from as possible, hopefully before any joins. Then group by that value in my_view without modifying it in any way, and put your predicate on it in the outer query: and working_month = DATE '2022-12-01'

If you can't modify the view, then consider putting your inner view code directly in your outer view so you aren't nesting views. That gives you greater control.

CodePudding user response：

First of all, it makes no sense that you aggregate twice. Aggregate in one step instead.

CREATE VIEW my_view AS
(
  SELECT 
    TRUNC(working_day, 'MM') AS day_from,
    LAST_DAY(working_day) AS day_until,
    employee_number,
    SUM(delay_minutes) AS s_delay_minutes,
    COUNT(DISTINCT working_day) AS occurences
  FROM some_view
  GROUP BY
    TRUNC(working_day, 'MM'),
    LAST_DAY(working_day),
    employee_number
);

Now your query wants to select rows by day_from and day_until and employee_number. An index on employee_number alone would probably suffice, but for this you need to know the underlying table

CREATE INDEX idx1 ON some_table (employee_number);

If the working_day resides in the same table as the employee_number, you can include the terms for day_from and day_until in the index to have an even better chance that it will be used.

CREATE INDEX idx2 ON some_table (employee_number,
                                 TRUNC(working_day, 'MM'),
                                 LAST_DAY(working_day)
                                );

However, it is still for the DBMS to decide whether to use the index or not. And this depends on how selective the DBMS considers the conditions. Gathering optimizer statistics and histograms may be able to help a lot here: docs.oracle.com/.../gathering-optimizer-statistics

CodePudding user response：

Get your filters performed as early as possible
Do not create your own view - you just create more chances to confuse the optimizer. Perform queries in-line.

You want to include all the working days in December, 2022, so just Where working_days between date '2022-12-01' and date '2022-12-31'. You want a count of the days the employee actually worked, so count(distinct working_day). If you wanted to get fancy you could even add a count of the days the employee was late (if delay_minutes represents late time) with count (distinct Case When nvl(s_delay_minutes,0)>0 Then working_day End). Put it all together,

Select employee_number, trunc(working_day,'MM') as month_begin
    , LAST_DAY(working_day) as month_end, sum(s_delay_minutes) as late_time
    , count(distinct working_day) as days_worked
    , count(distinct Case when nvl(delay_minutes,0)>0 Then working_day End) as days_late
From some_view
Where working_day between date '2022-12-01' and date '2022-12-31'
    And employee_number = 'a123'
Group By 1,2,3
Order by 1,2,3