I am trying to aggregate some data (delay_minutes) for a given employee_number over a month. My source of information is a view (some_view) from which data is aggregated by day. Then it is aggregated by month. If I select data from some_view and filter it directly, the query is extremely fast. But since I have to query my view my_view somehow I'm not able to pass the filter attributes to the inner query in order to optimize it. This query takes several hours, since some_view is very large and apparently the interim result is done on all data of some_view. Is there some way to optimize this query, so the filtering will be done in the inner query instead and I can still query my_view with the given parameters? It would be immensely helpful to me!
SELECT * FROM my_view where day_from >= DATE '2022-12-01' AND day_until <= DATE '2022-12-31' and employee_number = 'a123'
CREATE VIEW my_view AS (
SELECT
day_from,
day_until,
employee_number,
sum(s_delay_minutes),
coun(1) occurences
FROM (
SELECT
trunc(working_day, 'MM') as day_from
LAST_DAY(working_day) as day_until
employee_number
sum(delay_minutes) s_delay_minutes
FROM some_view
GROUP BY
working_day,
employee_number
)
GROUP BY
day_from,
day_until,
employee_number
)
CodePudding user response:
Try this
with test as (
SELECT
trunc(working_day, 'MM') as day_from
, LAST_DAY(working_day) as day_until
, employee_number
, sum(delay_minutes) s_delay_minutes
FROM some_view
GROUP BY
working_day
, employee_number
)
select
day_from
, day_until
, employee_number
, s_delay_minutes
from test
you don't need to group twice
CodePudding user response:
In order to get your predicates passed into the view, the predicates you are using must come from inside the some_view
view. But day_from
and day_until
are derived values you are computing outside the view. It has to execute the entire view without filters in order to compute those functions before it can apply the predicates to them.
You could apply TRUNC(working_day,'MM') working_month
inside of some_view
, as close to the table it comes from as possible, hopefully before any joins. Then group by that value in my_view
without modifying it in any way, and put your predicate on it in the outer query: and working_month = DATE '2022-12-01'
If you can't modify the view, then consider putting your inner view code directly in your outer view so you aren't nesting views. That gives you greater control.
CodePudding user response:
First of all, it makes no sense that you aggregate twice. Aggregate in one step instead.
CREATE VIEW my_view AS
(
SELECT
TRUNC(working_day, 'MM') AS day_from,
LAST_DAY(working_day) AS day_until,
employee_number,
SUM(delay_minutes) AS s_delay_minutes,
COUNT(DISTINCT working_day) AS occurences
FROM some_view
GROUP BY
TRUNC(working_day, 'MM'),
LAST_DAY(working_day),
employee_number
);
Now your query wants to select rows by day_from and day_until and employee_number. An index on employee_number alone would probably suffice, but for this you need to know the underlying table
CREATE INDEX idx1 ON some_table (employee_number);
If the working_day resides in the same table as the employee_number, you can include the terms for day_from and day_until in the index to have an even better chance that it will be used.
CREATE INDEX idx2 ON some_table (employee_number,
TRUNC(working_day, 'MM'),
LAST_DAY(working_day)
);
However, it is still for the DBMS to decide whether to use the index or not. And this depends on how selective the DBMS considers the conditions. Gathering optimizer statistics and histograms may be able to help a lot here: docs.oracle.com/.../gathering-optimizer-statistics
CodePudding user response:
- Get your filters performed as early as possible
- Do not create your own view - you just create more chances to confuse the optimizer. Perform queries in-line.
You want to include all the working days in December, 2022, so just Where working_days between date '2022-12-01' and date '2022-12-31'
. You want a count of the days the employee actually worked, so count(distinct working_day)
. If you wanted to get fancy you could even add a count of the days the employee was late (if delay_minutes represents late time) with count (distinct Case When nvl(s_delay_minutes,0)>0 Then working_day End)
. Put it all together,
Select employee_number, trunc(working_day,'MM') as month_begin
, LAST_DAY(working_day) as month_end, sum(s_delay_minutes) as late_time
, count(distinct working_day) as days_worked
, count(distinct Case when nvl(delay_minutes,0)>0 Then working_day End) as days_late
From some_view
Where working_day between date '2022-12-01' and date '2022-12-31'
And employee_number = 'a123'
Group By 1,2,3
Order by 1,2,3