Home > database >  Count based on the max status on a given date, with grouped data
Count based on the max status on a given date, with grouped data

Time:11-24

My sample is a ticketing system, holding entries for status updates and the creation of a ticket.

Fiddle: https://dbfiddle.uk/?rdbms=oracle_18&fiddle=a5ff4600adbab185eb14b08586f1bd29

ID TICKETID STATUS TICKET_CREATED STATUS_CHANGED
1 1 other_error 01-JAN-20 01-JAN-20 08.00.00
2 2 tech_error 01-JAN-20 01-JAN-20 09.00.00
3 3 unknown 01-JAN-20 01-JAN-20 09.10.00
4 4 unknown 01-JAN-20 01-JAN-20 09.20.00
5 4 tech_error 01-JAN-20 02-JAN-20 09.30.00
6 1 solved 01-JAN-20 02-JAN-20 10.00.00
7 2 solved 01-JAN-20 02-JAN-20 07.00.00
8 5 tech_error 02-JAN-20 02-JAN-20 08.00.00
9 6 unknown 02-JAN-20 02-JAN-20 08.30.00
10 6 solved 02-JAN-20 02-JAN-20 09.30.00
11 5 solved 02-JAN-20 03-JAN-20 08.00.00
12 4 unknown 01-JAN-20 03-JAN-20 09.00.00

I want to evaluate the data based on the ticket creation date, and obtain three things for a specific date:

  1. (done) how many tickets where totally created on the given date
  2. (done) how many tickets were created in status 'unknown' on the given date
  3. (not done) how many tickets were totally in status 'unknown' on the given date? Tricky! Because what matters is the status at the max STATUS_CHANGED below midnight of the given date.

Desired result for 01.01.2021:

TICKET_CREATED Total Created Tickets created in Unknown status Total tickets in Unknown status
01-JAN-20 4 2 2

Explanation: on 01-JAN-20, ticket 3 and 4 were in status 'unknown' at end of the day

Desired result for 02.01.2021:

TICKET_CREATED Total Created Tickets created in Unknown status Total tickets in Unknown status
02-JAN-20 2 1 1

Explanation: on 02-JAN-20, only ticket 3 was in status 'unknown' at end of the day

Current solution for part 1 2:

select ticket_created, 
count(*) as "Total Created",
sum(case when status = 'unknown' then 1 else 0 end) as "Unknown tickets created",
'?' as "Total tickets in Unknown status"
from myTable
where id in
    (select min(id) as id
    from myTable
    where ticket_created = to_date('01.01.2020', 'DD.MM.YYYY')
    group by ticketid)
group by ticket_created

Could you give me some hint on how to approach point 3?

CodePudding user response:

This is how I would achieve your aims, assuming I've understood your logic correctly:

with ticket_info as (select id,
                            ticketid,
                            status,
                            ticket_created,
                            status_changed,
                            row_number() over (partition by ticketid, trunc(status_changed) order by status_changed desc) rn_per_id_day_desc,
                            row_number() over (partition by ticketid order by status_changed) rn_per_id_asc
                     from   mytable)
select ticket_created,
       count(distinct case when trunc(ticket_created) = to_date('01/01/2020', 'dd/mm/yyyy') then ticketid end) as "Total Created",
       count(case when rn_per_id_asc = 1 and status = 'unknown' then 1 end) as "Unknown tickets created",
       count(case when rn_per_id_day_desc = 1 and status = 'unknown' then 1 end) as "Total tickets in Unknown status"
from   ticket_info
where  status_changed >= to_timestamp('01/01/2020', 'dd/mm/yyyy')
and    status_changed < to_timestamp('01/01/2020', 'dd/mm/yyyy')   interval '1' day
group by ticket_created;

db<>fiddle

You can see that first of all, I use a couple of row_number() analytic functions to provide labels for the rows - one labels the rows per id in the order they were changed (which allows us to identify the first row per id, i.e. the ticket created row), the other labels the rows per id and day in descending order (which allows us to identify the last row of the day per id).

Using that information, we can calculate all three of your cases:

  1. tickets created on a day - here I've used a count distinct, but you could change it to count(case when rn_per_id_asc = 1 then 1 end), which may well be more efficient and easier to understand.
  2. tickets created on a day as "unknown" - here I've used a conditional count: if it's the first row and it has a status unknown, then count it
  3. tickets in "unknown" status at the end of the day - here I've used another conditional count: if it's the last row of the day and the status is unknown, count it.

ETA: with the logic for the third part being amended to count active tickets with a status of unknown at the end of the day, I think this should do the trick:

with date_of_interest as (select start_date   level -1 dt,
                                 start_date   level next_dt
                          from   (select to_date('01/01/2020', 'dd/mm/yyyy') start_date,
                                         to_date('03/01/2020', 'dd/mm/yyyy') end_date
                                  from   dual)
                          connect by level <= (end_date - start_date)   1),
          ticket_info as (select mt.id,
                                 mt.ticketid,
                                 mt.status,
                                 mt.ticket_created,
                                 mt.status_changed,
                                 row_number() over (partition by mt.ticketid, doi.dt order by mt.status_changed) rn_per_id_asc,
                                 row_number() over (partition by mt.ticketid, doi.dt order by mt.status_changed desc) rn_per_id_desc,
                                 doi.dt,
                                 doi.next_dt
                          from   mytable mt
                                 inner join date_of_interest doi on mt.status_changed < doi.next_dt
                          )
select dt,
       count(case when ticket_created = dt and rn_per_id_asc = 1 then 1 end) as "Total Created",
       count(case when ticket_created = dt and rn_per_id_asc = 1 and status = 'unknown' then 1 end) as "Unknown tickets created",
       count(case when rn_per_id_desc = 1 and status = 'unknown' then 1 end) as "Total tickets in Unknown status"
from   ticket_info
group by dt
order by dt;

You'll note that I've updated the query to run across multiple days - if the query is only ever going to run for one date at a time, you can replace the date_of_interest subquery like so:

with date_of_interest as (select dt,
                                 dt   1 next_dt
                          from   (select to_date('03/01/2020', 'dd/mm/yyyy') dt
                                  from   dual)),

Updated db<>fiddle

N.B. This isn't going to be the most efficient way of doing things; over time, as more and more records are present, the query will slow down. If you can work out a way to easily identify active tickets, especially if you can get that info in an index, then that would be better.

CodePudding user response:

Here's a solution that calculates the 3th metric separately.
Then joins them to the metrics you already knew.

with cte_ranges as (
  select id, status, ticketid, ticket_created
  , status_changed as started
  , coalesce(
     lead(status_changed) over (partition by ticketid order by status_changed)
    , current_timestamp) as ended
  from myTable
  where trunc(ticket_created) between DATE'2020-01-01' and DATE'2020-01-02'
)
select q.ticket_date   as "Ticket Created"
     , q.total_tickets as "Total Created"
     , q.total_unknown as "Unknown tickets created"
     , endofday.total_unknown "Total tickets in Unknown status"
from
(
  select trunc(t.ticket_created) as ticket_date
  , count(distinct t.ticketid) as total_tickets
  , count(distinct case when t.status = 'unknown' then t.ticketid end) as total_unknown
  from cte_ranges t
  group by trunc(t.ticket_created) 
) q
left join (
  select trunc(cast(dt as date)) as ticket_date
  , count(distinct case when status = 'unknown' then ticketid end) as total_unknown
  from cte_ranges
  join (
    select distinct 
     cast(trunc(ticket_created) 1 as timestamp) - interval '1' second as dt 
    from cte_ranges
  ) cutoff on dt between started and ended
  group by cast(dt as date)
) endofday 
on endofday.ticket_date = q.ticket_date;
Ticket Created Total Created Unknown tickets created Total tickets in Unknown status
01-JAN-20 4 2 2
02-JAN-20 2 1 1

db<>fiddle here

The trick was to first use LEAD to calculate the ranges that a status was active.

Then the cutoff times (last minute of day) were joined to those ranges.
This way you get the days that a status was still active.

Both sub-queries use the CTE. That way you only need to change the date criteria in the CTE.

CodePudding user response:

The solution is simple:

  • Use LEAD function to find the end date for each status change
    • For example the end date for 4, unknown, 01-JAN-20 09.20.00 would be 02-JAN-20 09.30.00
  • Test if this date range intersects the specified date range
    • To check 02-JAN-20 you would actually use the range [02-JAN-20, 03-JAN-20)
    • Notice it intersects some portion of [01-JAN-20 09.20.00, 02-JAN-20 09.30.00)
    • Checking if date ranges intersect is trivial

Here is the query and demo:

WITH cte AS (
    SELECT mytable.*
         , LEAD(Status_Changed) OVER (PARTITION BY TicketId ORDER BY Status_Changed) AS Next_Changed
    FROM mytable
)
SELECT COUNT(DISTINCT CASE WHEN TICKET_CREATED = DATE'2020-01-02'                        THEN TicketId END) AS "Created on this date"
     , COUNT(DISTINCT CASE WHEN TICKET_CREATED = DATE'2020-01-02' AND STATUS = 'unknown' THEN TicketId END) AS "Created unknown on this date"
     , COUNT(DISTINCT CASE WHEN STATUS = 'unknown'                                       THEN TicketId END) AS "Total unknown through this date"
FROM cte
WHERE (
    -- tickets that were created on the date
    TICKET_CREATED = DATE'2020-01-02'
) OR (
    -- status is unknown and some portion of the time period overlaps the date
    STATUS = 'unknown' AND
    STATUS_CHANGED < DATE'2020-01-02'   INTERVAL '1' DAY AND (
        Next_Changed > DATE'2020-01-02' OR
        Next_Changed IS NULL
    )
)

DB<>Fiddle

Notice that Total unknown through this date for 2020-01-02 is 3:

  • Ticket #3 has no change so it remains in unknown state indefinitely
  • Ticket #4 remains in unknown partially through this date
  • Ticket #6 was created on this date
  • Related