I am using Oracle SQL Developer and I need to pull counts from a very large database. I need to count both the number of visits and the number of unique visitors at each of 5 sites (and in total) for each of 10 quarters (and in total) - resulting in 120 counts. Each row in this database represents one visit, and the visitors each have a unique visitor_ID.
Currently, I have one line for each count, but I need to make modifications and I don't want to do it in such an inefficient way this time.
select
sum(case when visit_date between '01-JAN-2019 00:00:00' and '31-MAR-2019 00:00:00' and site=site1 then 1 else 0 end) as 19Q1_visits_site1,
count(distinct case when visit_date between '01-JAN-2019 00:00:00' and '31-MAR-2019 00:00:00' and site=site1 then visitor_id) as 19Q1_unique_site1,
[...]
from visitdata
where [additional qualifiers];
If possible, I would like to create something along the lines of:
19Q1 = visit_date between '01-JAN-2019 00:00:00' and '31-MAR-2019 00:00:00'
19Q2 = visit_date between '01-APR-2019 00:00:00' and '30-JUN-2019 00:00:00'
[...]
allQ = visit_date between '01-JAN-2019 00:00:00' and '30-SEP-2021 00:00:00'
S1 = site in (site1)
[...]
allS = site in (site1, site2, site3, site4, site5)
sites = [S1, S2, S3, S4, S5, allS]
quarters = [19Q1, 19Q2, ..., allQ]
for s in sites:
for q in quarters:
select
sum(case when q and s then 1 else 0 end) as (str(q) 'visits' str(s)),
count(distinct case when q and s then visitor_id) as (str(q) 'unique' str(s))
from visitdata
where [additional qualifiers];
I know SQL doesn't do for loops. Any advice would be fantastic so I don't have to create another embarrassing script with almost 200 lines. Thanks!
CodePudding user response:
Very briefly, use a built-in or user-defined function that takes a date and returns the quarter to which that date belongs. (SQL Server supports "quarter" as a datepart, but you can probably write this yourself if Oracle doesn't support it. You could also add visit_quarter to your table as a materialized computed column and even index it if you use this a lot.) Then you can write a single grouping query along the lines of
SELECT
site,
quarter(visit_date) as Q,
COUNT(visitor_id) as numvisits,
COUNT(DISTINCT visitor_id) AS numDistinctVisitors
FROM T
WHERE <additional conditions>
GROUP BY site, quarter(visit_date)
ORDER BY site, Q
CodePudding user response:
For finding quarters you can use TO_CHAR() on dates, and for changing the start date and end date of a quarter, you can use ADD_MONTHS() eg
Table
create table randomdates( date_ )
as
select sysdate - dbms_random.value( 1, 600 )
from dual
connect by level <= 50 ;
Query
select date_
, to_char( date_, 'YYYY-Q' ) quarter
from randomdates
;
-- result
DATE_ QUARTER
09/23/2021 2021-3
09/24/2020 2020-3
03/23/2020 2020-1
03/29/2021 2021-1
11/29/2020 2020-4
03/05/2021 2021-1
04/08/2021 2021-2
...
GROUP BY should also be possible - as @Steve Kass suggested.
select to_char( date_, 'YYYY-Q' ), count(*)
from randomdates
group by to_char( date_, 'YYYY-Q' )
order by 1 desc
;
TO_CHAR(DATE_,'YYYY-Q') COUNT(*)
2021-4 6
2021-3 13
2021-2 22
2021-1 13
2020-4 20
2020-3 12
2020-2 10
2020-1 4
...
From your comment:
I'm using a modified fiscal year anyway. Do you know how I can define quarters when the visit_date is stored as yyyymmdd?
If you need different start/end days for the, ADD_MONTHS() could help eg the modified_quarters in the following query start a month later than the "standard" quarters. Regarding the dates: in Oracle, they are always stored as YYYYMMDDHHMISS, you can just use TO_CHAR() and pick up whichever component (of the date) you need via the "Format Model" eg 'Q' in the example below.
-- Query executed in APEX.
-- Column date_ : no formatting (compare the output to the same query in the dbfiddle).
select date_
, to_char( date_, 'YYYY-Q' ) quarter
, to_char( date_, 'YY' ) || 'Q' || to_char( date_, 'Q' ) quarter_
, to_char( add_months( date_, 1 ), 'YYYY-Q' ) modified_quarter
from randomdates
;
DATE_ QUARTER QUARTER_ MODIFIED_QUARTER
09/23/2021 2021-3 21Q3 2021-4
09/24/2020 2020-3 20Q3 2020-4
03/23/2020 2020-1 20Q1 2020-2
03/29/2021 2021-1 21Q1 2021-2
11/29/2020 2020-4 20Q4 2020-4
03/05/2021 2021-1 21Q1 2021-2
For calculating subtotals and total (counts) per site, you could use GROUP BY ROLLUP() eg
Table & data
-- Caution: dates in this table are not the same as in the randomdates table.
create table sitesanddates( site_, date_ )
as
select
trunc( dbms_random.value( 1, 6 ) )
, sysdate - dbms_random.value( 1, 600 )
from dual
connect by level <= 50 ;
-- group by site and quarter
select site_, to_char( date_, 'YYYY-Q' ), count(*)
from sitesanddates
group by site_, to_char( date_, 'YYYY-Q' )
order by 1, 2
;
SITE_ TO_CHAR(DATE_,'YYYY-Q') COUNT(*)
1 2020-1 1
1 2020-4 3
1 2021-1 1
1 2021-2 1
1 2021-3 2
2 2020-1 1
2 2020-2 1
GROUP BY ROLLUP
select site_, to_char( date_, 'YYYY-Q' ) q_, count(*)
from sitesanddates
group by rollup( site_, to_char( date_, 'YYYY-Q' ) )
order by 1, 2
;
SITE_ Q_ COUNT(*)
1 2020-1 1
1 2020-4 3
1 2021-1 1
1 2021-2 1
1 2021-3 2
1 - 8 -- <- subtotal for site 1
2 2020-1 1
2 2020-2 1
...
5 2020-4 2
5 2021-1 2
5 2021-2 1
5 2021-4 1
5 - 10 -- <- subtotal for site 5
- - 50 -- <- grand total