I am trying to get only the month of August in my database and then count how many times there has been a performance during August however I can't figure out how to do it.
I have given the code which I have created so far.
SELECT f.FILM_NAME, COUNT(p.PERFORMANCE_DATE), SUM(p.TAKINGS), p.PERFORMANCE_DATE
FROM A2_PERFORMANCE p, A2_FILM f
WHERE p.PERFORMANCE_DATE LIKE TO_DATE('08-2021', 'MM-YY')
GROUP BY f.FILM_NAME, p.PERFORMANCE_DATE
ORDER BY f.FILM_NAME
I am currently trying to achieve this:
-- FILM_NAME Performances Total Takings
-- --------------------------- ------------ ----------------------
-- It Happened One Night 39 £63,571
-- Modern Times 38 £58,332
-- Parasite 23 £37,195
-- Knives Out 22 £34,362
-- Citizen Kane 25 £32,711
-- The Wizard of Oz 18 £21,716
-- Avengers: Endgame 18 £17,081
CodePudding user response:
You can convert your dates to string and compare them to '08-2021' (not in the way you did - you must apply formatting to the dates themselves first), but that is inefficient.
You can also truncate the dates to the beginning of the month and compare to date '2021-08-01'
, but that is also inefficient.
"Inefficiency" comes from two sources, one smaller and one really big. The smaller one is having to apply functions to data from your table. The really big one has to do with indexing: if your queries often filter based on dates, then you would benefit from indexing the date column - but the index can't be used if your filters apply a function to the date column first.
So, how to do it? Best (especially if your "dates" may have time-of-day component other than midnight) is something like this:
...
where p.performance_date >= date '2021-08-01'
and p.performance_date < date '2021-09-01'
...
Note that it's always two inequalities, the first one is non-strict and the second is strict.
If instead you use performance_date between [Aug 1] and [Sep 1]
(in pseudo-code), this will give the wrong answer - this makes the second inequality also non-strict, so you will include performances with a date of Sept. 1, if they are saved in the db with a time-of-day of midnight. And between [Aug 1] and [Aug 31]
will miss everything on August 31 with a time-of-day OTHER THAN midnight.
Best not to use BETWEEN and instead to use two explicit inequalities, as I have shown.
CodePudding user response:
You are currently doing:
WHERE p.PERFORMANCE_DATE LIKE TO_DATE('08-2021', 'MM-YY')
You are providing a 4-digit year value (2021) but have only given YY in the format model, not YYYY. Now, Oracle by default is lenient about this sort of thing - unhelpfully so, some might say - so this will work in this case. But it's not good practice, and TO_DATE('08-2021', 'MM-YYYY')
would be better. Either way, that will give you midnight on the the first day of that month. You can provide that a bit more simply with a date literal: DATE '2021-08-01'
.
LIKE
is a pattern-matching condition, and compares strings, not dates. You are forcing an implicit conversion of both your column value and the fixed date you provided to strings, using your session's NLS_DATE_FORMAT setting. You also aren't including any wildcard characters, making it equivalent to =
. So with a common NLS setting, you are really doing:
WHERE TO_CHAR(p.PERFORMANCE_DATE, 'DD-MON-RR') = TO_CHAR(DATE '2021-08-01', 'DD-MON-RR')
With that NLS setting you would match any values on August 1st - though it would match data from 1921 etc. as well as 2021, if you had any. With a more precise NLS setting you might be doing:
WHERE TO_CHAR(p.PERFORMANCE_DATE, 'YYYY-MM-DD HH24:MM:SS') = TO_CHAR(DATE '2021-08-01', 'YYYY-MM-DD HH24:MI:SS')
which would only match values at exactly midnight on August 1st.
To match the whole month you could use an explicit mask, instead of the implicit conversion; and you only need to convert the column value, as you can supply the fixed value in the format you want anyway:
WHERE TO_CHAR(p.PERFORMANCE_DATE, 'YYYY-MM') = '2021-08'
or, if you prefer (I don't, but it's not my code):
WHERE TO_CHAR(p.PERFORMANCE_DATE, 'MM-YYYY') = '08-2021'
But that conversion will prevent a normal index on that column being used. I would suggest providing a date range to avoid any conversion:
WHERE p.PERFORMANCE_DATE >= DATE '2021-08-01'
AND p.PERFORMANCE_DATE < DATE '2021-09-01'
which will find all rows where that column is on or after midnight on August 1st, and before midnight on September 1st.
CodePudding user response:
If you are looking for all activities in august regardless of the year, you can convert your column to char of 'MON' and match it with 'AUG' :
SELECT f.FILM_NAME, COUNT(p.PERFORMANCE_DATE), SUM(p.TAKINGS),p.PERFORMANCE_DATE
FROM A2_PERFORMANCE p, A2_FILM f
WHERE TO_CHAR(p.PERFORMANCE_DATE,'MON') ='AUG'
GROUP BY f.FILM_NAME, p.PERFORMANCE_DATE
ORDER BY f.FILM_NAME