I am trying to work out what columns and a combination of indexes may be useful for optimising this query. I think wine.vid, vineyard.vid, wine.cid, class.cid and grade would be the important columns but I am unsure about how to go about generating the combined indexes.
Select vintage, wine_no, wname,
pctalc, grade, price, wine.vid, vname, wine.cid, cname
from vineyard, class, wine
where wine.vid = vineyard.vid
and wine.cid = class.cid
and wine.cid = 'SHIRAZ'
and grade = 'A';
Here are what the tables look like in case that's important.
CREATE TABLE VINEYARD
(VID CHAR(12) NOT NULL,
PID CHAR(08) NOT NULL,
RID CHAR(12) NOT NULL,
VNAME CHAR(45) NOT NULL,
VMANAGER CHAR(45) NOT NULL,
COMMENTS CHAR(300),
PRIMARY KEY (VID),
FOREIGN KEY (PID)
REFERENCES PRODUCER,
FOREIGN KEY (RID)
REFERENCES REGION);
CREATE TABLE WINE
(VINTAGE NUMBER(4) NOT NULL,
WINE_NO INTEGER NOT NULL,
VID CHAR(12) NOT NULL,
CID CHAR(10) NOT NULL,
PCTALC NUMBER(4,2),
PRICE NUMBER(6,2),
GRADE CHAR(01) NOT NULL,
WNAME CHAR(40) NOT NULL,
COMMENTS CHAR(300),
PRIMARY KEY (VINTAGE,
WINE_NO),
FOREIGN KEY (VID)
REFERENCES VINEYARD,
FOREIGN KEY (CID)
REFERENCES CLASS);
CREATE TABLE CLASS
(CID CHAR(10) NOT NULL,
CNAME CHAR(45) NOT NULL,
COLOUR CHAR(05),
PRIMARY KEY (CID));
CodePudding user response:
Although a basic example for your needs, I try to explain that indexes should be optimized first for the where and join conditions. Secondarily on any group by. Last is Order by. By having multiple indexes on a single table by their individual columns does not really help as much as a better constructed. Also note, it is best to always include table.column
or alias.column
so nobody has to guess which table a column is coming from. Using aliases that closely match context of the table's purpose also helps shorten writing and readability. Ex: "W" for "Wine", "V" for Vineyard, "C" for Class.
Select
w.vintage,
w.wine_no,
w.wname,
w.pctalc,
w.grade,
w.price,
w.vid,
v.vname,
w.cid,
c.cname
from
wine w
JOIN vineyard v
on w.vid = v.vid
JOIN class c
on w.cid = c.cid
where
w.grade = 'A'
and w.cid = 'SHIRAZ'
So, now lets look at the indexing. Since Wine is the primary table and the vineyard and class are secondary lookups, lets look at the WHERE first... You are filtering on WINE for the class 'SHIRAZ' AND grade 'A', these will BOTH be first positions in the index. Next is your join conditions to the lookup tables, so I would include those as well. By having these components in the index, the engine can use them instead of having to go to the actual data pages before doing the join. Only when all parts are covered does it actually go to the data pages to retrieve what is qualified for. So your index should be
Table Index
Wine (grade, cid, vid )
vineyard (vid, vname )
class (cid, cname )
For the secondary lookup tables, you can see the primary field (first position) of the index is the ID of the table, but since you are only grabbing the name description from that table, having a "covering" index will help in the same way. Again, the engine will never need to go back to the raw data pages.
I'm sure you dont have millions of records like some larger databases, but understanding the context and optimization approach is still the same.
CodePudding user response:
For beginning, create 5 indexes as
create index i1_wine_vid on wine (vid);
create index i2_wine_cid on wine (cid);
create index i3_wine_gra on wine (grade);
create index i1_vine_vid on vineyard (vid);
create index i1_clas_cid on class (cid);
Collect statistics for tables and indexes, check explain plan, run the query and see how it behaves.
Not that it matters much regarding your question, but: you should always use table aliases when referencing columns. Try to avoid naming all tables in FROM
clause, separated by comma and then joining them in WHERE
clause - use JOIN
s, leave WHERE
for conditions. Something like this:
SELECT w.vintage,
w.wine_no,
w.wname,
w.pctalc,
w.grade,
w.price,
w.vid,
v.vname,
w.cid,
c.cname
FROM vineyard v
JOIN wine w ON w.vid = v.vid
JOIN class c ON c.cid = w.cid
WHERE w.cid = 'SHIRAZ'
AND w.grade = 'A';