I am having a hard time understanding which indexes I should create.
I made this sample query that covers various situations (select, join, group, order, etc.). Which index or indexes should I create for this sample?
Table A: 2 GB in size
Table B: 100 KB in size
SELECT A.AAA, A.BBB, A.CCC, B.mycol
From tableA as A
INNER JOIN tableB as B
ON A.ID = B.ID
WHERE AAA='3'
AND BBB>'2021-10-10'
AND CCC<'2021-11-01'
GROUP BY B.mycol, A.AAA, A.BBB, A.CCC
ORDER BY A.AAA desc
My understanding is that I have to create one single index with the columns A.ID, A.AAA, A.BBB, and A.CCC. Table B does not need an index because it is small and an index wouldn't make any difference.
Is this correct, or do I need to create multiple indexes?
CodePudding user response:
You want to optimize execution time on the query:
SELECT A.AAA, A.BBB, A.CCC, B.mycol
From tableA as A
INNER JOIN tableB as B
ON A.ID = B.ID
WHERE AAA='3'
AND BBB>'2021-10-10'
AND CCC<'2021-11-01'
GROUP BY B.mycol, A.AAA, A.BBB, A.CCC
ORDER BY A.AAA desc
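Since this query filters data using columns in tableA only, tableA will be the driving table.
The index on the driving table should include the filtering columns: equality filters first, then non-equality filters, ordered from highest to lowest selectivity. With a typical B-tree index the engine can seek on the equality column and the first range column; the remaining range column is then checked inside the index. In this case: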
- AAA
- BBB
- CCC
The GROUP BY clause computes no aggregates; it only removes duplicate rows, so it does not influence the index design and we'll ignore it.
The index above already provides the rows in the order required by the ORDER BY clause; the engine can simply walk it backwards. Therefore, there's no need to tweak the index for this purpose.
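As a side note, a GROUP BY that lists every selected column and computes no aggregates behaves like SELECT DISTINCT. The sketch below is an equivalent way to write the query from the question, shown only to make that explicit; it is not a required change, and it reuses the table and column names from the question.

```sql
-- Equivalent formulation: GROUP BY over all selected columns with no
-- aggregate functions returns the same rows as SELECT DISTINCT.
SELECT DISTINCT A.AAA, A.BBB, A.CCC, B.mycol
FROM tableA AS A
INNER JOIN tableB AS B
  ON A.ID = B.ID
WHERE A.AAA = '3'
  AND A.BBB > '2021-10-10'
  AND A.CCC < '2021-11-01'
ORDER BY A.AAA DESC;
```

If duplicates cannot occur in your data, both the GROUP BY and the DISTINCT can be dropped, which may save the engine a sort or hash step.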
Finally, the engine will most likely perform a nested loop to retrieve rows from tableB. To do this efficiently, tableB will need an index on:
- ID
- mycol (optional, if we want a covering index for higher performance)
In short, you'll need the following two indexes:
create index ix1 on tableA (AAA, BBB, CCC);
create index ix2 on tableB (ID);
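If you want the covering variants, a possible sketch follows. The index names ix1_covering and ix2_covering are made up for illustration, and whether the extra columns pay off depends on your engine and data. The query reads only AAA, BBB, CCC and ID from tableA, and only ID and mycol from tableB, so these indexes contain every column the query touches.

```sql
-- Hypothetical covering alternative to ix1: filter columns first,
-- then ID so the join column can be read from the index itself.
create index ix1_covering on tableA (AAA, BBB, CCC, ID);

-- Hypothetical covering alternative to ix2: ID for the lookup,
-- mycol so the selected column can be read without visiting the table.
create index ix2_covering on tableB (ID, mycol);
```

You would create these instead of ix1 and ix2, not in addition to them. Wider indexes take more space and add some cost to writes, so it is worth measuring before keeping them.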
Please keep in mind that the engine may ignore them anyway if the table statistics (histograms) indicate that another plan is cheaper.
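To see whether the indexes are actually used, you can ask the engine for the execution plan. The sketch below assumes an engine that accepts a plain EXPLAIN prefix (PostgreSQL and MySQL do); the question does not say which database is in use, and other engines use a different syntax, so treat this as an example rather than the exact command for your system.

```sql
-- Assumption: PostgreSQL/MySQL-style EXPLAIN. Other engines expose the
-- plan differently (for example EXPLAIN PLAN FOR in Oracle).
EXPLAIN
SELECT A.AAA, A.BBB, A.CCC, B.mycol
FROM tableA AS A
INNER JOIN tableB AS B
  ON A.ID = B.ID
WHERE A.AAA = '3'
  AND A.BBB > '2021-10-10'
  AND A.CCC < '2021-11-01'
GROUP BY B.mycol, A.AAA, A.BBB, A.CCC
ORDER BY A.AAA DESC;
```

Look for ix1 and ix2 in the plan output. If the plan shows a full scan of tableA instead, the statistics may be stale or the filter on AAA may not be selective enough for the index to be worthwhile.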