Home > Enterprise >  What do indexed views actually index
What do indexed views actually index

Time:11-03

As I understand it SQL views represent 'a virtual table' where the data is actually held in other 'backing tables' the view references. Indexes can be added to views to improve performance, but what do these indexes actually reference if the view is just a virtual table? (is it a primary key on backing table or something?)

Imagine a view represented by SELECT * FROM bookings WHERE IsDeleted=0 with a index on bookings.AppointmentDate ... The index could possibly be ordered by appointment date (for easy searching) and each index leaf hold the row number of where that data is in the view ... That would work ... Until bookings changes and some deleted booking gets un-deleted now what the proposed index would hold would be miss-aligned.

Another way would be to have the indexed view now actually be 'doppleganger' of the bookings table, so its materialised and not virtual anymore. Now the index can refer to whatever primary key the doppleganger has, so nothing breaks when bookings get un-deleted. but again if the booking table changes, this doppleganger has to 'spot' new rows that it should have (like the un-deleted booking) and ones it needs to remove from itself before returning a result, wouldn't that be expensive on table updates negating the possible benefit of using the indexed view?

I'm trying to understand how indexed views really work under the hood.

CodePudding user response:

In SQL Server, an indexed view is a view that has been materialized into a clustered index, as another copy. This can often be better for performance.

Note that unlike Oracle, where a materialized view is periodically refreshed, SQL Server maintains the view's indexes together with any modifications to the base tables. This is done within the same statement, in a similar fashion to normal non-clustered index updates. It is not like a trigger, which executes in a separate scope, it directly feeds off the main query plan.

It can be a drag on DML performance such as updates and inserts, however it can substantially increase querying performance, especially for large aggregation queries. Note that aggregations only store the final aggregate result.

To improve performance, there are many limitations, mainly involving preventing the server from having to lookup other rows. So you can't do a LEFT or FULL join, or use aggregation constructs such as MAX or HAVING, and this is why you need COUNT_BIG on an aggregation.

CodePudding user response:

A virtual table is simply anything that can be queried like a table, but is not actually a table. This is mis-explained in many places on the internet so that it seems like "Virtual Table" is synonymous with "View", but this is misleading. Technically anything that can serve in-place of a table in a query is a Virtual Table, including;

  • Views (regular)
  • Indexed/Materialized Views
  • Table-valued Functions (regular)
  • Inline Table-valued Functions
  • Table Variables

But it also includes more ephemeral things such as:

  • Subquery expressions
  • CTEs (common table expressions)

And DBMS-specific things like:

  • OpenQuery expressions
  • Etc., etc.

So one of the important things about the list above is that I have listed Views twice, the "regular" type of views and "Indexed" views. This is because despite their seeming similarities, they are actually completely different in how they are implemented.

In almost all DBMSs regular Views are merely named aliases for a subquery expression. So when you define a view called MyView as some SELECT... and then later use that view like so:

SELECT * FROM MyView

The parser just replaces the View name with that view's SELECT statement as a subquery:

SELECT * FROM (SELECT...) AS MyView

That's it, that's all a regular View is. (This is why the claim that Views cause bad performance is so ludicrous, it's not the fact that it's a view, it's the poor use of subqueries).

Indexed Views however are a very different thing. First, the older and more generic term for them is (from Oracle and others) a Materialized View. SQL Sever just calls them Indexed Views because that is how you turn a regular view into a materialized view on SQL Sever: you add an index to it.

Unlike regular Views, Indexed/Materialized Views actually are what SQL beginners initially think all Views are: a pre-processed copy of the data from the underlying SELECT that defines the view. So it is actual data that the DBMS must dynamically maintain. Indexed Views on SQL Server are implemented and maintained under-the-hood as though they were an alternate clustered key of the base table. (Technically an Indexed View is a specific type of Materialized View.)

So to answer the original question: The index of an Indexed View does actually reference actual data, that is a copy of the original data in some table in the database.

CodePudding user response:

In SQL Server, indexed views store copies of all of the data for the view1, very much like another table. The difference is in how that data gets updated.

Effectively, the server generates triggers (but they don't show up as such) on each of the base tables to maintain the data in the view's clustered index. This is why there are many restrictions on what features an indexed view supports - they have to allow these "triggers" to be auto-generated to maintain the view data just based on the inserted and deleted pseudo-tables.

So, for instance, this is why you cannot use the MAX aggregate in an indexed view. Whilst you can write an efficient insert trigger to deal with any new values larger that the current MAX, update or delete triggers would potentially have to rescan the entire base table again, if the current max value were changed or removed.


1This means that if your query supports it, either because you've used the NOEXPAND hint or you're on Enterprise Edition and the optimizer likes it, a query can just use the data in the view without accessing the base table data at all.

CodePudding user response:

When you index any object, whether a view or a table, the data for the index is actually stored on disk. That is, an index is always materialized, even if the source is "virtual". If the index is for a table, then whenever you change the table the index must also be updated, along with any write operations this incurs.

The same is true for views; if you change any source table used in the view then the index must also be updated (there is a performance cost for changing data). This happens with the query (in the same context) and for views it can be complex. Therefore, in order to make it possible for the database to know when these index updates are necessary there are all kinds of restrictions on what you can do with a view if you want to have an index.

Finally, for the specific example in the question, rather than a view plus an index you can create a Filtered Index. This is an index directly on the table, but only for rows meeting the initial criteria.

  • Related