Home > Enterprise >  Why do we need a primary key or unique to create a bitmap index
Why do we need a primary key or unique to create a bitmap index

Time:01-06

I tried to created a bitmap index as

CREATE BITMAP INDEX ...
ON ...(...)
FROM d1, d2
WHERE d1.AIRCRAFTID = d2.ID PCTFREE 0;

where the tables do not have a primary key nor unique values. After executing, I got the error ORA-25954. Altering the table and creating an unique constraint using an index it worked fine.

So, why do we need to do this?

CodePudding user response:

Bitmap join indexes require a primary key or a unique constraint on the dimension columns because duplicate rows would significantly change the way the bitmaps works. Allowing duplicates would either lead to a much less efficient data structure, require huge code changes for rare scenarios, or perhaps both. (Since we don't have access to Oracle's source code, this answer is mostly speculation. But based on the way that bitmap indexes work, I think these are reasonable guesses.)

First, the below example demonstrates how to create and use a simple bitmap join index, based partially on this Oracle-base article. The example creates a SALES fact table (that would usually be large), and a CUSTOMER dimension table (that is typically small). The bitmap index pre-joins those two tables, and stores the CUSTOMER.STATE value as a hidden column SYS_NC00004$ in the SALES table. The bitmap join index allows the query joining the tables to use the bitmap index and not access or join the CUSTOMER table at run time. We can see this happen when the BITMAP INDEX SINGLE VALUE operation in the explain plan uses the index IDX_CUSTOMTER_SALES instead of the CUSTOMER table.

--drop table customer;
--drop table sales;

create table sales(sales_id number primary key, dollar_amount number not null, cust_id number not null);
insert into sales values(1, 100, 1);
insert into sales values(2, 200, 2);

create table customer(cust_id number primary key, state varchar2(100) not null);
insert into customer values(1, 'CA');
insert into customer values(2, 'NY');

create bitmap index idx_customter_sales
on    sales(customer.state)
from  sales, customer
where sales.cust_id = customer.cust_id;

explain plan for
select /*  index(sales idx_customter_sales) */ sum(sales.dollar_amount)
from   sales,
       customer
where  sales.cust_id  = customer.cust_id
and    customer.state = 'CA';

select * from table(dbms_xplan.display);

--Plan:
Plan hash value: 477564610
 
------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                     |     1 |    26 |     6   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE                      |                     |     1 |    26 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| SALES               |     1 |    26 |     6   (0)| 00:00:01 |
|   3 |    BITMAP CONVERSION TO ROWIDS       |                     |       |       |            |          |
|*  4 |     BITMAP INDEX SINGLE VALUE        | IDX_CUSTOMTER_SALES |       |       |            |          |
------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   4 - access("SALES"."SYS_NC00004$"='CA')
 
Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

Bitmap indexes create a new bitmap for every distinct value. Using the ROWID, which are physical addresses, Oracle only needs to use a single bit for whether each value matches for each table row. ROWIDs are internal, physical structures, and we don't know exactly how they work. But it's a safe guess that ROWIDs are ordered and that Oracle doesn't have to list the ROWID for every row. The index compresses the row numbers, probably with some metadata that indicates "this long list of bits is for ROWID range AAAftAAAMAAAK0kAAA to AAAftAAAMAAAK0kZZZ"

Logically, a bitmap looks something like this:

CUSTOMER.STATE   bitmap for SALES ROWID AAAftAAAMAAAK0kAAA - AAAftAAAMAAAK0kZZZ
--------------   --------------------------------------------------------------
CA               10...
NY               01...

Each bitmap is very small, since each table row only requires 1 bit plus a small amount of overhead. And if there are also bitmaps for other columns, Oracle can perform an AND predicate as a simple boolean CPU AND, which is super fast. However, if the column is not very unique, and there are many bitmaps, then the index will be enormous and inefficient.

Adding duplicate dimension rows ruins everything. For example, let's say we add a duplicate customer for California:

insert into customer values(3, 'CA');

Now the join between the two tables can return duplicates - one row in SALES could match multiple rows in CUSTOMER. Oracle can normally handle this situation easily, but I'm not sure if it can handle it with a bitmap index. A simple 0 or 1 is no longer enough information - the database also needs to know how many duplicates there are. Now we're storing counts instead of bits, something like this:

CUSTOMER.STATE   count(?) map for SALES ROWID AAAftAAAMAAAK0kAAA - AAAftAAAMAAAK0kZZZ
--------------   --------------------------------------------------------------
CA               20...
NY               01...

Now the data structure has changed from bits to bytes, or possibly even multiple bytes if there's a large number of duplicates. The tiny bitmap join indexes have now increased in size by a factor of at least eight, and they can no longer be used in fast binary math operations.

Allowing duplicates eliminates all the advantages of bitmap join indexes. And this is probably a very rare use case anyway. I'm not sure if it ever makes sense to have a one-to-many relationship between a fact table and a dimension table.

  • Related