Part 4 - The Bitmap Legacy

articles: 

  1. Myth #1: Bitmap indexes are good for low-cardinality columns
  2. Myth #2: Bitmap indexes are slow to update
  3. Myth #3: Bitmap indexes don't support concurrent updates
  4. Myth or Just Misleading?
  5. Appendix C – Impact of Transactional Updates on Bitmap Indexes
  6. Appendix D – Impact of Set-Based (Bulk) Updates on Bitmap Indexes

Myth #1: Bitmap indexes are good for low-cardinality columns

This is one of those myths that is probably best demonstrated by example. First, let’s create some test data: 10 million rows with the following columns:

  • COL1 – Alternating values of 1 and 0 – not indexed
  • COL2 – Identical to COL1 – bitmap indexed
  • PAYLOAD – 100 bytes of X’s to make the rows a realistic width – not indexed

COL1 and COL2 are identical low cardinality columns: there are only two valid values. We will table a SQL that finds all of the matching rows on one value (50% of the table) and then see if a bitmap index improves performance. According to the myth, a Bitmap Index is a good candidate for a column with just 2 values.

First, we set up 10 million rows of test data and create a bitmap index.

SQL> CREATE TABLE card_test
  2  AS
  3  WITH rgen AS (
  4  	 SELECT ROWNUM AS col
  5  	 FROM	dual
  6  	 CONNECT BY LEVEL <= 10000
  7  )
  8  SELECT mod(ROWNUM, 2) AS col1
  9  ,	    mod(ROWNUM, 2) AS col2
 10  ,	    LPAD('X', 100, 'X') AS payload
 11  FROM rgen r1
 12  CROSS JOIN rgen r2
 13  WHERE ROWNUM <= 10000000;

Table created.

SQL> CREATE BITMAP INDEX card_test_col2_bx ON card_test(col2);

Index created.

Now let’s get a baseline query without using the Bitmap index. To keep the playing field level, I shut down the database server first to empty the Buffer Cache and the physical disk cache.

SQL> SELECT MAX(payload)
  2  FROM card_test
  3  WHERE col1 = 0
  4  /


Elapsed: 00:00:07.65

------------------------------------------------
| Id  | Operation          | Name      | Rows  |
------------------------------------------------
|   0 | SELECT STATEMENT   |           |     1 |
|   1 |  SORT AGGREGATE    |           |     1 |
|*  2 |   TABLE ACCESS FULL| CARD_TEST |  4131K|
------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("COL1"=0)

Statistics
----------------------------------------------------------
        346  recursive calls
          0  db block gets
     153978  consistent gets
     154009  physical reads
          0  redo size
        582  bytes sent via SQL*Net to client
        415  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          6  sorts (memory)
          0  sorts (disk)
          1  rows processed

The key statistics for the baseline are:

  • Execution time: 7.65 seconds
  • Logical IO: 153,978 buffers

Now let’s try the Bitmap Index. Once again, we bounce the server to clear the caches.

SQL> SELECT MAX(payload)
  2  FROM card_test
  3  WHERE col2 = 0
  4  /


Elapsed: 00:00:34.35

-----------------------------------------------------------
| Id  | Operation                     | Name              |
-----------------------------------------------------------
|   0 | SELECT STATEMENT              |                   |
|   1 |  SORT AGGREGATE               |                   |
|   2 |   TABLE ACCESS BY INDEX ROWID | CARD_TEST         |
|   3 |    BITMAP CONVERSION TO ROWIDS|                   |
|*  4 |     BITMAP INDEX SINGLE VALUE | CARD_TEST_COL2_BX |
-----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("COL2"=0)

Statistics
----------------------------------------------------------
        350  recursive calls
          0  db block gets
     154205  consistent gets
     154237  physical reads
          0  redo size
        582  bytes sent via SQL*Net to client
        415  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          6  sorts (memory)
          0  sorts (disk)
          1  rows processed

Note that both the Logical and Physical IO are about the same as the baseline, but the execution time is over 4 times slower using the Bitmap Index. What has happened?

The problem is not with the bitmap index. It is true that bitmap indexes – even those on low-cardinality columns – can be accessed very quickly because their compression and brevity (one bit per row) results in relatively little IO.

The real problem is the table lookup (Step 2 in the execution plan). TABLE LOOKUP BY INDEX ROWID is a single block operation; for each row requested by the index, Oracle checks if the block is in the Buffer Cache and – if not – performs a round trip to retrieve just that one block from disk.

Conversely, the TABLE ACCESS FULL operation (Step 2 of the baseline execution plan) performs a multi-block read. For each row requested, Oracle checks if the block is in the buffer cache and – if not – performs a round trip to retrieve it and the (configurable, but in this case) 88 blocks that follow it. Since these following blocks are now in the Buffer Cache – and since the Full Table Scan will process them in order – we won’t need to go back to disk until they are all processed.

Even though the Full Table Scan performed about the same amount of Logical and Physical IO as the Bitmap Index Scan, it performed fewer round-trips to disk.

As a result of this single-block retrieval penalty, Bitmap Index execution plans are subject to the same performance restriction as B-Tree index plans: if you are going to read a significant percentage of the table, a Full Table Scan will out-perform indexed access. Placing an actual figure on “significant percentage” is difficult because it varies depending on too many factors, but in my experience it is rare for an indexed read of 10% of the table to out-perform a full scan.

So where does this leave our myth; that bitmap indexes are good for low-cardinality columns?

  • Bitmap indexes on low-cardinality columns can be stored efficiently.
  • Bitmap indexes on low-cardinality columns can be accessed efficiently.
  • Queries on low-cardinality bitmap indexed columns must be combined with other bitmap access WHERE predicates, otherwise they will very likely be slower than a full table scan.
  • COUNT(*) and COUNT(DISTINCT) queries are an exception. If these queries can be resolved without a table-access then the bitmap index access will be very efficient regardless of cardinality.

If we were looking for a general rule, Bitmap Indexes are most advantageous on tables with several bitmap indexed columns that are frequently used in AND and OR combinations to return small proportions of data in the table.

Myth #2: Bitmap indexes are slow to update

We’ve already come a small way to confirming this myth as true in Part 2 when we discovered that there are some overheads to updates on bitmap indexed columns that can vary with the size of the table. In fact, there is a lot of truth to this myth; the refutation is more of an exception rather than a rule.

First, let’s demonstrate that bitmap indexes are slow to update. We’ll create a table with two identical columns – one B-Tree indexed and one Bitmap indexed – and then perform 100,000 identical updates on them. The results are documented in Appendix C.

As you can see, there is a considerable difference in the cost of updating a Bitmap Indexed column compared to a B-tree indexed column. There is a 2x difference in run-time and a 15x difference in logical IO. On this basis it is safe to say that transactional updates on Bitmap indexed columns are considerably slower than transactional updates on B-Tree indexed columns.

But let’s now look at a different update approach: a set-based update rather than a transactional update. Instead of updating 100,000 rows in 100,000 updates, we will update them all in a single update. See Appendix D for the results of the experiment.

Now this is a much more interesting result. The cost difference between the two methods is not significant, suggesting that the overheads of transactional updates on bitmap indexes are not present using set-based updates. Or more likely, the overheads are still there but they apply at a statement-level, not at the row-level.

This has important ramifications for Data Warehouse ETL. Bitmap indexes are highly recommended for fact tables in a data warehouse. Many fact tables are insert-only in nature and can be loaded with Direct-Path techniques that are not subject to these transactional overheads. One common type of fact table – known as an Accumulating Snapshot – tracks things like Orders through their lifecycle (created, approved, shipped, closed). Each row in an accumulating snapshot fact is created once and then updated many times, so it is not really possible to use Direct-Path loads. An ETL job on an accumulating snapshot will perform a large number of updates.

If we write our own ETL code in SQL or PL/SQL then we can ensure that we perform updates using a bulk set-based approach (e.g. using MERGE) but if we are using an ETL tool then it is not obvious how UPDATEs are applied because the underlying database code is often hidden from view. For example, ETL industry leaders DataStage and Informatica both perform transactional updates – these tools would have dire performance problems updating a heavily bitmap indexed table.

If you are performing non-direct-path loads on a table with bitmap indexes, make sure you know how your tool applies updates. If it is transactional (and it probably is), then you may need to disable bitmap indexes during the load or perhaps load to a temporary table and then apply changes to the bitmap indexed table with a MERGE statement.

Myth #3: Bitmap indexes don't support concurrent updates

Again, there is so much truth to this one that it is another case where the counter-example I am about to document is more the exception rather than the rule.

So let’s be clear right up front: Bitmap indexes suffer from locking problems; if two separate sessions attempt to modify a bitmap indexed column, blocking or deadlocks will frequently occur – even if they are modifying different rows.

The reason goes back to what we learned in Part 2: an Index Entry of a Bitmap Index contains an index “Piece”. The Piece contains a bitmap covering many table rows. A DML insert/update/delete on one of those rows will lock the entire piece, thereby blocking any other statement that would affect the same piece. The compression efficiency of bitmap indexes means a single piece can effectively lock hundreds of rows, so blocking or deadlocking in a multi-session environment is almost certain.

To demonstrate how simple it is to block or deadlock with bitmap indexes, the following case uses only INSERT statements.

First, create an empty table with a bitmap index:

SQL1> CREATE TABLE blocker (
  2  	 col1 number
  3  );

Table created.

SQL1> CREATE BITMAP INDEX blocker_col1 ON blocker(col1);

Index created.

Now insert a row with column value 1, but don’t commit:

SQL1> INSERT INTO blocker VALUES (1);

1 row created.

In a separate session we insert another new row with a different value (2); in fact we can do it twice to demonstrate that this is not a Unique index/key issue. This succeeds, but when we try to insert a row with value 1 (same value as in the first session) the insert is blocked:

SQL2> INSERT INTO blocker VALUES (2);

1 row created.

SQL2> INSERT INTO blocker VALUES (2);

1 row created.

SQL2> INSERT INTO blocker VALUES (1);
_ 

Back in the first session, if we now try to insert value 2 it will be blocked by the 2nd session.

SQL1> INSERT INTO blocker VALUES (2);
_

But look at what just happened in the 2nd session: since it was already blocked by the 1st session, Oracle detects the deadlock and breaks it by failing session2.

SQL2> INSERT INTO blocker VALUES (1);
INSERT INTO blocker VALUES (1)
            *
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource

Well, this seems pretty conclusive (and it is): if separate sessions concurrently maintain a bitmap indexed table, then to avoid deadlocks they must commit after every row. This is not the same as committing after every statement, because a single INSERT, UPDATE or DELETE statement could affect many rows and therefore lock many index entries. Also, this does not achieve true concurrency: uncommitted transactions will still block incoming transactions; they just won’t deadlock providing they are committed row-by-row.

Note that this advice conflicts with that from Myth 2 where we determined that set-based (multi-row) DML was superior to transactional (single-row) DML on bitmap indexed tables because of the statement-level bitmap index maintenance overheads. For performance, we need to use multi-row DML; but for concurrency we need to use single-row DML.

This is especially concerning for Data Warehouses, where we are most likely to need Bitmap Indexes. Due to the overheads of transactional DML, we recommended above that set-based DML (such as MERGE) is preferred for loading data. But now we see that those performance gains may be lost because we cannot perform concurrent loads using set-based DML because of blocking issues.

Parallelism is one of the greatest tuning tools available to us for bulk data loading; it is now supported natively by the leading ETL tools like Informatica and DataStage. Does this mean we need to switch off parallelism in our ETL tools and serialise loads into bitmap indexed tables?

Fortunately (and this is where the “myth” comes in), there is a way to parallelise loads into Bitmap Indexed tables: Parallel DML. While two separate MERGE statements cannot concurrently load a bitmap indexed table without the risk blocking and deadlocks, a single MERGE statement can be parallelised to achieve the performance gains of concurrency.

What does this mean for ETL programmers? It means that you need to collect all of your change-data into a single place (such as a temporary table) and then apply that change-data to the bitmap indexed table using a single parallel-DML INSERT, MERGE, or DELETE statement.

Here is an example of a Parallel DML MERGE statement that updates 90000 rows and inserts 10000. Note from the TKPROF stats that it was executed 5 times; this indicates 4 parallel threads and one controller thread meaning that we have achieved concurrent maintenance of the bitmap index without any blocking/deadlock issues.

Also note that the CPU and Elapsed figures are cumulative over all parallel threads. In fact, this MERGE took an elapsed time of around 30 seconds with 4 parallel threads, not 168 seconds. The same MERGE run without parallelism (not displayed here) took over 1 minute. It’s worth pointing out that my database server is running on a virtual machine from a USB 3.0 flash drive on my i5 Ultrabook – we should not expect impressive parallel query performance.

ALTER SESSION ENABLE PARALLEL DML;

MERGE /*+ parallel */ INTO merge_test old
USING (
    SELECT /*+ full(a) */ key + 1000000 key
    ,      col1 + 500 col1
    FROM   merge_test a
    WHERE  col1 < 10
) new
ON (old.key = new.key)
WHEN NOT MATCHED THEN INSERT VALUES (
    new.key
,   new.col1
,   lpad('Y', 100, 'Y')
)
WHEN MATCHED THEN UPDATE
SET col1 = new.col1
,   payload = lpad('Y', 100, 'Y')

call     count       cpu    elapsed       disk      query
------- ------  -------- ---------- ---------- ----------
Parse        2      0.01       0.01          0         14
Execute      5     23.09     168.38     334142     647191
Fetch        0      0.00       0.00          0          0
------- ------  -------- ---------- ---------- ----------
total        7     23.10     168.39     334142     647205

Rows     Row Source Operation
-------  ---------------------------------------------------
      0  MERGE  MERGE_TEST
 100000   PX COORDINATOR
      0    PX SEND QC (RANDOM) :TQ10002
      0     VIEW
      0      HASH JOIN OUTER BUFFERED
      0       PX RECEIVE
      0        PX SEND HASH :TQ10000
      0         PX BLOCK ITERATOR
      0          TABLE ACCESS FULL MERGE_TEST
      0       PX RECEIVE
      0        PX SEND HASH :TQ10001
      0         PX BLOCK ITERATOR
      0          TABLE ACCESS FULL MERGE_TEST

The learning here for ETL developers is that you can load (insert and update) bitmap indexed tables in parallel without locking issues, but it is likely that you will need to write the load yourself if your parallel ETL tool performs transactional updates rather than parallel DML MERGE.

Myth or Just Misleading?

So there we go: 3 Bitmap Index myths busted. There was a bit of truth to each one, so I leave it up to you to decide whether to perpetuate the myths because they are almost right. Bitmap Indexes are just too valuable to dismiss, so next time you hear someone advising not to use Bitmap Indexes because the cardinality is too high / they are too slow to update / they don’t support concurrent updates; hopefully you will step in and present the whole truth.

Return to main article




Appendix C – Impact of Transactional Updates on Bitmap Indexes


Here we compare the impact of repeated updates on a Bitmap Indexed column vs a B-Tree Indexed column. The test data is 10 million rows with two identical columns containing evenly distributed values between 0 and 999. We update all values < 10 (100,000 of them).

The TKPROF extract below is the update on the B-Tree indexed column. It took just 39.3 seconds to perform 100,000 updates.

UPDATE CARD_TEST SET COL1 = COL1 + 500 
WHERE
 ROWID = :B1 


call     count       cpu    elapsed       disk      query
------- ------  -------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0
Execute 100000     10.60      39.30      12578     100876
Fetch        0      0.00       0.00          0          0
------- ------  -------- ---------- ---------- ----------
total   100001     10.60      39.30      12578     100876


Rows     Row Source Operation
-------  ---------------------------------------------------
      0  UPDATE  CARD_TEST (cr=1 pr=4 pw=0 time=0 us)
      1   TABLE ACCESS BY USER ROWID CARD_TEST (cr=1 pr=1 pw=0 time=0 us cost=1 size=25 card=1)

For the bitmap test, I rebooted the database server to clear the Buffer Cache and disk cache. The update on the bitmap indexed column took over 2 times longer. Although it performed about the same amount of Physical IO as the B-Tree example above, it performed 15-times as much logical IO.
The precise cause of this extra logical IO is not clear, but it is obviously related to the maintenance of the bitmap index.

UPDATE CARD_TEST SET COL2 = COL2 + 500 
WHERE
 ROWID = :B1 


call     count       cpu    elapsed       disk      query
------- ------  -------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0
Execute 100000     35.12      84.94      12212    1511257
Fetch        0      0.00       0.00          0          0
------- ------  -------- ---------- ---------- ----------
total   100001     35.12      84.94      12212    1511257

Rows     Row Source Operation
-------  ---------------------------------------------------
      0  UPDATE  CARD_TEST (cr=15 pr=4 pw=0 time=0 us)
      1   TABLE ACCESS BY USER ROWID CARD_TEST (cr=1 pr=0 pw=0 time=0 us cost=1 size=25 card=1)


Appendix D – Impact of Set-Based (Bulk) Updates on Bitmap Indexes


In this example we test the impact of a single large update to a Bitmap Indexed column when compared to a B-Tree indexed column. The test data is 10 million rows with two identical columns containing evenly distributed values between 0 and 999. We update all values < 10 (100,000 of them).

The TKPROF extract below is the update on the B-Tree indexed column. It took just 15.31 seconds to update 100,000 rows in a table of 10 million.

UPDATE /*+ FULL(c) */ card_test c
SET col1 = col1 + 500
WHERE col1 < 10

call     count       cpu    elapsed       disk      query
------- ------  -------- ---------- ---------- ----------
Parse        1      0.00       0.00          2          3
Execute      1      3.87      15.30     159491     159340
Fetch        0      0.00       0.00          0          0
------- ------  -------- ---------- ---------- ----------
total        2      3.87      15.31     159493     159343

Rows     Row Source Operation
-------  ---------------------------------------------------
      0  UPDATE  CARD_TEST (cr=159420 pr=159491 pw=0 time=0 us)
 100000   TABLE ACCESS FULL CARD_TEST (cr=158485 pr=158404 pw=0 time=4835757 us cost=43266 size=656630 card=50510)

Once again, the Bitmap test was performed after a reboot to clear the caches. At 16 seconds it is less than a second slower than the B-Tree indexed version above and there is only a small difference in logical IO. When compared to the transactional update example in Appendix C, this demonstrates that a bulk, set-based update can be performed on bitmap-indexed columns without significant performance impact.

UPDATE /*+ FULL(c) */ card_test c
SET col2 = col2 + 500
WHERE col2 < 10

call     count       cpu    elapsed       disk      query
------- ------  -------- ---------- ---------- ----------
Parse        1      0.01       0.01          2          3
Execute      1      3.28      15.99     159015     161526
Fetch        0      0.00       0.00          0          0
------- ------  -------- ---------- ---------- ----------
total        2      3.29      16.00     159017     161529

Rows     Row Source Operation
-------  ---------------------------------------------------
      0  UPDATE  CARD_TEST (cr=161560 pr=159017 pw=0 time=0 us)
 100000   TABLE ACCESS FULL CARD_TEST (cr=158485 pr=158367 pw=0 time=7165529 us cost=43277 size=656630 card=50510)



Return to main article