Can you have too much cache?

Donald K. Burleson's picture

There is a great debate about the rapidly-falling costs of RAM and the performance benefits of full caching of Oracle databases. Let's take a closer look at the issues over large RAM data buffers, tuning by adjusting system parameters and using fast hardware to correct sub-optimal Oracle code:

Skeptic comment:

There is nothing wrong with disk I/O and full caching can hurt Oracle performance.

Have you noticed that all Oracle10g "world record" benchmarks use over 50 gig data caches? I worked with one of these TPCC benchmarks and ran repeatable timings. Up to the point where the working set was cached, the benefit of a larger data cache outweighed the LIO overhead.

Also, using multiple blocksizes also helped greatly and the appropriate blocksize was used depending on the types of objects in the tablespaces.

For example, small OLTP access rows like a 2k blocksize because you don't waste RAM hauling-in block space you don't need. A 32k tablespace for index range scans also showed a large, measurable performance improvement:

Skeptic comment:

The bigger your cache, the larger your LRU and Dirty List becomes.

This is true, but disk I/O is even more expensive! For example, check-out the Sun Oracle 10g benchmark:

Remember, the size of the buffer cache depends on the size of the "working set" of frequently-referenced data! A very large OLTP system might need very large KEEP pool.

Also, took into the mechanism inside 10g AMM. It computes the marginal benefits of additional data block buffers based of the costs of less PIO and does indeed consider the costs of serialization overhead.

Skeptic comment:

There comes a point where, for a particular 'unit' of hardware, the marginal cost of the increase in administrative overhead is greater than the marginal decrease in costs due to less physical I/O.

For reads, disk I/O is almost always shower then an LIO and full caching is great for read-only databases such as DSS, OLAP and DW systems! For write intensive database, a large cache can be a problem. That's why many DBA's place high DML objects in a separate tablespace (with a different blocksize), and map it to a smaller buffer.

Skeptic comment:

Why waste time adjusting initialization parameters and changing SQL optimizer statistics when the real problem is bad SQL?

When I visit a client I usually find thousands of sub-optimal SQL statements that would take months to manually tune. To get a head-start, I tweak the instance parms to broad-brush tune as much as possible to the lowest common denominator. Then I can sweep v$sql_plan and tune the exceptions. Tuning the instance first saves buckets of manual tuning effort and lowering optimizer_index_cost_adj will sometimes remove sub-optimal full-tables scans for hundreds of statement in just a few minutes. What's wrong with that?

Skeptic comment:

Throwing more memory at the problem does NOT make it go away.

I had a client last month will a REALLY messed-up database and it was HEAVILY I/O bound (85% read waits). Instead of charging $100k to fix the mess I replaced the disks to high-speed RAM-SAN (solid-state disk) for only $40k. The system ran 15x faster, in less than 24 hours, and the client was VERY happy. Granted, it's not elegant, but hey, why not throw hardware at lousy code if it's cheap and fast?

Like it or not, disk will soon be as obsolete as drums! I remember when 1.2 gig of disk costs $250k and I can now get 100 gig of SSD for only $110k.

I have several fully-cached clients (some using SSD and other with a huge KEEP pool), and their databases runs great. . .

I expect that Oracle Corporation will soon be working on a solid-state Oracle in a future release where the caches will disappear completely.

Of course, tuning the SQL is always the best remedy, and by reducing unnecessary consistent gets you often reduce PIO too! But it makes sense to me to tune at the system-level b y adjusting CBO parms (OIAC and optimizer_index_caching) and by improving the CBO statistics:

Skeptic comment:

Would you please explain why I have a database running at a BCHR of 99.9 and performance is still abysmal?

A high buffer cache hit ration with poor performance is often due to too many Logical I/O's (high consistent gets from sub-optimal SQL) or high library cache contention (excessive parsing).

Up to the point where the marginal benefit of adding blocks to the buffer declines (the second derivative of the 1/x curve of buffer utilization), tuning with BCHR makes sense. Beyond the point where the working set is cached, the marginal benefit from RAM data buffers declines.

Skeptic comment:

Throwing hardware at an Oracle performance problem addresses the symptom, and not the root cause of the performance problem.

True, but I've never had a problem throwing hardware (cache, faster CPU) at a messed-up system. My client's often demand it because they don't want to pay a fortune for SQL tuning. In many cases it's faster and cheaper. It makes sense.

There are many system-level "silver bullets" to Oracle tuning, and IMHO you would be foolish not to try them. For example, if you could tune 500 DSQL statements by adding an index or building an MV, why not? Consulting client demand that you tune the whole system before tuning individual SQL statements:

However, this will be a moot issue in 24 months when SSD makes those stupid magnetic platters obsolete.



I guess readers 20 years and 10 more Oracle versions from now will wonder about this kind of 'throwing hardware or wisdom at a problem dillema'.

I think it is obvious that one day all OLTP database will be cached and disk IO will be obsolete (persistent memory chips like M-systems for example).

This will eliminate most IO and IO-related tuning problems - leaving only the NP-complete like, much intresting problems for DBA's....

Merry Christmas for all!

Shimon Tourgeman,

Well, everyone agrees it's not elegant, but with hardware getting so cheap, many managers insist on a cheaper, faster hardware solution.

Nice article. I spend hours on your site!

I found this trying to fix a performance problem at a Spanish insurance company. It seems to be a network problem - so I'm tuning the listener.

I haven't read anything about the weakness of the Oracle passwords. I wrote a detailed article in my blog. It's in Spanish, but if you can't read it I could make a rapid translation: