Index compression confusion



Once again I've been looking into compression to save us disk space across out huge number of test environments. I'd been looking at one table in particular (and it's 4 indexes) which take up about 35% of each of out test environments.

The table is pretty large ~600 million rows, it contains 'normal' datatypes (so no lobs or user defined objects)

The objects and there sizes are shown below

RESET   (the table)                                                                               48.2119141
EON_RESET_IDX                                                                               16.0449829

RESET_INDEX                                                                                    14.1956787
RESET_INDEX2                                                                                  14.7709351
RESET_INDEX3                                                                                  12.7185669

So in total you can see that adds up to 104GB

Advanced compression done on the table brought that down to 18GB

then i moved on to the indexes and this is where it got very confusing....

 I did a rebuild of all the indexes just adding the keyword 'compress' to each one

i.e.

alter index index_name rebuild compress;

This took a while, once it had finished a then checked the index sizes

EON_RESET_IDX                                                                                 11.1334229
RESET_INDEX2                                                                                  18.0892334
RESET_INDEX                                                                                   8.06481934
RESET_INDEX3                                                                                  6.37780762

At first glance that looks ok, but hold on whats going on with RESET_INDEX2 - that's got bigger!

That makes no sense (at least it didn't to start with). perhaps the data is not that repetitive so doesn't compress well. Lets have a look at the actual data and see whats going on.

The data looked normal enough, 2 of the columns are just 1's or 0's so i could see that that would really compress well.

Then i noticed that one of the other indexes (that did get smaller) had practically the same definition as the one that got bigger - they only differed by one column. Looking at the data however and the 'different' column contain practically the same data in both cases. This now makes even less sense.....

Then i noticed something else, the 'smaller' index was defined as unique but the bigger one was not...

Hmm what's going on here - is there some fundamental difference in how compression works for unique vs non unique index data. Does a unique index somehow allow compression of the rowid somehow?

At this point i posted a question on the technet and Jonathan pointed me to a very useful link on his site

http://allthingsoracle.com/compression-in-oracle-part-4-basic-index-compression/

Which also linked me to Richards site where i found this link

http://richardfoote.wordpress.com/2008/02/22/index-compression-part-iii-225/

Here i found the key statement that turned the light bulb on

"If the Unique index has multiple columns, the default prefix length value (number of compressed columns) is the number of indexed columns minus one, not all columns as it is for a Non-Unique index. See, Oracle is doing its best here to prevent a useless attempt at index compression."

So the difference is for my 4 column indexes when i say this

create index index_name on (col1,col2,col3,col4) compress;

it tries to compress all the first 4 columns

when i say

create unique index index_name on (col1,col2,col3,col4) compress;

oracle then realises there is no point trying to compress all 4 columns - they are unique - key compression won;t work, what I'm going to do instead is compress all but the last column - so the statement becomes

 create unique index index_name on (col1,col2,col3,col4) compress 3;

Indeed if i now try

 create index index_name on (col1,col2,col3,col4) compress 3;

the non-unique index ends up being the same 'small' size.

This now all makes perfect sense - but it's important to note the difference in the way compressions works differently between indexes and tables. Jonathan and Richard explain it in much more detail than i could and it's a useful read. The key thing to take away from this though is dont just blindly use compress for indexes - know your data and choose the correct number of key columns so that compression is actually useful

Comments

Post a Comment