When deleting by rowid is horrendously slow?



The fastest way to access is a row is by rowid, isn't it...?

Well most of the time that would seem to be true, however we had a case yesterday where things went decidedly awry and what looks like the simplest statement you could imagine was having terrible performance problems.

This all started when investigating a problem with AQ, access to the main views based on the physical table were very slow and there seemed to be a huge backlog of messages building up - even though they had been processed by the dequeue process. The Qmon process seemed to be having trouble cleaning up.

Tracking down the queue session revealed the following very simple statement

delete /*+ CACHE_CB("BMRA_QUEUE_TAB") */ from
"POWER_DYNAMIC"."BMRA_QUEUE_TAB" where rowid = :1

That should be quick, right?

Well it was running but it was taking about 5 mins per row to run so we grabbed the explain plan it was using to see what was going on..

select * from table (dbms_xplan.display_cursor('2bf34haw7ydzq'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  2bf34haw7ydzq, child number 0
-------------------------------------
delete /*+ CACHE_CB("BMRA_QUEUE_TAB") */ from
"POWER_DYNAMIC"."BMRA_QUEUE_TAB" where rowid = :1

Plan hash value: 2788567916

-----------------------------------------------------------------------------------
| Id  | Operation        | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | DELETE STATEMENT |                |       |       |     1 (100)|          |
|   1 |  DELETE          | BMRA_QUEUE_TAB |       |       |            |          ||*  2 |   INDEX FULL SCAN| SYS_C00102668  |     1 |    22 |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(ROWID=:1)

Hmm - an index full scan to delete something based on the rowid - very odd

The cost looks small though - whats going on?

Lets try a quick rebuild of the index to see if that resolves everything.

That just hangs though as its being blocked by the qmon process - lets try a different approach.

A quick look at the table stats showed there were 0 rows, however a count(*) from the table revealed there were 20 million rows - whay are the stats so wrong?

Ayway lets gather stats and see if that resolves it

begin
  dbms_stats.gather_table_stats(ownname => USER,tabname => 'BMRA_QUEUE_TAB',cascade => TRUE,degree => 4);
  end;

And we get this....


Why are they locked - what have the developers been doing?

Lets force the stats gathering

begin
  dbms_stats.gather_table_stats(ownname => USER,tabname => 'BMRA_QUEUE_TAB',force=>TRUE,cascade => TRUE,degree => 4);
  end;

We now get this... what? Seems the index that i cancelled the rebuild of has somehow got into a funny state...




Lets try just setting some random stats on the index manually

  begin
    dbms_stats.set_index_stats(ownname => USER,indname => 'SYS_C00102668',numrows => 10000,numlblks => 10000,force=>TRUE);
    end;

OK - thats run OK

lets see if the next execution picks up the stats change and starts working OK

And it doesn't - seems it stuck with the plan as it's within some plsql loop.

Let's see if we can force that SQL out of the shared pool. This can be done at an individual SQL level but i couldn't find the sql straight away to do that so i just used a big hammer

alter system flush shared_pool;

This still didn't work though as it was in use so wasn't flushed.

Let's kill the qmon process


kill session and now looks much better - plan is correct - see below


select * from table (dbms_xplan.display_cursor('2bf34haw7ydzq'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID  2bf34haw7ydzq, child number 0
-------------------------------------
delete /*+ CACHE_CB("BMRA_QUEUE_TAB") */ from
"POWER_DYNAMIC"."BMRA_QUEUE_TAB" where rowid = :1

Plan hash value: 1501167375

| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |

----------------------------------------------------------------------------------------------

|   0 | DELETE STATEMENT            |                |       |       |     1 (100)|          |

|   1 |  DELETE                     | BMRA_QUEUE_TAB |       |       |  |          |

|   2 |   TABLE ACCESS BY USER ROWID| BMRA_QUEUE_TAB |     1 |    29 |     1   (0)| 00:00:01 |

----------------------------------------------------------------------------------------------

Any now all looks good

Any we can see from this SQL the execution is now flying up

select executions from v$sqlarea where sql_id='2bf34haw7ydzq';

So seems the problem is fixed ( at least for now) - but what's going on with the whole locked stats thing? A quick google reveals this - which then implies the DBA hasn;t been doing there job properly... :-)

The following information is in:
Oracle® Streams Advanced Queuing User's Guide 11g Release 1 (11.1) (and also: 11g Release 2 (11.2)), 5 Oracle Streams AQ Performance and Scalability:

"Ensure that statistics are being gathered so that the optimal query plans for retrieving messages are being chosen. By default, queue tables are locked out from automatic gathering of statistics. The recommended use is to gather statistics with a representative queue message load and lock them."

Comments