Feed aggregator

The old old coolness - HTMX | Execute HTTP request directly from HTML tag

Andrejus Baranovski - Sat, 2022-08-06 13:47
Developing a modern Web app with Django and Tailwind? Thinking of using React, Vue, or another fancy JS library for UI implementation? I would recommend spending some time and trying HTMX. With HTMX you can execute partial requests and return HTML content to replace web page regions. HTMX properties are defined directly on HTML tags, this makes it easy to use and flexible.



The Feature - Fri, 2022-08-05 20:53






Categories: APPS Blogs


Marian Crkon - Fri, 2022-08-05 20:53

Indexes with or without TRIM

Tom Kyte - Fri, 2022-08-05 15:46
If I have an index, for example, by CD_CLIENT, CD_COMMOD, CD_MERCAD, and there is a query that uses them with TRIM. So: ....... <code>WHERE TRIM( CUSTOMER_CD ) = ... AND TRIM(CD_COMMOD) = ... AND TRIM(CD_MARKET) = ...</code> Should I make another index that has the TRIM ? Or is that enough? Thank you very much!!
Categories: DBA Blogs

Your suggestions on OLAP technology

Tom Kyte - Fri, 2022-08-05 15:46
We are evaluating OLAP technology for implementing some analytics for our product. We are reviewing technologies from Oracle, Microsoft and other vendors. Correct my if I am wrong, OLAP technology with Oracle looks like an early stage feature. I got this assumption when comparing the features, ease of use etc with other vendors. Even Microsoft also well in advance in that area. I couldn?t see much customer references for Oracle OLAP. Please comment on the following issue as you have more expertise with the different technologies and industry? What are your suggestions on choosing OLAP technology? What are your suggestions on Oracle OLAP? Should we consider it? If yes, why?
Categories: DBA Blogs


Tom Kyte - Fri, 2022-08-05 15:46
Hello, I inherited from a schema that is using nvarchar2. I am trying to recreate that schema but I am struggling to create table with nvarchar2 that have a size of 4000 (> 2000). I have changed the max_string_size to "extended" but it does not works. I works varchar2 with e.g. 32676 bytes. Here are the operation I made. <code> sqlplus / as sysdba ALTER SESSION SET CONTAINER=CDB$ROOT; ALTER SYSTEM SET max_string_size=extended SCOPE=SPFILE; shutdown immediate; startup upgrade; ALTER PLUGGABLE DATABASE ALL OPEN UPGRADE; EXIT; cd $ORACLE_HOME/rdbms/admin mkdir -p /home/oracle/tmp/log_utl32k $ORACLE_HOME/perl/bin/perl $ORACLE_HOME/rdbms/admin/catcon.pl -u SYS --force_pdb_mode 'UPGRADE' -d $ORACLE_HOME/rdbms/admin -l '/home/oracle/tmp/log_utl32k' -b utl32k_cdb_pdbs_output utl32k.sql sqlplus / as sysdba shutdown immediate; startup; ALTER PLUGGABLE DATABASE ALL OPEN READ WRITE; EXIT; mkdir -p /home/oracle/tmp/utlrp $ORACLE_HOME/perl/bin/perl $ORACLE_HOME/rdbms/admin/catcon.pl -u SYS --force_pdb_mode 'READ WRITE' -d $ORACLE_HOME/rdbms/admin -l '/home/oracle/tmp/utlrp' -b utlrp_cdb_pdbs_output utlrp.sql sqlplus / as sysdba create table testvc (name varchar2(32676)); >>Table created. create table testnvc (name nvarchar2(4000)); >>ORA-00910: specified length too long for its datatype </code> I have checked several resources but I cannot find a solution. Any suggestion are welcomed.
Categories: DBA Blogs

drop partition

Jonathan Lewis - Fri, 2022-08-05 14:24

This note is about some testing I did on the consequences of the (new in 12c) “deferred global index maintenance” feature that Oracle introduced as a strategy to reduce the impact of dropping partitions from a partitioned table.

Looking at my notes I see that created my first test in August 2013 on Oracle – probably after reading Richard Foote’s series on the topic.

At the time I didn’t turn my notes into blog post but a recent request on the MOS Community Forum (needs a MOS account) prompted me to revisit and extend the tests using 19c.

  1. The Request
  2. The Background
  3. The Model
  4. Tests and Results
  5. Deep Dive
    1. dbms_space
    2. Tree dumps
    3. Dumping Redo
    4. Transactions and locking
  6. Summary
  7. Footnote

The Request

The database is 19.3, two-node RAC with a standby (type and function not specified). There is a table range-partitioned by month holding nearly three years of data. The table size is about 250GB and the OP wants to drop the partitions older than one year.

There was an issue doing this on some other environment that when running the daily maintenance windows “it consumes a lot of CPU but I could not find a link between both activities”.

Is there any thing that could be done to avoid any db impact especially as this is a “production 24” environment?

There was no comment about how many indexes there were (and that’s an important detail), nor how many were global, globally partitioned, or local (also important details), but there was a comment that the pmo_deferred_gidx_main_job was run immediately after the drop, generated a lot of redo, and was still running after 10 hours – so it’s reasonably safe to assume that global indexes had a big impact since that’s the “partition maintenance operations deferred global index maintenance” job.

From the comment about the system being “production 24” I assume that the target is to come up with a strategy that doesn’t deny access to the users for a few hours, has the least possible impact on what they normally do, and doesn’t require the standby database to be (partially) rebuilt / unavailable.

Since this is Oracle 19c and the OP wants to drop nearly two-thirds of the data (i.e. significantly more than he’s keeping) the “obvious” strategy to investigate is dropping the partitions (online, with update indexes) then rebuilding the global (or globally partitioned) indexes (online).

At a minimum it would be sensible to do some modelling to get some idea of why the other system spent so much time in pmo_deferred_gidx_main_job as this might allow you to work out either that it wouldn’t be a problem in this system, or that there was a variation on the method that would be better, or that you just don’t want to use the job because you’ve got a good idea of just how nasty it would be.

The Background

Deferred index maintenance means that global index maintenance does not take place when you drop a partition. Historically Oracle would, as part of the drop, delete every single index entry for the dropped partition from every single global index – doing a lot of work and taking a lot of time at the moment of the drop. Deferred maintenance means Oracle simply notes which object_ids no longer exist and when reading through a global index ignores index entries where the rowid includes the object_id of a dropped object.

Note: the rowid stored in a B-tree index for a global index is made up of 4 components that require 10 bytes of storage. In order these are: (object_id of table partition, file_id, block_id, row number within block). For a local index or index on a simple heap table the object_id of the table can be inferred from the identity of the index so it is not stored and the rowid needs only 6 bytes of storage.

So the benefit of deferred maintenance is that dropping a partition takes virtually no time at all, but (a) Oracle has to clean up the garbage at some point and (b) until the garbage has been cleaned up it has to be read before it can be ignored.

A thought about the second point – if Oracle can check for dropped object_ids very efficiently then it doesn’t necessarily matter that you haven’t cleaned out the garbage. You application won’t instantly run more slowly, it’s just that you won’t have achieved the (possible) benefit of a smaller index that might allow the application to run a little faster.

So here’s a hypothesis to explain why the OP’s previous experience of deferred maintenance was very slow : if you update global indexes in real time Oracle does that job as efficiently as possible because it can use key values and rowids from the table segment that it’s dropping to create a “delete array” for the index; if you defer the maintenance Oracle has to walk through the index one entry at a time and work out whether or not to delete it – and we all know that single-row processing is more expensive than array-processing. It’s worth noting that there are notes on MOS to support this hypothesis, e.g. Bug 27468233 : ALTER INDEX COALESCE CLEANUP IS GENERATING HUGE AMOUNT OF REDO reports an example of generating 23GB of redo while cleaning up an index of only 1.8GB. (Version

So let’s build a model and do some simple tests.

The Model

I’m going to build a table with 6 (range) partitions and two global indexes. I’ll set up two very different patterns of data for the two indexes to see how much impact the data pattern might have.

I’ll drop the bottom three partitions, then clean up the mess in a variety of ways. There’s the call to the pmo_deferred_gidx_main_job, which normally runs at 2:00 am daily but can be initiated by a call to dbms_scheduler.run_job; then there’s the dbms_part.cleanup_gidx() procedure that has a couple of options; then there’s a simple call to “alter index coalesce cleanup [only][parallal N]” (which needs improved documentation) and finally, of course, “alter index rebuild [online]”.

For at least some runs of the tests it will be worth enabling SQL_trace to see what happens behind the scenes; and it’s always worth checking the session activity stats – and maybe some activity from some other dynamic performance views as well.

So here’s the code to create the data set:

rem     Script:         12c_global_index_maintenance_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          July 2022
rem     Last tested:

create table t1 
partition by range(id) (
        partition p09000 values less than ( 9000),
        partition p18000 values less than (18000),
        partition p27000 values less than (27000),
        partition p36000 values less than (36000),
        partition p45000 values less than (45000),
        partition p54000 values less than (54000)
        rownum - 1                      id,
        trunc(dbms_random.value(0,600)) n1,
        rpad('x',100)                   padding
        all_objects ao
        rownum <= 54000

create index t1_n1 on t1(n1) pctfree 0;
create index t1_id on t1(id) pctfree 0;

        index_name, num_rows, s.blocks, leaf_blocks, status, orphaned_entries 
        user_indexes i, user_segments s 
        i.index_name = s.segment_name 
and     i.table_name='T1' 
and     partitioned = 'NO'

alter table t1 drop partition p09000, p18000, p27000 update global indexes; 

                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'

        index_name, num_rows, s.blocks, leaf_blocks, status, orphaned_entries 
        user_indexes i, user_segments s 
        i.index_name = s.segment_name 
and     i.table_name='T1' 
and     partitioned = 'NO'

For testing purposes I’ve set the index pctfree to 0; and I’ve reported some of the index stats before and after dropping the three partitions so that we can see what the optimizer thinks the indexes look like:

Index size information before drop
-------------------- ---------- ---------- ----------- -------- ---
T1_ID                     54000        256         134 VALID    NO
T1_N1                     54000        256         128 VALID    NO

Index size information after drop
-------------------- ---------- ---------- ----------- -------- ---
T1_ID                     27000        256          68 VALID    YES
T1_N1                     27000        256         128 VALID    YES

Both indexes are valid (which is good for the application) and their segment sizing has not changed. The number of rows has halved in both indexes but the number of (populated) leaf blocks has remained unchanged one index even though it has halved for the other.

If you dumped a few index leaf blocks the explanation for the changes (and the difference in the changes) would become clear. The number of (non-deleted) index entries in the two indexes has not changed, but Oracle is (almost literally) ignoring half of them – the ones that include the object_ids for the original first three table partitions.

The t1_id index is on the (sequential) id and the table is partitioned by id, and we have dropped the partitions that hold (nothing but) the ids less than 27,000 – in earlier versions of Oracle this would have immediately deleted all the index entries from the first half of the index, leaving all the leaf blocks in the 2nd half of the index full – although the index entries are still in those blocks Oracle is behaving as if they don’t exist, which means it treats the blocks as empty when calculating leaf_blocks statistic. The t1_n1 index is on integer values from 0 to 599, randomly distributed across the full range of ids, so by dropping the partitions for ids less than 27,000 we (ought to) have deleted the first half of the index entries for n1 = 0, the first half for n1 = 1 and so on – leaving every index leaf block approximately half empty and available for inclusion in the leaf_blocks count.

How, then, does Oracle manage to “ignore” the rows that would have been deleted in older versions. We can always enable SQL tracing when gathering stats, run tkprof against the trace file, and look for the SQL that Oracle used – and if that doesn’t reveal all, use the sql_id of the relevant statements to pull their plans from memory. Here’s the query (reformatted) and plan for one of the index stats gathering queries that I pulled from memory after finding it and its sql_id in the tkprof output:

SQL> select * from table(dbms_xplan.display_cursor('gtnd3aphdkp3k'));

SQL_ID  gtnd3aphdkp3k, child number 0
select /*+ 
                opt_param('_optimizer_use_auto_indexes' 'on')
                no_parallel_index(t, "T1_ID")  
                dbms_stats  cursor_sharing_exact  use_weak_name_resl 
                dynamic_sampling(0)  no_monitoring  xmlindex_sel_idx_tbl 
                opt_param('optimizer_inmemory_aware' 'false')
                no_substrb_pad  no_expand index(t, "T1_ID") 
        count(*)                                          as nrw,
        count(distinct sys_op_lbid(130418, 'L', t.rowid)) as nlb,
        null                                              as ndk,
        sys_op_countchg(substrb(t.rowid, 1, 15), 1)       as clf 
        "TEST_USER"."T1" t where "ID" is not null

Plan hash value: 4265068335

| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT |       |       |       |   136 (100)|          |
|   1 |  SORT GROUP BY   |       |     1 |    17 |            |          |
|*  2 |   INDEX FULL SCAN| T1_ID | 27000 |   448K|   136   (1)| 00:00:01 |

Predicate Information (identified by operation id):
   2 - filter((TBL$OR$IDX$PART$NUM(<?>,0,8,0,"T".ROWID)=1 AND "ID" IS NOT NULL))

If you’ve read Cost Based Oracle – Fundamentals you’ll recognise the SQL is typical of the pattern Oracle uses to gather stats on an index, with a couple of sys_op() function calls that dissect rowids to allow Oracle to calculate the number of leaf_blocks in, and clustering_factor of, the index. What’s new, though is the filter() predicate on the index scan that (presumably) is checking that the rowid belongs to a table partition that still exists. (In other circumstances the “<?>” would be the table-name, possibly it’s a printing error here. The value for 8 as the third parameter also appears in queries involving table-expansion with partial indexing.)

Unsurprisingly, if you execute a simple query driven through one of the indexes after dropping partitions you’ll see exactly the same filter() predicate generated for the execution plan for the range scan operation.

SQL_ID  822pfkz83jzhz, child number 0
select  /*+ index(t1(n1)) */  id from t1 where n1 = 300

Plan hash value: 2152633691

| Id  | Operation                                  | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
|   0 | SELECT STATEMENT                           |       |      1 |        |    44 (100)|     40 |00:00:00.01 |      41 |
|   1 |  TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| T1    |      1 |     45 |    44   (0)|     40 |00:00:00.01 |      41 |
|*  2 |   INDEX RANGE SCAN                         | T1_N1 |      1 |     45 |     1   (0)|     40 |00:00:00.01 |       3 |

Predicate Information (identified by operation id):

   2 - access("N1"=300)

The tbl$or$idx$part$num() function is an important iten to consider at this point. How much impact will it have on your processing – it’s hard to give a generic answer since it may depend on exactly what your data looks like and whether or not the function can cache its result effectively. It’s also possible that the performance of the function is related to either the number of partitions dropped or the number of partitions still in existence – so that’s a detail that probably has to be tested at the correct scale before you go into production

More significantly, perhaps, is how long that impact is going to be relevant and what savings it has to be balanced against. My thought at this point is that if you drop a partition but don’t clean up the index you reduce your workload by not visiting (possibly cached) table blocks, but you pay for the benefit by calling the function for every index entry you visit (whether or not it would have required you to visit the table prior to dropping the partition). Maybe it’s okay to leave the index uncleaned for a few days or even a few weeks before you take any steps to clean up the mess; if that’s the case then maybe you can spread a relatively large number of clean-up jobs over a long enough period of time that their impact doesn’t become visible to the users.

Tests and Results Test number 1:

What does pmo_deferred_gidx_main_job do?

I had to connect as a dba and grant alter on this job to my normal test user to be able to do the following test. The default role for my test user had also been granted create job, so that might have been needed as well; and you’ll see that I’ve included a number of my typical “v$ snapshot” procedures to measure different aspects of the workload, and I’ve enabled the 10046 trace to see what Oracle does behind the scenes.

alter system flush buffer_cache;

execute snap_enqueues.start_snap
execute snap_events.start_snap
execute snap_my_stats.start_snap
execute snap_redo.start_snap
execute snap_rollstats.start_snap

alter session set events '10046 trace name context forever, level 8';

execute dbms_scheduler.run_job('SYS.PMO_DEFERRED_GIDX_MAINT_JOB',true)

alter session set events '10046 trace name context off';

execute snap_rollstats.end_snap
execute snap_redo.end_snap
execute snap_my_stats.end_snap
execute snap_events.end_snap
execute snap_enqueues.end_snap

The most significant discovery in this test was that the package (ultimately) executed two SQL statements:


So really all I needed from that point onwards was to worry about testing and/or optimising variations of the “alter index coalesce” command.

Test number 2:

What does the procedure dbms_part.cleanup_gidx() do?

This procedure takes a schema name and table name as its first inputs with defaults null and has two other parameters called parallel (default 0) and options (default ‘CLEANUP_ORPHANS’); the only other value you can supply for options is ‘COALESCE’. Again the 10046 trace of my wrapper was very useful, as this showed the following SQL when I specified just the schema and table name for the call:


The 19c manual does mention the “ONLY” keyword, but doesn’t explain its significance (but I will in a moment). If I re-ran the test with options set to ‘COALESCE’ the SQL statements changed to:


This did more work than the “cleanup only” run (figures in a moment). When I re-ran the tests setting the parallel option to 2 the same SQL statement appeared with “PARALLEL 2”.

So here are some of the most important numbers for the calls to dbms_part.cleanup_gidx(). They are the headline undo and redo figures from the session statistics:

Default: (cleanup_orphans)
Name                                       Value
----                                       -----
redo entries                              54,283
redo size                             12,412,092
undo change vector size                4,762,432

Name                                       Value
----                                       -----
redo entries                              57,427
redo size                             15,518,660
undo change vector size                6,466,264

As you consider those figures, let me remind you that the indexes started out at roughly 1MB each, and we dropped 27,000 rows in three partitions. A quick check of the “cleanup orphans” arithmetic and you can see (approx) 54,000 redo entries = (two indexes * 27,000 rows each), and 12.4M redo size / 54K redo entries = 230 bytes per index entry. I’ve highlighted that last result because that’s a number you can use as a baseline to estimate the redo that will be generated by cleaning up global indexes. How many rows will be dropped from the table, how many global indexes have you got on the table – multiply the two together and multiply by 230.

Of course, that’s just for “cleanup only” and it assumes that every row appears in every index (which, in a well engineered system, won’t be true). Where does the extra 3MB of redo come from? Let’s drop down one more level in the processing, and run explicit “alter index” statements through the test harness.

Test number 3:

What does alter index xxx coalesce cleanup [only] do?

Here are the redo and undo summaries fron two sets of tests – “coalesce cleanup only”, and “coalesce cleanup”

Index t1_n1                         Cleanup only            Cleanup
-----------                         -------------------------------
redo entries                              27,139             28,176
redo size                              6,209,140          8,930,900
undo change vector size                2,382,564          3,999,900

Index t1_id                         Cleanup only            Cleanup
-----------                         -------------------------------
redo entries                              27,144             28,966
redo size                              6,202,436          6,547,460
undo change vector size                2,379,868          2,461,908

For “coalesce cleanup only” the workload for the two indexes is (effectively) identical – it’s basically the undo and redo from marking 27,000 index entries as deleted and doing nothing else. The blocks have note been cleaned up in any way, that will be left to future sessions which need to insert and entry in a block and find that it is full, but has lots of space that can be reclaimed from deleted entries.

When we use the “coalesce cleanup” (i.e. without “only”) Oracle does some extra work, but the amount of work varies significantly depending on the nature of the index: t1_n1 generates an extra 2.7MB of redo, index t1_id generates only another 345KB. That may be a little surprising, but we’ve already had a clue that something like this might happen, and since every other strategy for “cleaning” the indexes comes down to running these variations of the coalesce command we should look a little further into what they do and how they work.

To get a complete picture we’ll have to do some work with the dbms_space package, the index treedump command, dumping redo, and we also ought to take a look at v$rollstat and v$enqueue_stat. and we’ll pursue that in the Deep Dive section.

Test number 4:

What does alter index rebuild online do?

There’s a very important point to check in this test – if your database is noarchivelog mode the rebuild will be nologging. and you’ll be fooled by the apparent efficiency of the mechanism right up to the point where you go to production and find that you’re generating a huge amount of redo. For the record, my indexes were roughly 64 blocks (512KB) each when rebuilt and produced the following redo figures (and virtually no undo):

Index t1_n1
redo entries                                 343
redo size                                594,984
redo size for direct writes              527,148
undo change vector size                   18,528
sorts (rows)                              27,058

Index t1_id
redo entries                                 345
redo size                                625,956
redo size for direct writes              560,092
undo change vector size                   18,096
sorts (rows)                              27,022

I’ve included the sorts statistic as a reminder that there are other potentially nasty overheads to consider. And when you do an online rebuild Oracle will have to lock the table briefly create a journal table (effectively a materialized view log) to capture the changes that go on while the rebuild is running then apply them when the rebuild is nearly complete, and the rebuild has to be based on a tablescan and sort.

Depending on what fraction of the partitions you are dropping, though, this does look like a very promising option – especially when you have to cater for the problem of shifting the redo to a remote site.

Deep Dive

We have seen how much redo was generated for both “coalesce cleanup” and “coalesce cleanup only” and have an idea that we know what’s happening, so we will be taking a look at some redo dumps to see if that confirms our suspicion. Before we get to that extreme, though, it’s worth taking a couple of simpler steps.

The dbms_space package

We can use the dbms_space.space_usage() procedure to see the state of blocks in the two index segments before and after the attempts to cleanup. It’s important to remember that for index segments the procedure uses the “FS2” state to report blocks that are “free”, i.e. formatted but contain no data, aren’t in the index structure and can be linked in to the index when an existing block needs to be split.

Here are three sets of results from calls to the procedure showing the state immediately before (and after) the partitions were dropped, then after a “coalesce cleanup only” test, and then after a “coalesce cleanup” test.

Index T1_N1                   Blocks       Bytes    |  After "Cleanup Only" |   After "Cleanup" 
Unformatted                   :    0 /         0    |      0 /         0    |      0 /         0
Freespace 1 (  0 -  25% free) :    0 /         0    |      0 /         0    |      0 /         0
Freespace 2 ( 25 -  50% free) :    1 /     8,192    |      1 /     8,192    |     67 /   548,864
Freespace 3 ( 50 -  75% free) :    0 /         0    |      0 /         0    |      0 /         0
Freespace 4 ( 75 - 100% free) :    0 /         0    |      0 /         0    |      0 /         0
Full                          :  134 / 1,097,728    |    134 / 1,097,728    |     68 /   557,056

Index T1_ID                   Blocks       Bytes    |  After "Cleanup Only" |  After "Cleanup"
Unformatted                   :    0 /         0    |      0 /         0    |      0 /         0
Freespace 1 (  0 -  25% free) :    0 /         0    |      0 /         0    |      0 /         0
Freespace 2 ( 25 -  50% free) :    1 /     8,192    |      1 /     8,192    |     65 /   532,480
Freespace 3 ( 50 -  75% free) :    0 /         0    |      0 /         0    |      0 /         0
Freespace 4 ( 75 - 100% free) :    0 /         0    |      0 /         0    |      0 /         0
Full                          :  128 / 1,048,576    |    128 / 1,048,576    |     64 /   524,288

There are two key points in these figures:

  • cleanup only doesn’t change the state of any space usage information.
  • cleanup doesn’t do anything to release space back to the tablespace; it simply compacts the data into a smaller number of blocks and tags empty blocks as “free” (and takes them out of the index structure, but that’s not visible in summary from dbms_space.space_usage).
Index treedumps

If we want to find out how many of the FS2 blocks are in the index structure and how many have been taken out of the tree then we’ll have to do a tree dump – get the object_id of the index and issue:

alter session set events 'immediate trace name treedump level {object id}';

Here are the first few lines of the tree dump for the t1_n1 index in the same three states, created, after cleanup only, after cleanup:

Immediately after drop
branch: 0x9000684 150996612 (0: nrow: 128, level: 1)
   leaf: 0x9000685 150996613 (-1: row:449.449 avs:8)
   leaf: 0x9000686 150996614 (0: row:444.444 avs:4)
   leaf: 0x9000687 150996615 (1: row:444.444 avs:4)
   leaf: 0x9000688 150996616 (2: row:444.444 avs:4)
   leaf: 0x9000689 150996617 (3: row:444.444 avs:4)
   leaf: 0x900068a 150996618 (4: row:444.444 avs:4)
   leaf: 0x900068b 150996619 (5: row:444.444 avs:4)

After "Cleanup only"
branch: 0x9000684 150996612 (0: nrow: 128, level: 1)
   leaf: 0x9000685 150996613 (-1: row:449.214 avs:8)
   leaf: 0x9000686 150996614 (0: row:444.216 avs:4)
   leaf: 0x9000687 150996615 (1: row:444.216 avs:4)
   leaf: 0x9000688 150996616 (2: row:444.226 avs:4)
   leaf: 0x9000689 150996617 (3: row:444.216 avs:4)
   leaf: 0x900068a 150996618 (4: row:444.206 avs:4)
   leaf: 0x900068b 150996619 (5: row:444.231 avs:4)
   leaf: 0x900068c 150996620 (6: row:444.201 avs:4)
After "Cleanup"
branch: 0x9000684 150996612 (0: nrow: 64, level: 1)
   leaf: 0x9000685 150996613 (-1: row:446.446 avs:16)
   leaf: 0x9000688 150996616 (0: row:444.444 avs:4)
   leaf: 0x900068a 150996618 (1: row:444.444 avs:4)
   leaf: 0x900068c 150996620 (2: row:444.444 avs:4)
   leaf: 0x900068e 150996622 (3: row:444.444 avs:4)
   leaf: 0x9000690 150996624 (4: row:444.444 avs:4)
   leaf: 0x9000692 150996626 (5: row:444.444 avs:4)
   leaf: 0x9000694 150996628 (6: row:444.444 avs:4)
   leaf: 0x9000696 150996630 (7: row:444.444 avs:4)
   leaf: 0x9000698 150996632 (8: row:444.444 avs:4)
   leaf: 0x900069a 150996634 (9: row:421.421 avs:12)

This is the index where the values 0 to 599 have been spread randomly across the 54,000 different id values, so each n1 value appears in roughly 90 rows – of which we’ve deleted about 45 by dropping the bottom 3 partitions of 6. For leaf blocks “row:X,Y” means there are X rows in the block directory of which Y would remain after rows marked as deleted had been removed, “avs:N” shows N bytes of free space in the block. The t1_n1 index can fit 444 index entries per leaf block and we can see that immediately after the “drop partition” none of them has been marked as deleted. After a “cleanup only” however we can see that (as expected) roughly half the rows in every leaf block have been marked as deleted with half remaining. After a “cleanup” we can see that we’re back to 444 rows per leaf block with no deletions and virtually no freespace.,

Notice, however, the way that the leaf block addresses have changed during the cleanup. If we examine just the last 3 digits of the decimal version of the leaf block addresses we start with:

613, 614, 615, 616, 617, 618, 618, 619, 620

but we end with:

613, 616, 618, 620 ..

Effectively, Oracle has “copied back” index entries from block 614 to block 613, copied the remaining rows forward from 614 and all the rows from 615 to 616, copied the “overflow” rows from 616 and all the rows from 617 into 618 and so on, detaching empty blocks from the index structure as they appear. (It’s not likely that Oracle thinks in terms of “copying forward/back”, it’s more likely that Oracle simply reads through the index in order constructing new leaf blocks in private memory and has a simple algorithm for deciding which block to replace e.g. “the most recent leaf block that does not have any overflow to use up”)

If we now examine the tree dumps from index t1_id we can see why the volume of redo differs on the final “cleanup” phase of the code.

Immediately after drop
branch: 0x9000784 150996868 (0: nrow: 134, level: 1)
   leaf: 0x9000785 150996869 (-1: row:426.426 avs:7)
   leaf: 0x9000786 150996870 (0: row:421.421 avs:1)
   leaf: 0x9000787 150996871 (1: row:421.421 avs:1)
   leaf: 0x9000788 150996872 (2: row:421.421 avs:1)
   leaf: 0x9000789 150996873 (3: row:421.421 avs:2)
   leaf: 0x900078a 150996874 (4: row:421.421 avs:1)
   leaf: 0x900078b 150996875 (5: row:421.421 avs:1)
   leaf: 0x90007c6 150996934 (64: row:400.400 avs:0)
   leaf: 0x90007c7 150996935 (65: row:400.400 avs:0)
   leaf: 0x90007c8 150996936 (66: row:400.400 avs:0)
   leaf: 0x90007c9 150996937 (67: row:400.400 avs:0)
   leaf: 0x90007ca 150996938 (68: row:400.400 avs:0)
   leaf: 0x90007cb 150996939 (69: row:400.400 avs:0)

After "Cleanup only"
branch: 0x9000784 150996868 (0: nrow: 134, level: 1)
   leaf: 0x9000785 150996869 (-1: row:426.0 avs:7)
   leaf: 0x9000786 150996870 (0: row:421.0 avs:1)
   leaf: 0x9000787 150996871 (1: row:421.0 avs:1)
   leaf: 0x9000788 150996872 (2: row:421.0 avs:1)
   leaf: 0x9000789 150996873 (3: row:421.0 avs:2)
   leaf: 0x900078a 150996874 (4: row:421.0 avs:1)
   leaf: 0x900078b 150996875 (5: row:421.0 avs:1)
   leaf: 0x90007c6 150996934 (64: row:400.0 avs:0)
   leaf: 0x90007c7 150996935 (65: row:400.303 avs:0)
   leaf: 0x90007c8 150996936 (66: row:400.400 avs:0)
   leaf: 0x90007c9 150996937 (67: row:400.400 avs:0)

After "Cleanup"
branch: 0x9000784 150996868 (0: nrow: 68, level: 1)
   leaf: 0x90007c7 150996935 (-1: row:303.303 avs:1940)
   leaf: 0x90007c8 150996936 (0: row:400.400 avs:0)
   leaf: 0x90007c9 150996937 (1: row:400.400 avs:0)
   leaf: 0x90007ca 150996938 (2: row:400.400 avs:0)
   leaf: 0x90007cb 150996939 (3: row:400.400 avs:0)
   leaf: 0x90007cc 150996940 (4: row:400.400 avs:0)
   leaf: 0x90007cd 150996941 (5: row:400.400 avs:0)

I’ve shown two sections of the treedump for the first two extracts, the start of the index and the start of the “2nd half” of the index where the id values are in the partitions that we kept. You’ll notice that the number of index entries per leaf block drops from 426 to 400 as we move through the index, that’s just the effect of a sequential id generally getting bigger (42 takes 2 bytes, 42,000 takes 3 bytes, 42,042 takes 4 bytes).

After “cleanup only” all the leaf blocks in the first half the the index show “no rows remaining of 4xx”, while all the leaf blocks in the 2nd half the index show “400 rows remaining of 400” – with the exception of the leaf block numbered 65 at address 0x90007c7, which shows 303 rows remaing of 400, that must be the block that held the highest few rows from partition p27000.

After the final cleanup we can see that this “mid-point” leaf block has become the “low value” leaf block, and the rest of the index leaf blocks look as if they are completely unchanged. (I think we can assume that the “copy forward/back” code caters for a few boundary conditions that (e.g.)stop Oracle from doing something silly with leaf blocks that are already completely full.)

In this case, then, we can guess that Oracle has simply removed 66 leaf blocks (blocks -1 to 64) from the index structure and reconnected block number 65 as the starting block. In the previous case the “cleanup” redo disconnected a simlar number of leaf blocks, but also rewrote all the blocks that were kept.

Dumping Redo

The information we examined so far prompts us to ask two significant questions:

  • Moving from the “cleanup only” to the “cleanup” increased the redo by 345KB in the case of the sequential t1_id index, but by 2.7MB in the case of the “random arrival” t1_n1 index. In both cases the number of leaf blocks to be relinked is very similar, so the difference seems to be due to the work done in compacting roughly 128 leaf blocks down to 64 leaf blocks – how does that generate (2700KB – 345KB)/64 = 36KB per “new” block? We might also wonder why “just” relinking 66 blocks appears to take 345K/66 = 5K per block anyway – that seems a little surprising.
  • For the full cleanup operation does Oracle delete the entries from a few consecutive blocks, compact them and write them and then carry on with the next few blocks; or does it walk the entire index deleting entries and then walk the index again to tidy up and compact (logically) adjacent blocks. If the latter then for a very large index (rather than my tiny test) that means we could see two consecutive full scans physically reading the whole index, and possibly generating more redo thanks to delayed block cleanout.

If we dump the redo log we can answer these questions by extracting some fairly simple details – although the resulting trace file is going to be rather large. For example we could code:

column current_scn new_value start_scn
select to_char(current_scn,'9999999999999999') current_scn from v$database;

alter index t1_n1 coalesce cleanup;

column current_scn new_value end_scn
select to_char(current_scn,'9999999999999999') current_scn from v$database;

alter session set tracefile_identifier='n1_index';
alter system dump redo scn min &start_scn scn max &end_scn /* layer 10 */;
alter session set tracefile_identiier='';

I’ve included a commented out “layer 10” in the dump command to show how you can be selective in what redo you dump. Layer 10 is the set of redo op codes for index-related change vectors. You will find other op codes (in particular 5.1) being dumped as well because Oracle dumps the whole of any redo record containing a change vector of the specified type.

When I dumped the redo for a single index cleanup the trace file was about 45MB – so not something you would want to read in detail – but you could start with a few simple searches, for example:

grep -n "OP:10"       or19_ora_13388_n1_index.trc >temp_n1_op10.txt
grep -n "REDO RECORD" or19_ora_13388_n1_index.trc >temp_n1_record.txt

grep "OP:"  or19_ora_13388_n1_index.trc | sed "s/^.*OP:/OP:/" | sed "s/ .*$//" | sort | uniq -c | sort -n

The first grep example simply extracts all the index-related op-codes (with line numbers from the trace file to make it easier to spot patterns. The second grep does the same for the start of each redo record because those lines also report the length of each record, which may make it possible to find out more about the surprising amount of redo generated by the compacting and relinking. The third example is just a bit of showing off, it extracts all the op-code lines, cuts the actual OP:nn.nn bit out, then sorts and counts the number of appearances of each op-code. Here’s the result of that last example from one of the tests:

      1 OP:13.24
      1 OP:17.28
      1 OP:24.1
      2 OP:11.5
      3 OP:10.11
      3 OP:11.3
     15 OP:10.14
     18 OP:14.4
     18 OP:22.5
     28 OP:10.7
     29 OP:5.11
     36 OP:22.2
     64 OP:13.22
     68 OP:10.34
     82 OP:10.39
    138 OP:10.8
    205 OP:5.4
    219 OP:10.6
    259 OP:4.1
    324 OP:10.5
    347 OP:5.6
    689 OP:5.2
  27327 OP:10.4
  27833 OP:5.1

It’s an obvious guess that the 27,000 OP:10.4 at the bottom are “delete index entry” and most of the OP:5.1 are their corresponding undo change vectors. The OP:10.4 count is a little high, but checking the session activity stats I found that they showed: rollback changes – undo records applied = 376, so some internal transaction rollback took place, and the session stats are a reasonable match for the “excess” OP:10.4, suggesting that at some point we saw a batch of deletes rollback and restart. (NOTE: to investigate further, this suggests that the entire operation is executing as a number of relatively small transactions and could safely be interrupted – a hypothesis supported by the 205 OP:5.4 (commit/rollback change vectors)). The “excess” OP:10.4 are also closely matched by the OP:10.5 / OP:5.6 figures (restore leaf during rollback / mark user undo record applied)

Before chasing any other details let’s answer the question about whether the compacting takes place during the delete phase or starts after the deletes are all done. If we jump to the end of the “OP:10” file we can check to see if there are OP:10.4 all the way through the file with small patches of other stuff scattered through, or it’s non-stop OP:10.4 with a load of “something else” appearing only in the last couple of thousand lines. The answer is that we get regular repeates of a pattern like the following:

OP:10.4 x ~400

For example (after lots of 10.4 on blocks 0x9000695 and 0x9000696. we see:

119184:CHANGE #3 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x09000696 OBJ:131045 SCN:0x0000000002165409 SEQ:1 OP:10.6 ENC:0 RBL:0 FLG:0x0000
119210:CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x09000695 OBJ:131045 SCN:0x0000000002165407 SEQ:1 OP:10.6 ENC:0 RBL:0 FLG:0x0000
119236:CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x09000694 OBJ:131045 SCN:0x0000000002165405 SEQ:1 OP:10.6 ENC:0 RBL:0 FLG:0x0000
119272:CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x09000684 OBJ:131045 SCN:0x0000000002165409 SEQ:1 OP:10.39 ENC:0 RBL:0 FLG:0x0000
119625:CHANGE #3 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x09000694 OBJ:131045 SCN:0x0000000002165409 SEQ:1 OP:10.8 ENC:0 RBL:0 FLG:0x0000
120499:CHANGE #3 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x09000695 OBJ:131045 SCN:0x0000000002165409 SEQ:1 OP:10.34 ENC:0 RBL:0 FLG:0x0000
120862:CHANGE #3 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x09000696 OBJ:131045 SCN:0x0000000002165409 SEQ:2 OP:10.8 ENC:0 RBL:0 FLG:0x0000

This translates to:

10.6:  lock block 0x09000696
10.6:  lock block 0x09000695
10.6:  lock block 0x09000694
10.39: update branch block 0x09000684
10.8:  new block 0x09000694
10.34: empty block 0x09000695
10.8:  new block 0x09000696

The pattern then repeats starting with deletes from block 0x09000697 and 0x09000698, and then producing an interesting detail:

DBA:0x09000698 OBJ:131045 SCN:0x000000000216540f SEQ:1 OP:10.6 
DBA:0x09000697 OBJ:131045 SCN:0x000000000216540d SEQ:1 OP:10.6 
DBA:0x09000696 OBJ:131045 SCN:0x000000000216540b SEQ:1 OP:10.6 
DBA:0x09000684 OBJ:131045 SCN:0x000000000216540f SEQ:1 OP:10.39 
DBA:0x09000696 OBJ:131045 SCN:0x000000000216540f SEQ:1 OP:10.8 
DBA:0x09000697 OBJ:131045 SCN:0x000000000216540f SEQ:1 OP:10.34 
DBA:0x09000698 OBJ:131045 SCN:0x000000000216540f SEQ:2 OP:10.8 

Check line 5 above – it’s another OP:10.8 creating another version of block 0x09000696. And that’s one of the reasons why the compaction creates more redo than expected. Oracle may recreate a block several times before the block gets to its final compacted state.

Looking at the detail of the trace file, in particular the pattern of deletes followed by “new block”, it looks as if Oracle deletes all the rows from just two adjacent blocks (perhaps to minimise block-level locking) and then does the best it can with the rows that are left and this may mean (as it does in our fragment) writing one full block and one partial block, though for a sparsely populated index it might mean writing just a single partial block.

To show the total effect on redo generation I’ve extracted the redo records for a complete cycle of the pattern (excluding several hundred deletes, which generate 228 bytes each). I’ve used egrep to pick out 3 patterns “OP:”, “REDO RECORD” and (using hindsight) “new block has”:

132650:REDO RECORD - Thread:1 RBA: 0x0002c8.00046e90.0134 LEN: 0x0058 VLD: 0x01 CON_UID: 3792595
132652:CHANGE #1 CON_ID:3 TYP:2 CLS:1 AFN:36 DBA:0x09000698 OBJ:131045 SCN:0x000000000216540e SEQ:1 OP:4.1 ENC:0 RBL:0 FLG:0x0000

132656:REDO RECORD - Thread:1 RBA: 0x0002c8.00046e90.018c LEN: 0x0164 VLD: 0x01 CON_UID: 3792595
132658:CHANGE #1 CON_ID:3 TYP:0 CLS:17 AFN:17 DBA:0x044001c0 OBJ:4294967295 SCN:0x00000000021653f8 SEQ:1 OP:5.2 ENC:0 RBL:0 FLG:0x0000
132661:CHANGE #2 CON_ID:3 TYP:1 CLS:18 AFN:17 DBA:0x04406fdf OBJ:4294967295 SCN:0x000000000216540f SEQ:1 OP:5.1 ENC:0 RBL:0 FLG:0x0000
132682:CHANGE #3 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x09000698 OBJ:131045 SCN:0x000000000216540f SEQ:1 OP:10.6 ENC:0 RBL:0 FLG:0x0000

132691:REDO RECORD - Thread:1 RBA: 0x0002c8.00046e91.0100 LEN: 0x00e4 VLD: 0x01 CON_UID: 3792595
132693:CHANGE #1 CON_ID:3 TYP:0 CLS:18 AFN:17 DBA:0x04406fdf OBJ:4294967295 SCN:0x000000000216540f SEQ:2 OP:5.1 ENC:0 RBL:0 FLG:0x0000
132708:CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x09000697 OBJ:131045 SCN:0x000000000216540d SEQ:1 OP:10.6 ENC:0 RBL:0 FLG:0x0000

132717:REDO RECORD - Thread:1 RBA: 0x0002c8.00046e91.01e4 LEN: 0x00e4 VLD: 0x01 CON_UID: 3792595
132719:CHANGE #1 CON_ID:3 TYP:0 CLS:18 AFN:17 DBA:0x04406fdf OBJ:4294967295 SCN:0x000000000216540f SEQ:3 OP:5.1 ENC:0 RBL:0 FLG:0x0000
132734:CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x09000696 OBJ:131045 SCN:0x000000000216540b SEQ:1 OP:10.6 ENC:0 RBL:0 FLG:0x0000

132743:REDO RECORD - Thread:1 RBA: 0x0002c8.00046e92.00d8 LEN: 0x0058 VLD: 0x01 CON_UID: 3792595
132745:CHANGE #1 CON_ID:3 TYP:2 CLS:1 AFN:36 DBA:0x09000684 OBJ:131045 SCN:0x000000000216540a SEQ:1 OP:4.1 ENC:0 RBL:0 FLG:0x0000

132749:REDO RECORD - Thread:1 RBA: 0x0002c8.00046e92.0130 LEN: 0x0128 VLD: 0x01 CON_UID: 3792595
132751:CHANGE #1 CON_ID:3 TYP:0 CLS:18 AFN:17 DBA:0x04406fdf OBJ:4294967295 SCN:0x000000000216540f SEQ:4 OP:5.1 ENC:0 RBL:0 FLG:0x0000
132770:CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x09000684 OBJ:131045 SCN:0x000000000216540f SEQ:1 OP:10.39 ENC:0 RBL:0 FLG:0x0000

132779:REDO RECORD - Thread:1 RBA: 0x0002c8.00046e93.0068 LEN: 0x4038 VLD: 0x01 CON_UID: 3792595
132781:CHANGE #1 CON_ID:3 TYP:0 CLS:17 AFN:17 DBA:0x044001c0 OBJ:4294967295 SCN:0x000000000216540f SEQ:1 OP:5.2 ENC:0 RBL:0 FLG:0x0000
132784:CHANGE #2 CON_ID:3 TYP:1 CLS:18 AFN:17 DBA:0x04406fe0 OBJ:4294967295 SCN:0x000000000216540f SEQ:1 OP:5.1 ENC:0 RBL:0 FLG:0x0000
133123:CHANGE #3 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x09000696 OBJ:131045 SCN:0x000000000216540f SEQ:1 OP:10.8 ENC:0 RBL:0 FLG:0x0000
133139:new block has 444 rows

133653:REDO RECORD - Thread:1 RBA: 0x0002c8.00046eb5.0010 LEN: 0x20e8 VLD: 0x05 CON_UID: 3792595
133656:CHANGE #1 CON_ID:3 TYP:0 CLS:17 AFN:17 DBA:0x044001c0 OBJ:4294967295 SCN:0x000000000216540f SEQ:2 OP:5.2 ENC:0 RBL:0 FLG:0x0000
133659:CHANGE #2 CON_ID:3 TYP:1 CLS:18 AFN:17 DBA:0x04406fe1 OBJ:4294967295 SCN:0x0000000002165410 SEQ:1 OP:5.1 ENC:0 RBL:0 FLG:0x0000
133998:CHANGE #3 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x09000697 OBJ:131045 SCN:0x000000000216540f SEQ:1 OP:10.34 ENC:0 RBL:0 FLG:0x0000

134007:REDO RECORD - Thread:1 RBA: 0x0002c8.00046ec6.0010 LEN: 0x0064 VLD: 0x01 CON_UID: 3792595
134009:CHANGE #1 CON_ID:3 TYP:0 CLS:8 AFN:36 DBA:0x09000680 OBJ:131045 SCN:0x0000000002165409 SEQ:1 OP:13.22 ENC:0 RBL:0 FLG:0x0000

134017:REDO RECORD - Thread:1 RBA: 0x0002c8.00046ec6.0074 LEN: 0x3a74 VLD: 0x01 CON_UID: 3792595
134019:CHANGE #1 CON_ID:3 TYP:0 CLS:17 AFN:17 DBA:0x044001c0 OBJ:4294967295 SCN:0x0000000002165410 SEQ:1 OP:5.2 ENC:0 RBL:0 FLG:0x0000
134022:CHANGE #2 CON_ID:3 TYP:1 CLS:18 AFN:17 DBA:0x04406fe2 OBJ:4294967295 SCN:0x0000000002165410 SEQ:1 OP:5.1 ENC:0 RBL:0 FLG:0x0000
134361:CHANGE #3 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x09000698 OBJ:131045 SCN:0x000000000216540f SEQ:2 OP:10.8 ENC:0 RBL:0 FLG:0x0000
134377:new block has 362 rows

134799:REDO RECORD - Thread:1 RBA: 0x0002c8.00046ee4.00c8 LEN: 0x0058 VLD: 0x01 CON_UID: 3792595
134801:CHANGE #1 CON_ID:3 TYP:0 CLS:17 AFN:17 DBA:0x044001c0 OBJ:4294967295 SCN:0x0000000002165410 SEQ:2 OP:5.4 ENC:0 RBL:0 FLG:0x0000

This shows a complete transaction (the previous redo record is the commit (OP:5.4) for the a few hundred deletes.I’ve highlighted five lines in the output – the three redo record headers that report the most space, and two lines that show you how many rows are in the “newed” blocks gene.rated by the op codes OP:10.8

  • Line 24 – LEN = 0x4038 = 16,440 bytes. That’s basically the size of two blocks: the OP:5.1 is the old image (8KB) of block 0x9000696, the OP:10.8 is the new image (also 8KB) of the block
  • Line 30 – LEN = 0x20e8 = 8,424. That’s basically one block image. The OP:10.34 is “clear block 0x09000698” which requires very little redo, but the OP:5.1 is the old image (8KB) of the block.
  • Line 38 – LEN = 0x3a74 = 14,964 bytes. Again that’s basically the size of two blocks: the OP:5.1 is the old image (8KB) of block 0x9000698, the OP:10.8 is the new image of the block, but it has only 362 rows (of a final 444 rows) in it, so the dump is only about 6.5KB

All the other bits add up to about 1,200 bytes of redo, so in each cycle the compacting activity generates about 40KB of redo in total. Since we end up with 68 filled blocks the total “extra” redo we got as we switched from “coalesce compress only” to “coalesce compress” should be around 68 * 40KB = 2.65 MB, which is pretty close to what we atually saw for the t1_n1 index.

Transactions and Locking

I commented on the presence of the session statistics reporting “rollback changes – undo records applied”, and mentioned the presence of the OP:5.4 records in the redo. These are all pointers to the coalesce command being operated as a series of smaller transactions rather than one large transaction, and even before I had started to dump and examine the redo I had added v$rollstat and v$enqueuestat to my usual snapshot of dynamic performance views. Here are the results from a typical “coalesce cleanup” test for the t1_n1 index.

Rollback stats
USN   Ex Size K  HWM K  Opt K      Writes     Gets  Waits Shr Grow Shr K  Act K
----  -- ------  -----  -----      ------     ----  ----- --- ---- ----- ------
   0   0      0      0      0           0        2      0   0    0     0      0
   1   0      0      0      0      382184       85      0   0    0     0      0
   2   0      0      0      0      412930       91      0   0    0     0     61
   3   0      0      0      0      404260       90      0   0    0     0      0
   4   0      0      0      0      413412       91      0   0    0     0   -106
   5 -13  -1792      0      0      341294       99      0   2    0    24    -71
   6   0      0      0      0      355986       81      0   0    0     0      0
   7   0      0      0      0      444054       97      0   0    0     0      0
   8 -12  -1728      0      0      418474      114      1   2    0    24    -45
   9   0      0      0      0      422776       92      0   0    0     0     38
  10   0      0      0      0      426976       94      0   0    0     0      0

System enqueues
Type    Requests       Waits     Success      Failed    Wait m/s Reason
----    --------       -----     -------      ------    -------- ------
KI             3           0           3           0           0 contention
CR           250          24         250           0          12 block range reuse ckpt
IS            25           0          25           0           0 contention
TM             8           0           8           0           0 contention
TA             2           0           2           0           0 contention
TX           198           0         198           0           0 contention
US             8           0           8           0           0 contention
HW             8           0           8           0           0 contention
TT             4           0           4           0           0 contention
SJ             2           0           2           0           0 Slave Task Cancel
CU             2           0           2           0           0 contention
OD             1           0           1           0           0 Serializing DDLs

System REDO stats
Name                                       Value
----                                       -----
redo entries                              27,977
redo size                              8,853,604
undo change vector size                3,967,100

I have 10 undo segments online, and each has received about 400KB of undo – which comes to a total of 4MB – which is a very good match to the “undo change vector size” reported for the session. (I didn’t get any system generated rollbacks in the run.)

We can also see 198 TX enqueues – which is a very good match for the 196 OP:5.4 that I found in the sessions redo dump on this test.

As I commented in the section on redo generation, to handle the “coalesce cleanup” the session walks the index in leaf block order and (with variations dependent on situations like finding leaf blocks with nothing to delete, or finding leaf blocks that become empty after all the deletes have been done)

  • locks two leaf blocks
  • deletes the dropped rows
  • commits
  • packs the outstanding rows downwards
  • relinks leaf blocks and adjusts branch blocks as necessary
  • commits
    • “cleanup only” locks and deletes from a single row at a time, and doesn’t do the packing.
  • repeats until end of index

(Note – “cleanup only” seems to lock, delete and commit one block at a time)

This strategy has two side-effects. First, though I don’t think it’s documented, it looks as if you could simply kill the process if it started putting too much stress on your system. All that could happen is that the current (small) transaction would be cleaned up, but the rest of the work that had been done up to that moment would persist. I have actually tested this, managing to kill sessions while they were in the middle of a delete a batch (though I’ve never managed to catch an example where a session was in the middle of relinking). After restarting Oracle would simply pick up where it had left off after the previous (small) rollback.

The second side effect is another possible overhead. Partitioned tables tend to be big, and each index clean-up is likely to be a fairly big job generating a lot of redo and undo. How much impact could the undo have on your system? This depends in part on what your undo retention looks like, how long the clean-up takes, and the risk of other sessions running into ORA-01555 errors. In particular there’s an odd problem of undo segment extension causing updates to restart – if an update statement (or delete, or merge with update clause) results in an undo segment extending then the statement will rollback and restart using a different locking strategy.

I don’t know if something of this sort would happen with the deletes from an index coalesce, and the deletes are very small anyway so it might not matter anyway, but the coalesce executes a large number of transactions in a relatively short time period rotating around and gradually filling every available undo segment – which means other large updates are more likely to need to extend an undo segment.

It’s also worth noting that overlapping jobs will sometimes need to do a lot of work to check read (and write) consistency, and to find “upper bound commit” SCNs: if you have a process executing a large number of transactions in a short period of time it can be very expensive for another process to find an “upper bound commit” and you may see it doing a lot of reads of undo segments, reporting a number of “transaction tables consistent read rollbacks” and a large number of “transaction tables consistent reads – undo records applied” in the session stats. (Worst case scenario – the number of transactions could be similar to the number of (used) leaf blocks in the index).

One final point to consider – the report from the (system level) Enqueue stats showed a handful of TM locks, so I enabled lock tracing for the session to see if any of them came from my session and whether they were likely to be a concurrency threat.

alter session set events 'trace[ksq] disk=medium';

Most of the TM enqueues were from my session, but they appeared only after the coalesce was complete, and they were all related to sys-owned (dictionary) tables:

2022-08-05 19:49:27.902*:ksq.c@9175:ksqgtlctx(): *** TM-00000140-00000000-0039DED3-00000000 mode=3 flags=0x401 why=173 timeout=21474836 ***
2022-08-05 19:49:27.902*:ksq.c@9175:ksqgtlctx(): *** TM-00000061-00000000-0039DED3-00000000 mode=3 flags=0x401 why=173 timeout=21474836 ***
2022-08-05 19:49:27.902*:ksq.c@9175:ksqgtlctx(): *** TM-00000049-00000000-0039DED3-00000000 mode=3 flags=0x401 why=173 timeout=21474836 ***
2022-08-05 19:49:27.902*:ksq.c@9175:ksqgtlctx(): *** TM-00000004-00000000-0039DED3-00000000 mode=3 flags=0x401 why=173 timeout=21474836 ***
2022-08-05 19:49:27.903*:ksq.c@9175:ksqgtlctx(): *** TM-0000004B-00000000-0039DED3-00000000 mode=3 flags=0x401 why=173 timeout=21474836 ***
2022-08-05 19:49:27.903*:ksq.c@9175:ksqgtlctx(): *** TM-00000013-00000000-0039DED3-00000000 mode=3 flags=0x401 why=173 timeout=21474836 ***

In case you’re wondering, the first TM lock (object 0x140) is the index_orphaned_entry$ table, and Oracle had to lock it at the end of the “coalesce clean up” to delete the rows that linked the three dropped table partitions with the one global index that I had cleaned.


Deferred global index maintence means you can drop partitions very quickly with virtually no interruption to service and then have Oracle clean up the related index entries at a later point in time. The drawback to deferring the cleanup is that Oracle will use a row-by-row mechanism to mark all the index entries for the dropped data as deleted – at a cost of a full index scan and about 230 bytes of redo per dropped row per index – compared to a bulk-processing mechanism that can be applied if the index entries are dropped as part of the partition drop processing.

There are basically two mechanisms you can use to clean up the entries that are waiting deletion.

  • rebuild the index (online, probably)
  • use one of the special coalesce calls on the index

You can best spread the workload and minimise interference with your normal work by micro-managing the clean up, explicitly issuing whichever “alter index coalesce” or “alter index rebuild” command you prefer for each index in turn that needs to be cleaned up.

None of the coalesce command variants returns space to the tablespace; nor do they even drop the highwater mark on the index segment. If you want to return space to the tablespace you will need to excute a “shrink space” on the index after the coalesce is complete. All the coalesce options generate a very large amount of redo (230 bytes per index entry for each delete, plus a volume that may be several times larger than the actual volume of data in the index to compact the remaining rows into the smallest number of blocks.)

The rebuild (online) will generate a much smaller volume of redo – but the penalties include the problem of journalling and applying the changes that took place as the rebuild was running; plus the cost of scanning the table and sorting the data to produce the index.


My tests are basically the worst case scenario (every leaf block has some entries to be deleted) and very close to best case scenario (leaf blocks will either become completely empty on deletion, or will have no rows deleted).

Do You Worry Your Companies data is Being Stolen?

Pete Finnigan - Fri, 2022-08-05 13:26
The number of data breaches is seemingly growing daily and a lot of companies worry that they could be the next statistic of misery and embarrassment. Do you lose sleep worrying that your company could be breached and its data....[Read More]

Posted by Pete On 05/08/22 At 12:47 PM

Categories: Security Blogs

SQL Model

Tom Kyte - Thu, 2022-08-04 21:26
Tom, sorry about the silly question, but have been trying to do something with SQL Model without any success. I cannot understand how it works, could you do some simple sample? -- may be already exists. Thanks,
Categories: DBA Blogs

KVM with an OnPrem Exadata X9-2 M

Tom Kyte - Thu, 2022-08-04 21:26
I have a client that is trying to implement KVM on an Exadata X9-2 M (, and he was told that in the Exadata he could have a single VM running in a single DB Node in the exadata, not a cluster VM, but just a single VM in a single DB Node. To my understanding, that is not possible in the Exadata as they had to be VM clusters, 2 DB nodes, instead of 1 DB node. I'm I incorrect in what I know?
Categories: DBA Blogs

Shrink space row movement

Tom Kyte - Thu, 2022-08-04 21:26
I use the process below to shrink datafiles and reclaim operating system space. It works fine. My question is there any downside to ALWAYS leaving row movement enabled instead of disabling it when the process is completed? Below is a sample of what I'm currently doing. <code>create table t as select rownum x, lpad('x', 500, 'x') xx from dual connect by level <= 10000; / -- ALTER TABLE t SHRINK SPACE CHECK; select bytes from user_segments where segment_name = 'T'; / delete t where x <= 9900; / select bytes from user_segments where segment_name = 'T'; / alter table t enable row movement; / alter table t shrink space CASCADE; / select bytes from user_segments where segment_name = 'T'; </code>
Categories: DBA Blogs

Outer Join with OR - Ansi vs Oracle

Tom Kyte - Thu, 2022-08-04 21:26
hello I'm curious how you would deal with the following. Table t1 has mixture of usernames, some with format of email address, some with format of first_name.last_name. Table t2 lookup table has username in email format. Requirement to return all 3 rows <code> create table t1 (taskid number, username varchar2(50)) insert into t1 values (1,'john.smith@12]3.com'); insert into t1 values (2,'dave.jones@123.com'); insert into t1 values (3,'dave.smith'); commit; create table t2 (username varchar2(50)); insert into t2 values ('john.smith@123.com'); insert into t2 values ('dave.smith@123.com'); commit; </code> Via ansi syntax, quite forward: <code> select * from t1 left outer join t2 on t1.username = t2.username or t1.username = substr(t2.username,1,instr(t2.username,'@')-1) order by 1 </code> Via oracle syntax - As can't use outer joins with OR, do you have any (easier) alternatives than this? <code> select t1.taskid, t1.username, t2.username from t1, t2 where t1.username = t2.username union all select t1.taskid, t1.username, t2.username from t1, t2 where t1.username = substr(t2.username,1,instr(t2.username,'@')-1) union all select t1.taskid, t1.username, to_char(null) from t1 where not exists (select 1 from t2 where t1.username = t2.username or t1.username = substr(t2.username,1,instr(t2.username,'@')-1)) order by 1 </code> thanks
Categories: DBA Blogs


Marian Crkon - Tue, 2022-08-02 20:52


The Feature - Tue, 2022-08-02 20:52



Categories: APPS Blogs

Cluster Interconnect Split Brain Testing (CSS_CRITICAL yes)

Michael Dinh - Tue, 2022-08-02 20:25

Simply, setting CSS_CRITICAL for the node will provide more weight to not shutdown node during splitbrain.


analytical vs aggregate functions

Tom Kyte - Mon, 2022-08-01 20:06
Are analytical functions replacement for aggregate functions? Some of the developers here are hurrying in replacing the group by aggregate functions with analytical functions just because they learnt something new :-).. Are the analytical functions faster always(compared to aggregate)?
Categories: DBA Blogs

adding sysdate on the file name

Tom Kyte - Mon, 2022-08-01 20:06
Hi Tom When spooling the file on PL/SQL, i would like the file name to have sysdate attached to the filename and it should change dynamicaly.. e.g.PM-INT-20021129_FLASHNOTIFICATIONS.TXT Thanx Zoliswa
Categories: DBA Blogs

FORALL issues...

Tom Kyte - Mon, 2022-08-01 20:06
Hi Tom, I'm using the FORALL statement to update tables in my applications. When the update raises an error, it can be saved using the SAVE EXCEPTIONS clause and then continuing with the update, or the update can be aborted without using that clause. In both these situations however, I am unable to pin down which specific record caused the error, i.e. the id of the record. More specifically, is there a way to store user defined data upon encountering an error during a FORALL execution? Small example: declare type intArray is table of number index by binary_integer; type charArray is table of varchar2 (2000) index by binary_integer; type recType is record (idArray intArray, array2, charArray); l_tabrec recType; lt_id intArray; begin ... ... /* populating the l_tabrec record of arrays here. */ ... begin forall i in 1..l_tabrec.idArray.count update table2 set value = l_tabrec.array2(i) where id = l_tabrec.idArray(i); exception when others then /* here is where I'd like to log the id of the record which fails. */ raise; end; end; I know the SAVE EXCEPTIONS gives us information about the type of error and the iteration on which it encountered the error, but I'm not sure how I can use that information to find which record actually caused the error. Hope my question is clear. Thanks in advance. Kashif
Categories: DBA Blogs

Cluster Interconnect Split Brain Testing

Michael Dinh - Mon, 2022-08-01 18:16

There was a dicusssion about memory causing split brain for a 2 nodes RAC; however, the log shows otherwise.

By no means, am I a RAC expert.

For testing purpose, ethernet for cluster_interconnect was shut down.

From ocssd.trc, it does look to be caused by cluster_interconnect when comparing errors from actual vs test.

Details from Google Doc below.

Cluster Interconnect Split Brain Testing

What do you think?

Documentum – RED Warning on D2 after enabling Tomcat HTTP Security Headers

Yann Neuhaus - Sun, 2022-07-31 04:34

Security is and will always be a very important aspect of IT. Enterprise Content Management (or Document Management Systems or Content Services Platform or whateveryoucallit) is obviously not an exception to that. One of our main goals as a consultant is to add improvements to our customers’ installations for stability, security, and ease of management. However, it happens that not everything goes as planned… In this blog, I will discuss about a small but still annoying issue that appeared on D2 20.2 after enabling the Tomcat HTTP Security Headers at a customer.

D2 provides some capabilities to add some of the standard HTTP Security Headers via its own internal configuration. It already supports the HTTP Strict Transport Security (HSTS) and inside the “settings.properties” file, you can define some parameters like the HSTS max age (hsts.maxage=xxx). In this file, you can also configure D2 for anti-clickjacking (allowed.frame.origins=xxx). However, that’s not all the HTTP Security Headers, and you probably are using other applications besides D2 that are deployed on Tomcat in your company. Therefore, you might start looking at putting the configuration not on the application but instead on the Tomcat layer, to have a common configuration, enterprise wise.

Tomcat HTTP Security Headers (I’m only talking about “httpHeaderSecurity” filter here) doesn’t support everything but it’s still a good base so it might make sense to configure the headers at that layer. What is missing in the Tomcat HTTP Security Headers from the list of “common” Security Headers to define is mainly the Cache-Control, the Content-Security-Policy/CSP (here is a blog related to CSP config in regards to D2 WSCTF, if needed) and the Cross-Origin Resource Sharing/CORS but all of these can be configured through other filters (eithers OOTB filters or custom ones that you need to deploy). By default, the Tomcat HTTP Security Headers are disabled (commented):

[tomcat@d2-0 ~]$ web_xml="$TOMCAT_HOME/conf/web.xml"
[tomcat@d2-0 ~]$
[tomcat@d2-0 ~]$ grep -B2 -A4 'httpHeaderSecurity' ${web_xml}
[tomcat@d2-0 ~]$

To enable it, you just need to uncomment the two sections (remove ‘<!–‘ before the XML tag and ‘–>‘ after) and you can of course start filling it with what you need. Here is a possible configuration:

[tomcat@d2-0 ~]$ cat ${web_xml}




[tomcat@d2-0 ~]$

The above is just an example but that’s how you enable the Tomcat HTTP Security Headers. Now coming to the issue I wanted to talk about in this blog… With these new headers in place, all applications (D2, D2-Config, D2-Smartview, D2-REST, DA, …) appeared to be working fine, no issues were reported by the tests done until someone tried to click on a link received by mail a few days before, and he got this beautiful screen:

Tomcat Red Warning

When that appeared, the following message was also printed on the Tomcat logs:

2022-03-03 13:56:52,342 UTC SEVERE [https-jsse-nio-8080-exec-27] org.apache.catalina.core.StandardWrapperValve.invoke Servlet.service() for servlet [jsp] in context with path [/D2] threw exception [Unable to add HTTP headers since response is already committed on entry to the HTTP header security Filter] with root cause
	javax.servlet.ServletException: Unable to add HTTP headers since response is already committed on entry to the HTTP header security Filter
		at org.apache.catalina.filters.HttpHeaderSecurityFilter.doFilter(HttpHeaderSecurityFilter.java:101)
		at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
		at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
		at com.emc.x3.portal.server.filters.HttpHeaderFilter.doFilter(HttpHeaderFilter.java:87)
		at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
		at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
		at org.apache.shiro.web.servlet.ProxiedFilterChain.doFilter(ProxiedFilterChain.java:61)
		at org.apache.shiro.web.servlet.AdviceFilter.executeChain(AdviceFilter.java:108)
		at com.emc.x3.portal.server.filters.authc.X3SAMLHttpAuthenticationFilter.executeChain(X3SAMLHttpAuthenticationFilter.java:356)
		at org.apache.shiro.web.servlet.AdviceFilter.doFilterInternal(AdviceFilter.java:137)
		at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
		at org.apache.shiro.web.servlet.ProxiedFilterChain.doFilter(ProxiedFilterChain.java:66)
		at org.apache.shiro.web.servlet.AbstractShiroFilter.executeChain(AbstractShiroFilter.java:449)
		at org.apache.shiro.web.servlet.AbstractShiroFilter$1.call(AbstractShiroFilter.java:365)
		at org.apache.shiro.subject.support.SubjectCallable.doCall(SubjectCallable.java:90)
		at org.apache.shiro.subject.support.SubjectCallable.call(SubjectCallable.java:83)
		at org.apache.shiro.subject.support.DelegatingSubject.execute(DelegatingSubject.java:387)
		at org.apache.shiro.web.servlet.AbstractShiroFilter.doFilterInternal(AbstractShiroFilter.java:362)
		at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
		at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:121)
		at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:133)
		at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
		at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
		at com.emc.x3.portal.server.filters.X3SessionTimeoutFilter.doFilter(X3SessionTimeoutFilter.java:40)
		at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
		at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
		at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:197)
		at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:97)
		at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:540)
		at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:135)
		at org.apache.catalina.valves.StuckThreadDetectionValve.invoke(StuckThreadDetectionValve.java:206)
		at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92)
		at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:687)
		at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:78)
		at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:357)
		at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:382)
		at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65)
		at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:895)
		at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1722)
		at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
		at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1191)
		at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659)
		at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
		at java.base/java.lang.Thread.run(Thread.java:829)

As you can see, it seems to be linked to what was just configured (“HttpHeaderSecurityFilter” on the first line of the stack trace) and there is also quite a bit of “shiro” references as well as a “X3SAMLHttpAuthenticationFilter“. It would therefore indicate that the issue is most probably linked to the Single Sign-On in relation with the Tomcat HTTP Security Headers. Since Tomcat is unable to add the required HTTP Security Headers, then it fails the request and print the red warning page.

What happened is that the link on the mail contained URL parameters (i.e., the “query” part) so it was something like “https://dns/D2?docbase=xxx&locateId=xxx“. D2 was fully working properly, except when someone tried to directly open a link from a mail without opening D2 first, which shouldn’t be a problem under normal circumstances… When that happened, the user was redirected to the Single Sign-On (AzureAD SAML2, configured on the shiro.ini), and then back to D2, which was supposed to handle the user’s request but failed because of the above Tomcat error. In addition to the issue (and the nice red warning screen), the URL changed to “https://dns/D2” (i.e., the URL parameters were lost in the process).

However, clicking a second time on the link from the mail worked properly and displayed the document correctly (using the “locateId” parameter)! The easiest way to replicate the issue was therefore to clear the cookies and access any URL with parameters. It appeared to me that, the issue would only show up if there is no SAML cookies (SAMLNameID, SAMLSessionIndex). After disabling the SSO (i.e., renaming/removing the “shiro.ini” file and restarting Tomcat), the issue wasn’t reproducible anymore. This confirmed my assumptions that there is a conflict between how D2 handles the SAML response and how Tomcat adds the HTTP Security Headers (“response is already committed“).

With that in mind, I opened the OpenText SR#5056234, to see if it was a D2 bug or something else. After some more investigations, I got the information from OpenText that this is apparently happening because of the open source third party library that is used for SSO (i.e., Apache Shiro). Without Apache Shiro, the HTTP Security Headers are properly added/modified by the doFilter() function before the servlet action is done. However, with Apache Shiro (that chains additional filters), the order of filters execution appears to be changed and, in this case, the response is already committed before Tomcat has the chance to add the necessary HTTP Security Headers, which triggers this error. As far as I know, this issue doesn’t happen with OTDS SSO, so I assume different libraries are in play there.

Since this issue can be seen as soon as the Tomcat HTTP Security Headers are enabled (even if you don’t specify any configuration for the filter), then it kind of makes it difficult to setup it up at the Tomcat layer. D2 20.2 uses Apache Shiro 1.4.2 (Nov-2019) while the latest version, as of today, is 1.9.1 (Jun-2022). I didn’t check but this issue might have been solved in a later version. Therefore, potential solutions could be to update the libraries (knowing that it could bring issues as well…) or simply to NOT configure the HTTP Security Headers on Tomcat, but instead put them one level above. If you are using Tomcat, there is a good chance that you have a front-end on top of it like Apache HTTPD or Nginx. The issue being related to Tomcat servlet handling, configuring/adding the HTTP Security Headers upfront will prevent the issue to appear altogether, hence that would most probably be the preferred solution.

L’article Documentum – RED Warning on D2 after enabling Tomcat HTTP Security Headers est apparu en premier sur dbi Blog.


Subscribe to Oracle FAQ aggregator