MERGE burns Sequence Numbers

In my previous blog post I wrote about the MERGE statement and how it can be used efficiently. During the implementation of this solution, a colleague stumbled upon some very high values for the sequence he used to generate the IDs. Of course I investigated this, and here is the result. First I start with a table that gets its IDs from a DEFAULT clause.

SQL> create sequence tm_seq;

Sequence created.

SQL> create table test_merge (
  2   id  number default on null tm_seq.nextval,
  3   txt varchar2(30),
  4   last_updated date,
  5   constraint pk_test_merge primary key (id)
  6  )
  7  /

Table created.

SQL> select LAST_NUMBER from user_sequences where SEQUENCE_NAME='TM_SEQ';

LAST_NUMBER
-----------
          1

When I now insert the first records into that table, the IDs get generated automatically using the Sequence.

SQL> merge into test_merge
  2  using (
  3    select 'Zeile ' || to_char(level) zeile from dual connect by level <= 100000
  4  ) s
  5  on (test_merge.txt = s.zeile)
  6  when matched then update set last_updated = sysdate
  7  when not matched then insert (txt) values (s.zeile)
  8  /

100000 rows merged.

SQL> select LAST_NUMBER from user_sequences where SEQUENCE_NAME='TM_SEQ';

LAST_NUMBER
-----------
     100001

In the end, the Sequence increased by exactly the number of records I inserted. But let’s see what happens when I run the same MERGE again. It will now do UPDATEs instead of INSERTs.

SQL> merge into test_merge
  2  using (
  3    select 'Zeile ' || to_char(level) zeile from dual connect by level <= 100000
[...]
  8  /

100000 rows merged.

SQL> select LAST_NUMBER from user_sequences where SEQUENCE_NAME='TM_SEQ';

LAST_NUMBER
-----------
     200001

Look, the Sequence increased again by exactly the number of records I tried to MERGE. Obviously the ID values are generated regardless of the operation that is finally performed. Even if I update the same row multiple times, which of course ends up throwing an error, these Sequence numbers get burned.

SQL> merge into test_merge
  2  using (
  3    select 'Zeile ' || to_char(mod(level,10000)) zeile from dual connect by level <= 100000
  4  ) s
  5  on (test_merge.txt = s.zeile)
  6  when matched then update set last_updated = sysdate
  7  when not matched then insert (txt) values (s.zeile)
  8  /
merge into test_merge
           *
ERROR at line 1:
ORA-30926: unable to get a stable set of rows in the source tables

SQL> select LAST_NUMBER from user_sequences where SEQUENCE_NAME='TM_SEQ';

LAST_NUMBER
-----------
     700021

Next, I replaced the DEFAULT with an IDENTITY column. But the result is the same.

SQL> create table test_merge (
  2   id  number generated by default on null as identity,
  3   txt varchar2(30),
  4   last_updated date,
  5   constraint pk_test_merge primary key (id)
  6  )
  7  /

Table created.

SQL> select sequence_name from user_tab_identity_cols where table_name='TEST_MERGE';

SEQUENCE_NAME
------------------------------------------------------------------------------------------------------------------
ISEQ$$_1193040

SQL> select LAST_NUMBER from user_sequences where SEQUENCE_NAME='ISEQ$$_1193040';

LAST_NUMBER
-----------
          1

SQL> merge into test_merge
  2  using (
  3    select 'Zeile ' || to_char(level) zeile from dual connect by level <= 100000
  4  ) s
  5  on (test_merge.txt = s.zeile)
  6  when matched then update set last_updated = sysdate
  7  when not matched then insert (txt) values (s.zeile)
  8  /

100000 rows merged.

SQL> merge into test_merge
  2  using (
  3    select 'Zeile ' || to_char(level) zeile from dual connect by level <= 100000
[...]
  8  /

100000 rows merged.

SQL> select LAST_NUMBER from user_sequences where SEQUENCE_NAME='ISEQ$$_1193040';

LAST_NUMBER
-----------
     200001

Last try, I switch to the Trigger-based solution that I created initially.

SQL> create sequence tm_seq;

Sequence created.

SQL> create or replace trigger tm_bir
  2  before insert on test_merge
  3  for each row
  4  begin
  5    :new.id := tm_seq.nextval;
  6  end;
  7  /

Trigger created.

SQL> merge into test_merge
  2  using (
  3    select 'Zeile ' || to_char(level) zeile from dual connect by level <= 100000
[...]
  8  /

100000 rows merged.

SQL> select LAST_NUMBER from user_sequences where SEQUENCE_NAME='TM_SEQ';

LAST_NUMBER
-----------
     100001

SQL> merge into test_merge
  2  using (
  3    select 'Zeile ' || to_char(level) zeile from dual connect by level <= 100000
[...]
  8  /

100000 rows merged.

SQL> select LAST_NUMBER from user_sequences where SEQUENCE_NAME='TM_SEQ';

LAST_NUMBER
-----------
     100001

You see, with the Trigger generating the IDs there is no burning of Sequence values. This is because the Trigger only fires when the INSERT really happens; it cannot fire just because there might be an INSERT. With the first approaches using DEFAULT or IDENTITY, the IDs get generated right away, just in case. But this is a cosmetic thing; those IDs are just IDs, nobody should ever see or even care about these values. On the other hand, the Trigger-based solution is a bit slower than the other two; here the overhead of PL/SQL is measurable.
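The mechanics can be illustrated outside the database. The following Python sketch is a conceptual simulation (my own code, not Oracle internals): like a sequence-backed DEFAULT or IDENTITY column, it draws a sequence value for every source row before the insert-or-update decision is made, so values get burned on the UPDATE path.

```python
from itertools import count

def merge_with_default_id(table, source_rows, seq):
    """Simulate MERGE into a table whose ID comes from a sequence-backed
    DEFAULT: a sequence value is drawn for every source row, even when
    the row ends up being UPDATEd, so unused values are burned."""
    for txt in source_rows:
        candidate_id = next(seq)                    # drawn unconditionally
        if txt in table:
            table[txt]["last_updated"] = "sysdate"  # UPDATE path: id discarded
        else:
            table[txt] = {"id": candidate_id, "last_updated": None}

seq = count(1)
table = {}
rows = ["Zeile %d" % i for i in range(1, 6)]
merge_with_default_id(table, rows, seq)  # 5 INSERTs -> ids 1..5
merge_with_default_id(table, rows, seq)  # 5 UPDATEs -> 5 more values burned
print(len(table), next(seq))             # 5 rows, but the sequence is already at 11
```

A trigger-based variant would call `next(seq)` only inside the `else` branch, which is exactly why the trigger does not burn values.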

Merge vs. Insert-Exception-Update

Recently I had a discussion with one of our developers about how to bring new rows into a table or, if a row already exists, update it. His approach was a PL/SQL procedure that first tries to insert a single row and, if that fails because the row is already there, catches the exception and does an update instead. I told him that this approach has at least three issues that make it slow. First, it does things row-by-row, which is much less efficient than set operations and does not scale well. Second, it has lots of context switches between SQL and PL/SQL, which are costly. And third, the exception handling is expensive too. My advice was to switch the code to a single MERGE statement. And to prove my point, I wrote a small example which I’ll share with you.

Let’s create two tables with identical structure; both will have a unique index and a trigger that fills the ID using a sequence.

SQL> create unique index tm_txt_uq on test_merge(txt);

Index created.

SQL> create table test_except (
  2   id  number,
  3   txt varchar2(30),
  4   last_updated date,
  5   constraint pk_test_except primary key (id)
  6  )
  7  /

Table created.

SQL> create unique index te_txt_uq on test_except(txt);

Index created.

SQL> create sequence tm_seq;

Sequence created.

SQL> create sequence te_seq;

Sequence created.

SQL> create or replace trigger tm_bir
  2  before insert on test_merge
  3  for each row
  4  begin
  5    :new.id := tm_seq.nextval;
  6  end;
  7  /

Trigger created.

SQL> create or replace trigger te_bir
  2  before insert on test_except
  3  for each row
  4  begin
  5    :new.id := te_seq.nextval;
  6  end;
  7  /

Trigger created.

In the first run I populate both tables with 100k rows. Since both tables are empty, this will of course lead to INSERTs. And SQL*Plus will do the timing for me.

SQL> set timing on
SQL> merge into test_merge
  2  using (
  3    select 'Zeile ' || to_char(level) zeile from dual connect by level <= 100000
[...]
  8  /

100000 rows merged.

Elapsed: [...]

SQL> set serveroutput on
SQL> declare
  2    l number;
  3    i number :=0;
  4    u number :=0;
  5  begin
  6    for l in 1..100000 loop
  7      begin
  8        insert into test_except (txt)
  9    values ('Zeile ' || to_char(l));
 10    i := i+1;
 11  exception
 12    when dup_val_on_index then
 13      update test_except set last_updated=sysdate where txt = 'Zeile ' || to_char(l);
 14      u := u+1;
 15      end;
 16    end loop;
 17    dbms_output.put_line(to_char(i) || ' rows inserted.');
 18    dbms_output.put_line(to_char(u) || ' rows updated.');
 19  end;
 20  /
100000 rows inserted.
0 rows updated.

PL/SQL procedure successfully completed.

Elapsed: 00:00:14.62

We see that the MERGE is more than 2x faster than the PL/SQL approach. When I now run both statements a second time, they will lead to UPDATEs since all the rows are already in the tables.

SQL> merge into test_merge
  2  using (
  3    select 'Zeile ' || to_char(level) zeile from dual connect by level <= 100000
[...]
  8  /

100000 rows merged.

Elapsed: [...]

SQL> declare
  2    l number;
  3    i number :=0;
  4    u number :=0;
  5  begin
  6    for l in 1..100000 loop
  7      begin
  8        insert into test_except (txt)
  9    values ('Zeile ' || to_char(l));
 10    i := i+1;
 11      exception
 12    when dup_val_on_index then
 13      update test_except set last_updated=sysdate where txt = 'Zeile ' || to_char(l);
 14      u := u+1;
 15      end;
 16    end loop;
 17    dbms_output.put_line(to_char(i) || ' rows inserted.');
 18    dbms_output.put_line(to_char(u) || ' rows updated.');
 19  end;
 20  /
0 rows inserted.
100000 rows updated.

PL/SQL procedure successfully completed.

Elapsed: 00:00:55.36

The MERGE statement is even faster than in its first run. That might be due to the fact that there is less internal work to do: no index block splits and so on. But the elapsed time of the PL/SQL approach got dramatically worse; it took 4x longer than its initial run and is now 11x slower than the MERGE.
So now it is really obvious that the PL/SQL approach might not be the best performing solution. The MERGE solution may need some more thinking and might not be as readable as the PL/SQL solution, but when used regularly one gets used to it.
The good thing is, I convinced my colleague.
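The same pattern exists outside Oracle. As a rough illustration (using Python's bundled SQLite instead of Oracle, and SQLite's `INSERT ... ON CONFLICT` upsert, available in SQLite 3.24+, in place of MERGE), a single set-based statement replaces the whole insert-catch-update loop:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""create table test_except (
                 id integer primary key autoincrement,
                 txt text unique,
                 last_updated text)""")

rows = [("Zeile %d" % i,) for i in range(1, 1001)]
# One set-based statement: insert new rows, update existing ones,
# no per-row exception handling in the client
upsert = ("insert into test_except (txt) values (?) "
          "on conflict(txt) do update set last_updated = datetime('now')")

con.executemany(upsert, rows)  # first pass: 1000 INSERTs
con.executemany(upsert, rows)  # second pass: 1000 UPDATEs

n_rows = con.execute("select count(*) from test_except").fetchone()[0]
n_upd  = con.execute("select count(*) from test_except "
                     "where last_updated is not null").fetchone()[0]
print(n_rows, n_upd)  # 1000 1000
```

The performance argument carries over: one statement per batch instead of one statement plus potential exception per row.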

Transporting SQL Patches Between Databases

A while ago I wrote about SQL Patches – what hints do they exactly use? Now I came to the point where the underlying application was moved to another database using Data Pump. Consequently, some parts of the application ran into performance issues again because the SQL patches that made some statements run fast were not transported to the new database. So the need arose to re-create or transport those SQL patches.

SQL patches are not part of the application schema; they are stored in the data dictionary of the database. That is why they are not transported by Data Pump. But there is a way to transport them: the DBMS_SQLDIAG package provides the procedures CREATE_STGTAB_SQLPATCH, PACK_STGTAB_SQLPATCH and UNPACK_STGTAB_SQLPATCH, listed here in the order in which they are used. Basically, the procedure to move SQL patches is this:

  1. Create a staging table in a schema other than SYS
  2. Pack SQL patches (all or some) into this staging table
  3. Transport the staging table to the destination database (Data Pump; CTAS over a DB link will not work since the staging table contains a LONG column)
  4. Unpack SQL patches (all or some) from the staging table back into the dictionary

Let’s see how this works. First, pack the SQL patches into the staging table.

SQL> select name from dba_sql_patches;

NAME
--------------------------------------------------------------------------------
Patch_2
Patch_5
Patch_3
Patch_4
Patch_1

SQL> exec dbms_sqldiag.create_stgtab_sqlpatch('SQLPATCH_STAGE','SYSTEM');

PL/SQL procedure successfully completed.

SQL>  exec  dbms_sqldiag.pack_stgtab_sqlpatch(staging_table_name=>'SQLPATCH_STAGE', staging_schema_owner=>'SYSTEM');

PL/SQL procedure successfully completed.

Now move the staging table to the destination database.

[oracle ~]$ expdp system/**** tables=system.sqlpatch_stage directory=dings dumpfile=sqlpatch.dmpdp

[oracle ~]$ impdp system/**** directory=dings dumpfile=sqlpatch.dmpdp full=y

Finally, extract the SQL patches back into the data dictionary.

SQL>  exec dbms_sqldiag.unpack_stgtab_sqlpatch(replace=>true, staging_table_name=>'SQLPATCH_STAGE', staging_schema_owner=>'SYSTEM');

PL/SQL procedure successfully completed.

That’s it. Nothing more to do. Keep that in mind in case your applications are fine tuned using SQL patches and you need to move them to different databases.

Oracle SE2 and Instance Caging

At DOAG Exaday 2018, Martin Bach talked about using the Resource Manager. He explained that Resource Manager uses the CPU_COUNT initialization parameter to limit the CPU utilization of a database instance. This is called instance caging and requires an Enterprise Edition license. Resource Manager is not available in Standard Edition 2 (SE2) according to the Licensing Guide:

[Screenshot: se2_resmgr_cpucount_licguide]

On the other hand, SE2 is limited to 16 threads per database. Franck Pachot investigated this in his blog post and found out that internally Resource Manager features are used to enforce this limitation.

So the question came up: what happens in an SE2 Oracle Database that has CPU_COUNT set? According to the documentation, CPU_COUNT works only with Resource Manager, which is not part of SE2, yet SE2 uses Resource Manager internally.

Now, let’s try it. I used the same method to generate load that Franck Pachot used for his tests: a simple PL/SQL block running in several database jobs. For the test, I set CPU_COUNT to 4 and ran 10 parallel jobs. With this, my workload is definitely below the internal maximum of 16 threads. To measure the workload I used top, oratop and Statspack; the database version was 12.2.0.1.171017.

And these are the results. My server (ODA X7-2S, 1 CPU, 10 hyper-threaded cores) had a utilization of roughly 20% plus a little overhead. Having 20 CPU threads in the OS and a CPU_COUNT of 4, I end up with a maximum of 1/5th of the server that I can utilize, in other words 20%. This is what “top” showed:

[Screenshot: se2_resmgr_cpucount_jobs_top]
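The expected ceiling is simple arithmetic; a tiny sketch (my own helper function, not an Oracle API) makes the 20% explicit:

```python
def caged_utilization(cpu_count, os_cpu_threads):
    """Maximum share of the host a caged instance can consume:
    CPU_COUNT divided by the number of CPU threads the OS sees."""
    return cpu_count / os_cpu_threads

# ODA X7-2S: 10 hyper-threaded cores = 20 OS CPUs, CPU_COUNT = 4
print("%.0f%%" % (100 * caged_utilization(4, 20)))  # 20%
```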

To see what the database instance is actually doing, I used “oratop”:

[Screenshot: se2_resmgr_cpucount_jobs_oratop]

You see, there are 4 sessions on CPU and some others waiting for Resource Manager. That proves that the SE2-internal Resource Manager uses the CPU_COUNT parameter to limit the utilization. Finally, let’s check the Statspack report:

[Screenshot: sp_2_3_jobs]

In this overview you can see the 10 sessions that are trying to do something (DB time) and the 4 sessions that are actually doing some work on CPU (DB CPU).

Conclusion / Disclaimer

Instance caging does work in an SE2 Oracle Database. You can limit the database to use even fewer than 16 threads. But the fact that this works does not necessarily mean that it is allowed. So if you use this feature, you do so at your own risk.

Edit: Dominic Giles stated in the Twitter discussion that it is allowed to do instance caging in SE2.

[Screenshot: se2_resmgr_cpucount_giles]

Benchmarking Data Guard Throughput

Motivation

When planning a new Oracle Data Guard setup, we do a lot of benchmarking beforehand. We do I/O benchmarks using Orion, or application benchmarks like SwingBench or SLOB. For network bandwidth and throughput there is iPerf. But iPerf tells us only half the truth: it just measures the network. In a Data Guard setup with synchronous redo transport there is more; the receiving site needs to write the information to disk and, after successful completion, acknowledge back to the sender.

Oracle provides a tool called “OraTCPTest” for that purpose which is pretty similar to iPerf but with additional capability to reflect the behaviour of Data Guard. The tool and basic instructions can be found in My Oracle Support document “Measuring Network Capacity using oratcptest (Doc ID 2064368.1)“.

Basically, we start a server process at the receiving site, which can optionally write the received information to disk, and a client process at the sending site. Let’s walk through a basic example. I did this on Windows boxes, it works even there 🙂

1. Start the server process

We just need to specify the port which should be used for listening.

D:\install>java -jar oratcptest.jar -server -port=4711
OraTcpTest server started.

 
There is nothing more than that.

2. Start the client process

Here we need some more parameters, the target, the port and some timing stuff.

D:\install>java -jar oratcptest.jar myotherserver -port=4711 -duration=10s -interval=2s
[Requesting a test]
        Message payload        = 1 Mbyte
        Payload content type   = RANDOM
        Delay between messages = NO
        Number of connections  = 1
        Socket send buffer     = (system default)
        Transport mode         = SYNC
        Disk write             = NO
        Statistics interval    = 2 seconds
        Test duration          = 10 seconds
        Test frequency         = NO
        Network Timeout        = NO
        (1 Mbyte = 1024x1024 bytes)

(15:05:55) The server is ready.
                        Throughput                 Latency
(15:05:57)        101.832 Mbytes/s                9.820 ms
(15:05:59)        102.159 Mbytes/s                9.789 ms
(15:06:01)        102.169 Mbytes/s                9.788 ms
(15:06:03)        101.954 Mbytes/s                9.808 ms
(15:06:05)        101.998 Mbytes/s                9.804 ms
(15:06:05) Test finished.
               Socket send buffer = 64 kbytes
                  Avg. throughput = 101.968 Mbytes/s
                     Avg. latency = 9.807 ms

And that’s it. You can see all the parameters that were used and their values. The output then tells us the performance details: the average throughput and latency.

Finding the truth

Assuming we are migrating an existing system, we need to know whether the network can handle the amount of redo that needs to be shipped. AWR or Statspack reports tell us these numbers, but they are averages; peaks might be masked in those reports. Using the information in V$ARCHIVED_LOG is much more accurate.
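To see why averages are not enough, take a handful of per-log rates like the ones the query below produces (the sample values here are taken from its output further down): the mean sits far below the peak the network actually has to handle.

```python
# Per-redo-log transfer rates in MB/s, as computed from V$ARCHIVED_LOG
rates = [0.50, 0.55, 4.29, 4.54, 5.24, 6.18]

avg_rate  = sum(rates) / len(rates)
peak_rate = max(rates)
# Sizing for the average would undersize the link for the peak
print(round(avg_rate, 2), peak_rate)
```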

Let’s look at a timeframe during daytime workload and find the maximum redo rate.

select
  thread#,
  sequence#,
  blocks*block_size/1024/1024 MB,
  (next_time-first_time)*86400 sec,
  (blocks*block_size/1024/1024)/((next_time-first_time)*86400) "MB/s"
from
  v$archived_log
where
  ( (next_time-first_time)*86400<>0)
and
  first_time between
    to_date('2016/11/22 08:00:00','YYYY/MM/DD HH24:MI:SS') and
    to_date('2016/11/22 16:00:00','YYYY/MM/DD HH24:MI:SS')
and
  dest_id=2
order by 5;

   THREAD#  SEQUENCE#         MB        SEC       MB/s
---------- ---------- ---------- ---------- ----------
         1     349010 226.600586        453 .500222044
         1     348976 226.943848        415 .546852645
[...]
         1     349048 240.336914         56 4.29173061
         1     348953  240.79248         53 4.54325435
         1     349049 246.279785         47 5.23999543
         1     348987 228.593262         37 6.17819626

So at peak workload, one redo log of roughly 230MB is written in 37 seconds, which results in 6-7 MB/s. In the first simple example we achieved over 100MB/s, so this is easy, isn’t it? No, it isn’t, as we will see in a moment. First we need to find out how large a typical redo write is. We can get these values from AWR reports or just query V$SYSSTAT.

select
  (select value from v$sysstat where statistic# = (select statistic# from v$statname where name = 'redo size')) /
  (select value from v$sysstat where statistic# = (select statistic# from v$statname where name = 'redo writes')) "avg redo write size"
from
  dual;

avg redo write size
-------------------
         9399.55267

The typical redo write size is 9400 bytes in this case. Now we can start over and do the benchmark again with the proper parameters.

Real Life Benchmark

Again we start the server process, this time specifying a file to write to. Since the tool cannot write directly to ASM, I created an ACFS volume to get roughly the performance of the ASM disks that will host the standby redo logs in the future environment.

D:\install>java -jar oratcptest.jar -server -port=4711 -file=d:\app\oracle\oradata\acfs\oratcptest.tmp
OraTcpTest server started.

Now we can start several tests against this server process. Let’s start with asynchronous transmission without writing the information at the receiving site.

D:\install>java -jar oratcptest.jar myotherserver -port=4711 -duration=10s -interval=2s -mode=async -length=9400
[Requesting a test]
        Message payload        = 9400 bytes
        Payload content type   = RANDOM
        Delay between messages = NO
        Number of connections  = 1
        Socket send buffer     = (system default)
        Transport mode         = ASYNC
        Disk write             = NO
        Statistics interval    = 2 seconds
        Test duration          = 10 seconds
        Test frequency         = NO
        Network Timeout        = NO
        (1 Mbyte = 1024x1024 bytes)

(08:42:53) The server is ready.
                        Throughput
(08:42:55)        112.814 Mbytes/s
(08:42:57)        112.731 Mbytes/s
(08:42:59)        112.641 Mbytes/s
(08:43:01)        112.622 Mbytes/s
(08:43:03)        112.665 Mbytes/s
(08:43:03) Test finished.
               Socket send buffer = 16 Mbytes
                  Avg. throughput = 112.642 Mbytes/s					 

OK, 10MB/s more than in the first test; we are transferring just 9kB per message instead of 1MB, which was the default.
Next test: again asynchronous transfer, but this time the server writes the message content to disk before replying.

D:\install>java -jar oratcptest.jar myotherserver -port=4711 -duration=10s -interval=2s -mode=async -write -length=9400
[Requesting a test]
        Message payload        = 9400 bytes
        Payload content type   = RANDOM
        Delay between messages = NO
        Number of connections  = 1
        Socket send buffer     = (system default)
        Transport mode         = ASYNC
        Disk write             = YES
        Statistics interval    = 2 seconds
        Test duration          = 10 seconds
        Test frequency         = NO
        Network Timeout        = NO
        (1 Mbyte = 1024x1024 bytes)

(08:42:34) The server is ready.
                        Throughput
(08:42:36)         25.230 Mbytes/s
(08:42:38)         26.655 Mbytes/s
(08:42:40)         27.600 Mbytes/s
(08:42:42)         27.578 Mbytes/s
(08:42:44)         27.603 Mbytes/s
(08:42:44) Test finished.
               Socket send buffer = 1 Mbyte
                  Avg. throughput = 26.922 Mbytes/s

The throughput went down massively. Only 27MB/s are left, a quarter of the throughput we achieved without writing the message payload. A massive impact.
So what happens when we do the transfer synchronously, as Data Guard will with Maximum Availability? Let’s see.

D:\install>java -jar oratcptest.jar myotherserver -port=4711 -duration=10s -interval=2s -mode=sync -write -length=9400
[Requesting a test]
        Message payload        = 9400 bytes
        Payload content type   = RANDOM
        Delay between messages = NO
        Number of connections  = 1
        Socket send buffer     = (system default)
        Transport mode         = SYNC
        Disk write             = YES
        Statistics interval    = 2 seconds
        Test duration          = 10 seconds
        Test frequency         = NO
        Network Timeout        = NO
        (1 Mbyte = 1024x1024 bytes)

(08:44:28) The server is ready.
                        Throughput                 Latency
(08:44:30)         15.082 Mbytes/s                0.595 ms
(08:44:32)         15.959 Mbytes/s                0.562 ms
(08:44:34)         16.402 Mbytes/s                0.547 ms
(08:44:36)         16.603 Mbytes/s                0.541 ms
(08:44:38)         16.579 Mbytes/s                0.541 ms
(08:44:38) Test finished.
               Socket send buffer = 64 kbytes
                  Avg. throughput = 16.117 Mbytes/s
                     Avg. latency = 0.557 ms

Again a degradation, only 16MB/s are left. And remember, we need 7MB/s during peak load. Besides that, we now see the latency, which will later impact commit performance since the sending (primary) server waits until the redo is written and acknowledged by the receiving (standby) server.
Just for the sake of completeness, here is the test with synchronous transfer but without writing.

D:\install>java -jar oratcptest.jar myotherserver -port=4711 -duration=10s -interval=2s -mode=sync -length=9400
[Requesting a test]
        Message payload        = 9400 bytes
        Payload content type   = RANDOM
        Delay between messages = NO
        Number of connections  = 1
        Socket send buffer     = (system default)
        Transport mode         = SYNC
        Disk write             = NO
        Statistics interval    = 2 seconds
        Test duration          = 10 seconds
        Test frequency         = NO
        Network Timeout        = NO
        (1 Mbyte = 1024x1024 bytes)

(08:44:11) The server is ready.
                        Throughput                 Latency
(08:44:13)         23.547 Mbytes/s                0.381 ms
(08:44:15)         31.800 Mbytes/s                0.282 ms
(08:44:17)         33.249 Mbytes/s                0.270 ms
(08:44:19)         32.639 Mbytes/s                0.275 ms
(08:44:21)         31.627 Mbytes/s                0.284 ms
(08:44:21) Test finished.
               Socket send buffer = 64 kbytes
                  Avg. throughput = 30.556 Mbytes/s
                     Avg. latency = 0.294 ms

It is somewhat faster than asynchronous transfer with writing.

Conclusion

Simple network bandwidth tests are not sufficient to predict future Data Guard redo transfer performance. OraTCPTest provides a nice way to measure the throughput and get realistic values that can be used to plan and size the network. Again, here are the numbers of the different tests.

ASYNC nowrite   ASYNC write   SYNC nowrite   SYNC write
112 MB/s        27 MB/s       32 MB/s        16 MB/s
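The synchronous numbers are no accident: with one outstanding message per connection, throughput is simply payload divided by round-trip latency. A quick sanity check (plain arithmetic, no tool involved) reproduces both SYNC results from the measured latencies:

```python
def sync_throughput_mb(payload_bytes, latency_ms):
    """Single-connection synchronous throughput in Mbytes/s
    (1 Mbyte = 1024*1024 bytes): one payload per round trip."""
    return payload_bytes / (latency_ms / 1000.0) / (1024 * 1024)

# 9400-byte messages at the measured round-trip latencies
print(round(sync_throughput_mb(9400, 0.557), 1))  # SYNC write:   ~16.1 Mbytes/s measured
print(round(sync_throughput_mb(9400, 0.294), 1))  # SYNC nowrite: ~30.6 Mbytes/s measured
```

This also shows where tuning can help: either reduce the round-trip latency or get more data in flight per round trip.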

We could now try to tune this further by playing around with send and receive buffer sizes. But for the time being, the defaults are sufficient in my case, so I skipped that.

ODA: Performance Impact due to Flashback Database

Here and there I plan and carry out migrations to the Oracle Database Appliance (ODA). I like this appliance because of its relative simplicity and ease of management. In most cases we connect two ODAs in different locations using Data Guard with the goal of maximizing availability. That means we have a primary 2-node RAC on ODA-1 and another 2-node RAC on ODA-2 which operates as standby.
In order to fulfill the requirements of availability and fault tolerance, the databases are configured with “Maximum Availability”:

  • sufficient Standby Redologs configured – on the dedicated Redo SSDs
  • Force Logging mode
  • Flashback Database configured and activated – located in FRA per definition

In terms of availability and manageability this is quite good, but in terms of performance it sometimes turned out to be really bad. Look at these AWR snippets:

              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:     88808 01-Sep-15 17:30:06       480       2.7
  End Snap:     88855 02-Sep-15 17:00:10       478       6.8
   Elapsed:            1,410.06 (mins)
   DB Time:            3,311.41 (mins)
Wait Classes by Total Wait Time
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                        Avg             Avg
                                        Total Wait     Wait   % DB   Active
Wait Class                  Waits       Time (sec)     (ms)   time Sessions
---------------- ---------------- ---------------- -------- ------ --------
System I/O             11,522,327          161,304       14   81.2      1.9
DB CPU                                     101,562            51.1      1.2
User I/O               19,956,840           88,663        4   44.6      1.0
Background Wait Events                 DB/Inst: XXX/xxx  Snaps: 88808-88855
-> ordered by wait time desc, waits desc (idle events last)
-> Only events with Total Wait Time (s) >= .001 are shown
-> %Timeouts: value of 0 indicates value was > .5%.  Value of null is truly 0

                                                             Avg                
                                        %Time Total Wait    wait    Waits   % bg
Event                             Waits -outs   Time (s)    (ms)     /txn   time
-------------------------- ------------ ----- ---------- ------- -------- ------
db file async I/O submit        969,898     0     90,151      93      0.3   46.5
db file parallel write        5,015,605     0     33,520       7      1.3   17.3
flashback log file write        617,118     0     30,295      49      0.2   15.6
Backup: MML write backup p   10,608,828     0     11,727       1      2.7    6.0
RMAN backup & recovery I/O      288,934     0      3,508      12      0.1    1.8

Remember, the flashback logs are stored inside the FRA. And where does the ODA place the FRA? On the slower part of the spinning disks. A system cannot be faster than the slowest part of the chain. And in this configuration the user I/Os are affected by flashback I/Os, making things even worse.
So I turned off Flashback Database in order to improve performance.

              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:     89528 16-Sep-15 17:30:45       393       5.0
  End Snap:     89575 17-Sep-15 17:00:44       576       5.2
   Elapsed:            1,409.98 (mins)
   DB Time:            2,792.69 (mins)
Wait Classes by Total Wait Time
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                        Avg             Avg
                                        Total Wait     Wait   % DB   Active
Wait Class                  Waits       Time (sec)     (ms)   time Sessions
---------------- ---------------- ---------------- -------- ------ --------
System I/O              7,424,187           97,029       13   57.9      1.1
DB CPU                                      96,595            57.6      1.1
User I/O                8,191,268           63,769        8   38.1      0.8

Only half the System I/O compared to having Flashback Database enabled.

Background Wait Events                 DB/Inst: XXX/xxx  Snaps: 89528-89575
-> ordered by wait time desc, waits desc (idle events last)
-> Only events with Total Wait Time (s) >= .001 are shown
-> %Timeouts: value of 0 indicates value was > .5%.  Value of null is truly 0

                                                             Avg                
                                        %Time Total Wait    wait    Waits   % bg
Event                             Waits -outs   Time (s)    (ms)     /txn   time
-------------------------- ------------ ----- ---------- ------- -------- ------
db file async I/O submit        531,487     0     73,471     138      0.1   57.5
db file parallel write        2,384,365     0     15,169       6      0.7   11.9
Backup: MML write backup p   10,121,315     0     11,769       1      2.8    9.2
RMAN backup & recovery I/O      197,080     0      4,906      25      0.1    3.8

The background wait for “flashback log file write” is completely gone.

Keep that in mind when using Flashback Database. This also applies to hardware other than ODA: be careful which disks you use for the FRA and the flashback logs.

SQL Patches – what hints do they exactly use?

There are a couple of blog posts out there that describe how to inject hints into given SQL statements without modifying the code. One of these posts was written by the Optimizer Team. There is quite a lot of information about how to create SQL Patches and how to check execution plans for their usage, but I wasn’t able to find out where the details are stored inside the database.

Let’s walk through it. First things first, the base query:

SQL> conn scott/tiger
Connected.
SQL> select count(*) from emp;

  COUNT(*)
----------
        14

SQL> select * from table(dbms_xplan.display_cursor('g59vz2u4cu404', 0, 'allstats last'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------
SQL_ID  g59vz2u4cu404, child number 0
-------------------------------------
select count(*) from emp

Plan hash value: 2937609675

--------------------------------------------
| Id  | Operation        | Name   | E-Rows |
--------------------------------------------
|   0 | SELECT STATEMENT |        |        |
|   1 |  SORT AGGREGATE  |        |      1 |
|   2 |   INDEX FULL SCAN| PK_EMP |     14 |
--------------------------------------------

Note
-----
   - Warning: basic plan statistics not available. These are only collected when:
       * hint 'gather_plan_statistics' is used for the statement or
       * parameter 'statistics_level' is set to 'ALL', at session or system level

You see, I executed a query and wanted to see the exact execution statistics. Since they are not gathered, the DBMS_XPLAN output notifies me how to get this information.
Assuming that we are not able to change the code to introduce the hint, I create a SQL Patch.

SQL> conn / as sysdba
Connected.
SQL> declare
  2    txt varchar2(2000);
  3  begin
  4    select sql_fulltext into txt
  5    from v$sql where sql_id = 'g59vz2u4cu404';
  6    dbms_sqldiag_internal.i_create_patch(
  7      sql_text => txt,
  8      hint_text => 'gather_plan_statistics',
  9      name => 'my_patch'
 10    );
 11  end;
 12  /

PL/SQL procedure successfully completed.

I used V$SQL to get the exact SQL text for creating the SQL patch.
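Note that DBMS_SQLDIAG_INTERNAL is an undocumented package. If you are on Oracle 12.2 or later, the documented DBMS_SQLDIAG.CREATE_SQL_PATCH function can be used instead; it even accepts a SQL_ID directly, so looking up the SQL text in V$SQL is no longer necessary. A sketch of how that might look:

```sql
-- Oracle 12.2 onward: documented API, no internal package needed.
-- CREATE_SQL_PATCH is a function returning the patch name.
declare
  patch_name varchar2(128);
begin
  patch_name := dbms_sqldiag.create_sql_patch(
    sql_id    => 'g59vz2u4cu404',
    hint_text => 'gather_plan_statistics',
    name      => 'my_patch'
  );
end;
/
```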

Now the query is using my newly created SQL patch:

SQL> conn scott/tiger
Connected.
SQL> select count(*) from emp;

  COUNT(*)
----------
        14

SQL> select * from table(dbms_xplan.display_cursor('g59vz2u4cu404', 0, 'allstats last'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------
SQL_ID  g59vz2u4cu404, child number 0
-------------------------------------
select count(*) from emp

Plan hash value: 2937609675

-------------------------------------------------------------------------------------
| Id  | Operation        | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |        |      1 |        |      1 |00:00:00.01 |       1 |
|   1 |  SORT AGGREGATE  |        |      1 |      1 |      1 |00:00:00.01 |       1 |
|   2 |   INDEX FULL SCAN| PK_EMP |      1 |     14 |     14 |00:00:00.01 |       1 |
-------------------------------------------------------------------------------------

Note
-----
   - SQL patch "my_patch" used for this statement

But this does not show what the patch actually does. There is a view, DBA_SQL_PATCHES, that has some information, but nothing about the hints these patches apply.

SQL> desc dba_sql_patches
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 NAME                                      NOT NULL VARCHAR2(128)
 CATEGORY                                  NOT NULL VARCHAR2(128)
 SIGNATURE                                 NOT NULL NUMBER
 SQL_TEXT                                  NOT NULL CLOB
 CREATED                                   NOT NULL TIMESTAMP(6)
 LAST_MODIFIED                                      TIMESTAMP(6)
 DESCRIPTION                                        VARCHAR2(500)
 STATUS                                             VARCHAR2(8)
 FORCE_MATCHING                                     VARCHAR2(3)
 TASK_ID                                            NUMBER
 TASK_EXEC_NAME                                     VARCHAR2(128)
 TASK_OBJ_ID                                        NUMBER
 TASK_FND_ID                                        NUMBER
 TASK_REC_ID                                        NUMBER

When looking at the definition of that view, we see a join between “sqlobj$” and “sqlobj$auxdata”. The related table “sqlobj$data” has a column “COMP_DATA” which contains everything I wanted to know, in XML format. To extract this information I use queries like the following: first identify the SQL signature, then use this signature to get the SQL patch details.

SQL> conn / as sysdba
Connected.
SQL> set numformat 99999999999999999999999999
SQL> select EXACT_MATCHING_SIGNATURE from v$sql where sql_id = 'g59vz2u4cu404';

   EXACT_MATCHING_SIGNATURE
---------------------------
        2640606212120450132

SQL> select cast(extractvalue(value(x), '/hint') as varchar2(500)) as outline_hints
  2  from   xmltable('/outline_data/hint'
  3         passing (select xmltype(comp_data) xml
  4                  from   sys.sqlobj$data
  5                  where  signature = 2640606212120450132)) x;

OUTLINE_HINTS
--------------------------------------------------------------------------------
gather_plan_statistics

That’s it.
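Both steps can also be combined into a single query, so the signature does not have to be copied by hand. A sketch (the ROWNUM predicate guards against multiple child cursors returning the same signature):

```sql
-- Look up the signature from V$SQL inline and extract the hints directly.
select cast(extractvalue(value(x), '/hint') as varchar2(500)) as outline_hints
from   xmltable('/outline_data/hint'
       passing (select xmltype(d.comp_data)
                from   sys.sqlobj$data d
                where  d.signature = (select exact_matching_signature
                                      from   v$sql
                                      where  sql_id = 'g59vz2u4cu404'
                                      and    rownum = 1))) x;
```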

Finally, I get rid of the SQL patch by dropping it.

SQL> exec dbms_sqldiag.drop_sql_patch(name => 'my_patch');

PL/SQL procedure successfully completed.

A nice feature to have, but remember that using the DBMS_SQLDIAG package may require additional licensing. Hope that helps anyway.