ODA: Performance Impact due to Flashback Database

Here and there I am planning and actually doing migrations to the Oracle Database Appliance (ODA). I like this appliance because of it’s relative simplicity and ease of management. In most cases we connect two ODAs in different locations using DataGuard with the goal of maximizing availability. That means we have a primary 2-node-RAC on ODA-1 and another 2-node-RAC on ODA-2 which operates as standby.
In order to fulfill the requirements of availability and fault tolerance the databases are configured as “Maximum Availability”:

  • sufficient Standby Redologs configured – on the dedicated Redo SSDs
  • Force Logging mode
  • Flashback Database configured and activated – located in FRA per definition

In terms of availability and manageability this is quite good, but in terms of performance this turned out to be really bad sometimes. Look at this AWR snippets:

              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:     88808 01-Sep-15 17:30:06       480       2.7
  End Snap:     88855 02-Sep-15 17:00:10       478       6.8
   Elapsed:            1,410.06 (mins)
   DB Time:            3,311.41 (mins)
Wait Classes by Total Wait Time
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                        Avg             Avg
                                        Total Wait     Wait   % DB   Active
Wait Class                  Waits       Time (sec)     (ms)   time Sessions
---------------- ---------------- ---------------- -------- ------ --------
System I/O             11,522,327          161,304       14   81.2      1.9
DB CPU                                     101,562            51.1      1.2
User I/O               19,956,840           88,663        4   44.6      1.0
Background Wait Events                 DB/Inst: XXX/xxx  Snaps: 88808-88855
-> ordered by wait time desc, waits desc (idle events last)
-> Only events with Total Wait Time (s) >= .001 are shown
-> %Timeouts: value of 0 indicates value was > .5%.  Value of null is truly 0

                                                             Avg                
                                        %Time Total Wait    wait    Waits   % bg
Event                             Waits -outs   Time (s)    (ms)     /txn   time
-------------------------- ------------ ----- ---------- ------- -------- ------
db file async I/O submit        969,898     0     90,151      93      0.3   46.5
db file parallel write        5,015,605     0     33,520       7      1.3   17.3
flashback log file write        617,118     0     30,295      49      0.2   15.6
Backup: MML write backup p   10,608,828     0     11,727       1      2.7    6.0
RMAN backup & recovery I/O      288,934     0      3,508      12      0.1    1.8

Remember, the flashback logs are stored inside FRA. And where does ODA place the FRA? On the slower part of the spinning disks. The system can not be faster than the slowest part in the chain. And in this configuration the user I/Os are influenced by flashback I/Os making things even worse.
So turn off flashback database in order to improve performance.

              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:     89528 16-Sep-15 17:30:45       393       5.0
  End Snap:     89575 17-Sep-15 17:00:44       576       5.2
   Elapsed:            1,409.98 (mins)
   DB Time:            2,792.69 (mins)
Wait Classes by Total Wait Time
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                        Avg             Avg
                                        Total Wait     Wait   % DB   Active
Wait Class                  Waits       Time (sec)     (ms)   time Sessions
---------------- ---------------- ---------------- -------- ------ --------
System I/O              7,424,187           97,029       13   57.9      1.1
DB CPU                                      96,595            57.6      1.1
User I/O                8,191,268           63,769        8   38.1      0.8

Only half the System I/O than with flashback database enabled.

Background Wait Events                 DB/Inst: XXX/xxx  Snaps: 89528-89575
-> ordered by wait time desc, waits desc (idle events last)
-> Only events with Total Wait Time (s) >= .001 are shown
-> %Timeouts: value of 0 indicates value was > .5%.  Value of null is truly 0

                                                             Avg                
                                        %Time Total Wait    wait    Waits   % bg
Event                             Waits -outs   Time (s)    (ms)     /txn   time
-------------------------- ------------ ----- ---------- ------- -------- ------
db file async I/O submit        531,487     0     73,471     138      0.1   57.5
db file parallel write        2,384,365     0     15,169       6      0.7   11.9
Backup: MML write backup p   10,121,315     0     11,769       1      2.8    9.2
RMAN backup & recovery I/O      197,080     0      4,906      25      0.1    3.8

The background wait for “flashback log file write” completely gone.

Keep that in mind when using Flashback Database. This will also apply to other hardware than ODA. Be careful which disks you use for FRA/flashback logs.

Advertisements

SQL Patches – what hints do they exactly use?

There are a couple of blog posts out there that describe how to inject hints into given SQL statements without modifying anything inside the code. One of these blog posts was written by the Optimizer Team. There is quite a lot of information of how to create SQL Patches and how to check the execution plans for the usage of these SQL Patches, but I wasn’t really able to find out where the details are stored inside the database.

Let’s walk through it. First things first, the base query:

SQL> conn scott/tiger
Connected.
SQL> select count(*) from emp;

  COUNT(*)
----------
        14

SQL> select * from table(dbms_xplan.display_cursor('g59vz2u4cu404', 0, 'allstats last'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------SQL_ID  g59vz2u4cu404, child number 0
-------------------------------------
select count(*) from emp

Plan hash value: 2937609675

--------------------------------------------
| Id  | Operation        | Name   | E-Rows |
--------------------------------------------
|   0 | SELECT STATEMENT |        |        |
|   1 |  SORT AGGREGATE  |        |      1 |
|   2 |   INDEX FULL SCAN| PK_EMP |     14 |
--------------------------------------------

Note
-----
   - Warning: basic plan statistics not available. These are only collected when:
       * hint 'gather_plan_statistics' is used for the statement or
       * parameter 'statistics_level' is set to 'ALL', at session or system level

You see, I executed a query and wanted to see the exact execution statistics. Since they are not gathered, the DBMS_XPLAN output notifies me with the information how to get these information.
Assuming that we are not able to change the code to introduce the hint, I create a SQL Patch.

SQL> conn / as sysdba
Connected.
SQL> declare
  1    txt varchar2(2000);
  2  begin
  3    select sql_fulltext into txt
  4    from v$sql where sql_id = 'g59vz2u4cu404';
  5    dbms_sqldiag_internal.i_create_patch(
  6      sql_text => txt,
  7      hint_text => 'gather_plan_statistics',
  8      name => 'my_patch'
  9    );
 10  end;
 11  /

PL/SQL procedure successfully completed.

I used V$SQL to get the exact SQL text for creating the SQL patch.

Now the query is using my newly created SQL patch:

SQL> conn scott/tiger
Connected.
SQL> select count(*) from emp;

  COUNT(*)
----------
        14

SQL> select * from table(dbms_xplan.display_cursor('g59vz2u4cu404', 0, 'allstats last'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------SQL_ID  g59vz2u4cu404, child number 0
-------------------------------------
select count(*) from emp

Plan hash value: 2937609675

-------------------------------------------------------------------------------------
| Id  | Operation        | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |        |      1 |        |      1 |00:00:00.01 |       1 |
|   1 |  SORT AGGREGATE  |        |      1 |      1 |      1 |00:00:00.01 |       1 |
|   2 |   INDEX FULL SCAN| PK_EMP |      1 |     14 |     14 |00:00:00.01 |       1 |
-------------------------------------------------------------------------------------

Note
-----
   - SQL patch "my_patch" used for this statement

But this does not show what the patch exactly does. There is a view DBA_SQL_PATCHES that has some information, but nothing about the hints in these patches.

SQL> desc dba_sql_patches
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 NAME                                      NOT NULL VARCHAR2(128)
 CATEGORY                                  NOT NULL VARCHAR2(128)
 SIGNATURE                                 NOT NULL NUMBER
 SQL_TEXT                                  NOT NULL CLOB
 CREATED                                   NOT NULL TIMESTAMP(6)
 LAST_MODIFIED                                      TIMESTAMP(6)
 DESCRIPTION                                        VARCHAR2(500)
 STATUS                                             VARCHAR2(8)
 FORCE_MATCHING                                     VARCHAR2(3)
 TASK_ID                                            NUMBER
 TASK_EXEC_NAME                                     VARCHAR2(128)
 TASK_OBJ_ID                                        NUMBER
 TASK_FND_ID                                        NUMBER
 TASK_REC_ID                                        NUMBER

When looking at the definition of that view, we see a join between “sqlobj$” and “sqlobj$auxdata”. And the latter one has a column “COMP_DATA” which contains everything I wanted to know in XML format. To extract this information I use queries like this. First identify the SQL signature and then use this signature to get the SQL patch information.

SQL> conn / as sysdba
Connected.
SQL> set numformat 99999999999999999999999999
SQL> select EXACT_MATCHING_SIGNATURE from v$sql where sql_id = 'g59vz2u4cu404';

   EXACT_MATCHING_SIGNATURE
---------------------------
        2640606212120450132

SQL> select cast(extractvalue(value(x), '/hint') as varchar2(500)) as outline_hints
  1  from   xmltable('/outline_data/hint'
  2         passing (select xmltype(comp_data) xml
  3                  from   sys.sqlobj$data
  4                  where  signature = 2640606212120450132)) x;

OUTLINE_HINTS
--------------------------------------------------------------------------------
gather_plan_statistics

That’s it.

Finally, I get rid of the SQL patch by dropping it.

SQL> exec dbms_sqldiag.drop_sql_patch(name => 'my_patch');

PL/SQL procedure successfully completed.

Nice feature that we got, but remember that the use of DBMS_SQLDIAG package may require additional licensing. How that helps anyway.

High parse time in Oracle 12c

One of our customers moved their main application to Oracle Database 12c a couple of days ago. With the new version they experiences a couple of issues that I helped them to resolve or to work around and that I will write about in this and other blog posts.

The first and very heavy issue appeared right from the start. When several people started using the application they waited a long time at a specific point. What we saw at database level was a high number of “Concurrency” waits, namely “Cursor: pin S wait on X”. There must be something, these queries run fine in 11.2.0.4. So we first set “optimizer_features_enable” step by step back to “11.2.0.2”, but it did not change anything. It continued to look like this:

12c-start-work

And this was only the beginning… It went on:

12c-race-condition

This all was caused by only a few statements. It turned out, that during the concurrency periods V$SQL showed a value of zero for PLAN_HASH_VALUE for the particular statements and child cursors. The parsing of the statement took that long, up to five minutes per parse. All the other sessions that were about to run the same statement waited for the first session to complete the parse. And this happend over and over again, either due to new child cursors being created or due to the cursors being aged out of the shared pool.

Currently there as a SR open with Oracle to identify the reason for these extremly long parse times. I will update this post as soon as a I get news.

Meanwhile the workaround is to simply pin the statements in the shared pool. This way we do not need to parse over and over again, we keep all information in shared pool regardless if it is used or not. We still need to do the parse for each child cursor that is needed, but then we are done. So there is a time after restarting a database instance when the parse needs to be done. During this time we identify the statements and pin them again.
The following are examples only, it has nothing to do with the real issue.

SQL> select sql_id, executions, plan_hash_value, address, hash_value from v$sql where sql_id='94qn6y14kw01g';

SQL_ID        EXECUTIONS PLAN_HASH_VALUE ADDRESS          HASH_VALUE
------------- ---------- --------------- ---------------- ----------
94qn6y14kw01g        255      1388734953 00000000CE3DB1E8 1227751471

We use ADDRESS and HASH_VALUE to pin the cursor:

SQL> exec dbms_shared_pool.keep('00000000CE3DB1E8,1227751471', 'C');

PL/SQL procedure successfully completed.

To identify the statements that are already pinned, use a query similar to this:

SQL> select name, executions, status,
  2  (select sql_id from v$sql s where s.hash_value = c.hash_value and rownum=1) sql_id
  3  from v$db_object_cache c
  4  where kept='YES' and type='CURSOR';

NAME
--------------------------------------------------------------------------------
EXECUTIONS STATUS              SQL_ID
---------- ------------------- -------------
SELECT NVL(TO_NUMBER(EXTRACT(XMLTYPE(:B2 ), :B1 )), 0) FROM DUAL
       273 VALID               94qn6y14kw01g

SELECT NVL(TO_NUMBER(EXTRACT(XMLTYPE(:B2 ), :B1 )), 0) FROM DUAL
       273 VALID               94qn6y14kw01g

After a while, to load reduced and then continued at normal level.

12c-normal-load

I am now waiting for Oracle Support to provide information about the root cause and for the fix of this issue.

Did you ever see “enq: CF – contention”?

I did. On a customers database we saw massive waits for “enq: CF – contention”. The sessions waiting for that event where all executing simple INSERT statements. So why do they wait for controlfile operations?
My Oracle Support Note 1380939.1 pointed me to the root cause. The INSERTs are doing NOLOGGING operations. And hence the database updates the UNCRECOVERABLE_TIME and UNRECOVERABLE_SCN information inside the controlfile over and over again. This serialization leads to massive performance degradation.
But who is doing NOLOGGING operations? That is bad in terms of recoverability and in our special case really really bad from a performance perspective. We started investigating this. The database runs in NO FORCE LOGGING mode and all tablespaces are LOGGING too:

SQL> select force_logging from v$database;

FOR
---
NO

SQL> select distinct logging from dba_tablespaces where contents='PERMANENT';

LOGGING
---------
LOGGING

Let’s check the logging attribute of the table:

SQL> select logging from dba_tables where table_name='TABLE_IN_QUESTION';

LOG
---

SQL> select partitioned,logging from dba_tables where table_name='TABLE_IN_QUESTION';

PAR LOG
--- ---
YES

Obviously the table is partitioned so I needed to check the partitions itself:

SQL> select distinct logging from dba_tab_partitions where table_name='TABLE_IN_QUESTION';

LOGGING
-------
YES

So why are there nologging operations? The logging attribute is for segments, so I checked for LOB columns in the table.

SQL> select column_name, logging, partitioned from dba_lobs  where table_name='TABLE_IN_QUESTION';

COLUMN_NAME          LOGGING PAR
-------------------- ------- ---
VALUE                NONE    YES

SQL> select distinct column_name, logging from dba_lob_partitions  where table_name='TABLE_IN_QUESTION';

COLUMN_NAME          LOGGING
-------------------- -------
VALUE                NO

That’s it. The LOB partitions are defined to do NOLOGGING operations. But we were not able to change that:

SQL> alter table TABLE_IN_QUESTION modify lob (value) (nocache logging);
alter table TABLE_IN_QUESTION modify lob (value) (nocache logging)
                   *
ERROR at line 1:
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired

And of cause the application owner, who was complaining most about bad performance by the way, did not commit to switch the data loading off for a short period of time. You know, availability, etc bla bla… So we just enabled the FORCE LOGGING mode for the database which immediately removed the “enq: CF – contention”.

SQL> alter database force logging;

Database altered.

That is what Cloud Control looked like…
CF contention gone

But we need to keep in mind that some time when we are allowed to, we must change the LOB parameters.