Shared Pool Memory Leak in 12c RAC

A couple of minutes ago I wrote about Parse Times in 12c. When investigating this issue we came across another one. The shared pool was growing and growing, stealing a lot of memory from the buffer cache. We stopped that by setting a minimum size for “db_cache_size” to stop shrinking the buffer cache. It is not that there isn’t enough memory, the MEMORY_TARGET was set to 42GB (the magic number). But the shared pool was consuming more than 20GB of that memory. We stopped the buffer cache from shrinking more and more at around 8GB. Now the shared pool started to steal it’s memory from the PGA.
So we started to identify what area is consuming the memory inside the shared pool:

SQL> select * from (
  2  select name, bytes from v$sgastat where pool ='shared pool' order by 2 desc
  3  )
  4  where rownum <11;

NAME                                      BYTES
-------------------------- --------------------
ges enqueues                      5.981.720.032
ges resource dynamic              5.568.662.280
value block free list             2.283.277.296
ges resource dynamic              1.799.279.656
free memory                         983.839.992
gcs resources                       696.864.960
SQLA                                534.089.824
KGH: NO ACCESS                      399.729.056
gcs shadows                         387.147.200
KGLH0                               268.811.064

Ok, ges things… A quick search in My Oracle Support revealed a bug which exactly met our findings ORA-04031 Errors Occurring with High “ges resource dynamic” & “ges enqueues” Memory Usage In The Shared Pool.

The workaround mentioned in the document helped, now the shared pool is quiet again.

High parse time in Oracle 12c

One of our customers moved their main application to Oracle Database 12c a couple of days ago. With the new version they experiences a couple of issues that I helped them to resolve or to work around and that I will write about in this and other blog posts.

The first and very heavy issue appeared right from the start. When several people started using the application they waited a long time at a specific point. What we saw at database level was a high number of “Concurrency” waits, namely “Cursor: pin S wait on X”. There must be something, these queries run fine in 11.2.0.4. So we first set “optimizer_features_enable” step by step back to “11.2.0.2”, but it did not change anything. It continued to look like this:

12c-start-work

And this was only the beginning… It went on:

12c-race-condition

This all was caused by only a few statements. It turned out, that during the concurrency periods V$SQL showed a value of zero for PLAN_HASH_VALUE for the particular statements and child cursors. The parsing of the statement took that long, up to five minutes per parse. All the other sessions that were about to run the same statement waited for the first session to complete the parse. And this happend over and over again, either due to new child cursors being created or due to the cursors being aged out of the shared pool.

Currently there as a SR open with Oracle to identify the reason for these extremly long parse times. I will update this post as soon as a I get news.

Meanwhile the workaround is to simply pin the statements in the shared pool. This way we do not need to parse over and over again, we keep all information in shared pool regardless if it is used or not. We still need to do the parse for each child cursor that is needed, but then we are done. So there is a time after restarting a database instance when the parse needs to be done. During this time we identify the statements and pin them again.
The following are examples only, it has nothing to do with the real issue.

SQL> select sql_id, executions, plan_hash_value, address, hash_value from v$sql where sql_id='94qn6y14kw01g';

SQL_ID        EXECUTIONS PLAN_HASH_VALUE ADDRESS          HASH_VALUE
------------- ---------- --------------- ---------------- ----------
94qn6y14kw01g        255      1388734953 00000000CE3DB1E8 1227751471

We use ADDRESS and HASH_VALUE to pin the cursor:

SQL> exec dbms_shared_pool.keep('00000000CE3DB1E8,1227751471', 'C');

PL/SQL procedure successfully completed.

To identify the statements that are already pinned, use a query similar to this:

SQL> select name, executions, status,
  2  (select sql_id from v$sql s where s.hash_value = c.hash_value and rownum=1) sql_id
  3  from v$db_object_cache c
  4  where kept='YES' and type='CURSOR';

NAME
--------------------------------------------------------------------------------
EXECUTIONS STATUS              SQL_ID
---------- ------------------- -------------
SELECT NVL(TO_NUMBER(EXTRACT(XMLTYPE(:B2 ), :B1 )), 0) FROM DUAL
       273 VALID               94qn6y14kw01g

SELECT NVL(TO_NUMBER(EXTRACT(XMLTYPE(:B2 ), :B1 )), 0) FROM DUAL
       273 VALID               94qn6y14kw01g

After a while, to load reduced and then continued at normal level.

12c-normal-load

I am now waiting for Oracle Support to provide information about the root cause and for the fix of this issue.