DataPatch stuck on RAC – PSU October 2016

Yesterday one of my customers wanted to patch two 2-node clusters with the current PSU October 2016 (161018). Both are running Grid Infrastructure and Database. The servers run SPARC Solaris 10. When applying the patch to the first cluster using “opatchauto”, everything went fine until the “trying to apply SQL Patch” step on the 2nd node. So I went to the log directory and found the following:

$ cd $ORACLE_BASE/cfgtoollogs/sqlpatch/sqlpatch_27075_2016_11_30_17_12_08
$ tail sqlpatch_catcon_0.log

SQL> GRANT SELECT ON sys.gv_$instance TO dv_secanalyst
  2  /

At that line it was stuck. Searching My Oracle Support brought up nothing helpful. So I had a look at the database sessions:

SQL> select sid, username, event, state, seconds_in_wait
  2  from v$session where username='SYS';

       SID USERNAME   EVENT                           STATE             SECONDS_IN_WAIT
---------- ---------- ------------------------------- ----------------- ---------------
        13 SYS        SQL*Net message from client     WAITING                       226
        30 SYS        SQL*Net message from client     WAITING                       473
        32 SYS        SQL*Net message to client       WAITED SHORT TIME               0
       411 SYS        SQL*Net message from client     WAITING                       473
       783 SYS        library cache lock              WAITING                       211
       786 SYS        SQL*Net message from client     WAITING                         4
      1155 SYS        SQL*Net message from client     WAITING                       467

The session is waiting for something dictionary related. Since the waiting statement was RAC related, I stopped the other instance, which made sqlpatch continue immediately. So the workaround looked like this:

$ srvctl stop instance -db <dbname> -node <node1>
$ srvctl start instance -db <dbname> -node <node1>
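In hindsight, one could also try to identify the holder of the “library cache lock” first instead of bouncing the instance. A quick sketch using the blocking columns of GV$SESSION (note that they are not always populated for this particular wait):

SQL> select inst_id, sid, serial#, event, blocking_instance, blocking_session
  2  from gv$session
  3  where event = 'library cache lock';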

This happened on both clusters. So be aware of that in case you are applying that PSU patch to RAC databases.
In case you fail to stop the 1st instance in time, the GRANT statement will run into a timeout (ORA-4021) and the SQL patch will be marked with “ERROR” in DBA_REGISTRY_SQLPATCH. In that case, just re-run “datapatch” and monitor the logfile.
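Whether the SQL patch finally made it can be checked in DBA_REGISTRY_SQLPATCH, for example like this (columns as of 12.1):

SQL> select patch_id, version, action, status, action_time
  2  from dba_registry_sqlpatch
  3  order by action_time;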
Happy patching.

Update 07-DEC-2016

I was not able to reproduce this issue on a Linux x86-64 system. So there is a chance that the issue is OS related.

Update 12-DEC-2016

Finally I reproduced this issue on my Linux x86-64 test system. Now I have opened an SR for it.

Update 13-DEC-2016

Thanks to a quick and efficient Oracle Support guy (yes, there are such people!) we found the root cause of the issue. There is a bug in the Enterprise Manager Agent (maybe in DB Express, too) that makes it hold a shared lock on some GV$ views for the whole lifetime of a session. That’s why datapatch got stuck. If you just stop the Agent, datapatch will continue immediately. There is no need to stop the whole instance, we just need to get rid of the Agent’s sessions.
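So the gentler workaround looks roughly like this (AGENT_HOME stands for wherever your agent is installed; “-verbose” is just my habit):

$ $AGENT_HOME/bin/emctl stop agent
$ $ORACLE_HOME/OPatch/datapatch -verbose
$ $AGENT_HOME/bin/emctl start agent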
Thanks a lot to Prakash from Oracle Support for his engagement in investigating this issue.

Using ASM Filter Driver right from the beginning


If you are running Oracle RAC, then configuring the shared storage is one of the main preinstallation tasks that need to be done. You need to configure multipathing and make sure that the device name that will be used for ASM is always the same. And you must set permissions and ownership for these devices. On Linux you can use ASMlib for that. It stamps the devices so that it can identify them, provides a unique and consistent name for ASM and sets proper permissions for the devices. But it is still possible for other processes to write to these devices, using “dd” for instance.

Now there is Oracle Grid Infrastructure 12c, which introduces a replacement for ASMlib called ASM Filter Driver (AFD). Basically it does the same things as ASMlib, but in addition it is able to block write operations coming from processes other than Oracle’s own.
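Just to illustrate what that means (illustration only, do not run this against disks you care about): with ASMlib, a stray write like the one below would corrupt the disk group, while with AFD filtering enabled the I/O should simply be discarded by the driver.

# stray write to an AFD-managed device; with filtering enabled,
# AFD drops the I/O instead of passing it down to the disk
[root@vm140 ~]# dd if=/dev/zero of=/dev/xvdc1 bs=1M count=10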

So that is a good thing, and I wanted to use it for a new cluster that I was about to set up. And that is where the trouble starts. Besides the fact that there were some bugs in the initial versions of AFD, most of which got fixed by the April 2016 PSU, AFD is installed as part of Grid Infrastructure. You can read that in the Automatic Storage Management docs, which state the following:

After installation of Oracle Grid Infrastructure, you can optionally configure Oracle ASMFD for your system.

What? After installation? But I need it right from the beginning to use it for my initial disk group. How about that? There is a MOS note, How to Install ASM Filter Driver in a Linux Environment Without Having Previously Installed ASMLIB (Doc ID 2060259.1), but this whitepaper also assumes that Grid Infrastructure is already installed.

But as you can tell from this blog post’s title, there is a way to use AFD from scratch, it is just not really straightforward.

1. Install Grid Infrastructure Software

First step is to install Grid Infrastructure as a software-only installation. That implies that you have to do it on all nodes that will form the future cluster. I did that on the first node, saved the response file and did a silent install on the other nodes.

[oracle@vm140 ~]$ ./runInstaller -silent -responseFile /home/oracle/stage/grid/grid.rsp -ignorePrereq

At the end of the installation you need to run the usual root scripts; the “root.sh” script itself provides two other root scripts which configure either a cluster or a stand-alone server:

[root@vm140 ~]# /u01/app/oraInventory/
Changing permissions of /u01/app/oraInventory.
Adding read,write permissions for group.
Removing read,write,execute permissions for world.

Changing groupname of /u01/app/oraInventory to oinstall.
The execution of the script is complete.
[root@vm140 ~]# /u01/app/
Performing root user operation.

The following environment variables are set as:
	ORACLE_HOME=  /u01/app/

Enter the full pathname of the local bin directory: [/usr/local/bin]:
   Copying dbhome to /usr/local/bin ...
   Copying oraenv to /usr/local/bin ...
   Copying coraenv to /usr/local/bin ...

Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.

To configure Grid Infrastructure for a Stand-Alone Server run the following command as the root user:
/u01/app/ -I/u01/app/ -I/u01/app/ /u01/app/

To configure Grid Infrastructure for a Cluster execute the following command as oracle user:
This command launches the Grid Infrastructure Configuration Wizard. The wizard also supports silent operation, and the parameters can be passed through the response file that is available in the installation media.

For the moment, we do not run any of these scripts.

2. Patch Grid Infrastructure Software

Next step is to patch the GI software to get the latest version of AFD. Simply update OPatch on all nodes and use “opatchauto” to patch the GI home. You need to specify the ORACLE_HOME path using the “-oh” parameter to patch an unconfigured Grid Infrastructure home.
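Updating OPatch is just a matter of unzipping the latest version (patch 6880880) into the Oracle home; the archive name below is an example:

# with ORACLE_HOME pointing at the GI home on each node
[root@vm140 ~]# unzip -o /home/oracle/stage/p6880880_121010_Linux-x86-64.zip -d $ORACLE_HOME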

[root@vm140 ~]# export ORACLE_HOME=/u01/app/
[root@vm140 ~]# export PATH=$ORACLE_HOME/OPatch:$PATH
[root@vm140 ~]# opatch version
OPatch Version:

OPatch succeeded.

[root@vm140 ~]# opatchauto apply /home/oracle/stage/22646084 -oh $ORACLE_HOME



Patching is completed successfully. Please find the summary as follows:

CRS Home:/u01/app/

==Following patches were SUCCESSFULLY applied:

Patch: /home/oracle/stage/22646084/21436941
Log: /u01/app/

Patch: /home/oracle/stage/22646084/22291127
Log: /u01/app/

Patch: /home/oracle/stage/22646084/22502518
Log: /u01/app/

Patch: /home/oracle/stage/22646084/22502555
Log: /u01/app/

OPatchAuto successful.

You can see that with the latest OPatch version there is no need to create an ocm.rsp response file anymore.

3. Configure Restart

Configure Restart? Why? Because it sets up everything we need to use AFD, but does not require any shared storage or other cluster-related things like virtual IPs, SCANs and so on.
Therefore you use the stand-alone configuration script that “root.sh” provided earlier. Do that on all nodes of the future cluster.

[root@vm140 ~]# /u01/app/ -I/u01/app/ -I/u01/app/ /u01/app/

4. Deconfigure Restart

After Restart was configured, you can deconfigure it right away. Everything that is needed for AFD is kept. This deconfiguration step is described in the Grid Infrastructure documentation.

[root@vm140 ~]# cd /u01/app/
[root@vm140 install]# ./ -deconfig -force

5. Configure ASM Filter Driver

Now you can finally start configuring AFD. The whitepaper from the MOS note mentioned at the beginning provides a good overview of what has to be done. Simply connect as “root”, set the environment and run the following:

[root@vm140 install]# $ORACLE_HOME/bin/asmcmd afd_configure
Connected to an idle instance.
AFD-627: AFD distribution files found.
AFD-636: Installing requested AFD software.
AFD-637: Loading installed AFD drivers.
AFD-9321: Creating udev for AFD.
AFD-9323: Creating module dependencies - this may take some time.
AFD-9154: Loading 'oracleafd.ko' driver.
AFD-649: Verifying AFD devices.
AFD-9156: Detecting control device '/dev/oracleafd/admin'.
AFD-638: AFD installation correctness verified.
Modifying resource dependencies - this may take some time.
ASMCMD-9524: AFD configuration failed 'ERROR: OHASD start failed'
[root@vm140 install]# $ORACLE_HOME/bin/asmcmd afd_state
Connected to an idle instance.
ASMCMD-9526: The AFD state is 'LOADED' and filtering is 'DISABLED' on host 'vm140'

Don’t worry about the error and the message telling you that the configuration failed. That happens simply because there is no cluster at all at the moment.
As a final configuration step you need to set the discovery string for AFD so that it can find the disks you want to use. This is defined inside “/etc/afd.conf”:

[root@vm140 install]# cat /etc/afd.conf
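
The file holds the AFD discovery string matching your devices. If you do not want to edit it by hand, the discovery string can also be set via “asmcmd”; the pattern below is an assumption based on the devices used later in this post:

[root@vm140 install]# $ORACLE_HOME/bin/asmcmd afd_dsset '/dev/xvd*'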

The above steps need to be done on all servers of the future cluster.
Now that AFD is configured, you can start labeling your disks. Do this on only one node:

[root@vm140 install]# $ORACLE_HOME/bin/asmcmd afd_label GI /dev/xvdb1
Connected to an idle instance.
[root@vm140 install]# $ORACLE_HOME/bin/asmcmd afd_label DATA /dev/xvdc1
Connected to an idle instance.
[root@vm140 install]# $ORACLE_HOME/bin/asmcmd afd_label FRA /dev/xvdd1
Connected to an idle instance.

[root@vm140 install]# $ORACLE_HOME/bin/asmcmd afd_lsdsk
Connected to an idle instance.
Label                     Filtering   Path
GI                         DISABLED   /dev/xvdb1
DATA                       DISABLED   /dev/xvdc1
FRA                        DISABLED   /dev/xvdd1

On all the other nodes just do a rescan of the disks:

[root@vm141 install]# $ORACLE_HOME/bin/asmcmd afd_scan
Connected to an idle instance.
[root@vm141 install]# $ORACLE_HOME/bin/asmcmd afd_lsdsk
Connected to an idle instance.
Label                     Filtering   Path
GI                         DISABLED   /dev/xvdb1
DATA                       DISABLED   /dev/xvdc1
FRA                        DISABLED   /dev/xvdd1

That’s it.

6. Configure Cluster with AFD

Finally, you can start configuring your new cluster and use AFD disks right from the beginning. You can now use the Grid Infrastructure Configuration Wizard that “root.sh” mentioned earlier to set up your cluster.

[oracle@vm140 ~]$ /u01/app/

Follow the steps and you will see the well-known screens for setting up a cluster. At the point where you define the initial Grid Infrastructure disk group, you can now specify the “Discovery String”, and, voilà, you see the previously labeled disks.

And after you run the root scripts on all nodes, you’ll get a running cluster:

[root@vm140 bin]# ./crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
			   ONLINE  ONLINE       vm140                    STABLE
			   ONLINE  ONLINE       vm141                    STABLE
			   ONLINE  ONLINE       vm142                    STABLE
			   ONLINE  ONLINE       vm143                    STABLE
			   ONLINE  ONLINE       vm140                    STABLE
			   ONLINE  ONLINE       vm141                    STABLE
			   ONLINE  ONLINE       vm142                    STABLE
			   OFFLINE OFFLINE      vm143                    STABLE
			   ONLINE  ONLINE       vm140                    STABLE
			   ONLINE  ONLINE       vm141                    STABLE
			   ONLINE  ONLINE       vm142                    STABLE
			   ONLINE  ONLINE       vm143                    STABLE
			   ONLINE  ONLINE       vm140                    STABLE
			   ONLINE  ONLINE       vm141                    STABLE
			   ONLINE  ONLINE       vm142                    STABLE
			   ONLINE  ONLINE       vm143                    STABLE
			   ONLINE  ONLINE       vm140                    STABLE
			   ONLINE  ONLINE       vm141                    STABLE
			   ONLINE  ONLINE       vm142                    STABLE
			   ONLINE  ONLINE       vm143                    STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
	  1        ONLINE  ONLINE       vm140                    STABLE
	  1        ONLINE  ONLINE       vm140           192.
	  1        ONLINE  ONLINE       vm140                    Started,STABLE
	  2        ONLINE  ONLINE       vm142                    Started,STABLE
	  3        ONLINE  ONLINE       vm141                    Started,STABLE
	  1        ONLINE  ONLINE       vm140                    STABLE
	  1        ONLINE  ONLINE       vm140                    Open,STABLE
	  1        ONLINE  ONLINE       vm140                    STABLE
	  1        ONLINE  ONLINE       vm140                    STABLE
	  1        ONLINE  ONLINE       vm140                    STABLE
	  1        ONLINE  ONLINE       vm141                    STABLE
	  1        ONLINE  ONLINE       vm142                    STABLE
	  1        ONLINE  ONLINE       vm143                    STABLE

And that’s it. Nothing more to do, besides creating more disk groups and setting up databases. But that is simple compared to what we’ve done so far.
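Creating those further disk groups from the labeled disks is then one statement each in the ASM instance; a sketch, assuming external redundancy fits your needs:

SQL> create diskgroup DATA external redundancy disk 'AFD:DATA';
SQL> create diskgroup FRA external redundancy disk 'AFD:FRA';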

opatchauto Odyssey

A couple of days ago a customer asked for assistance in installing the January PSU in their RAC environment. The patch was to be applied to two systems, first the test cluster, then the production cluster. Makes sense so far. So we planned the steps that needed to be done:

  • Download the patch
  • Copy the patch to all nodes and extract it
  • Check the OPatch version
  • Create a response file for OCM and copy it to all nodes
  • Clear the ASM adump directory, since lots of old audit files may slow down the pre-patch steps (see the sketch after this list)
  • Run “opatchauto” on the first node
  • Run “opatchauto” on the second node
  • Run “datapatch” to apply the SQL changes to the databases
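For clearing the adump directory, something along these lines does the trick (path and retention are assumptions, adjust them to your environment):

[oracle@vm101 ~]$ find /u01/app/oracle/admin/+ASM/adump -name '*.aud' -mtime +7 -delete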

The whole procedure went fine without any issues on test. We could even skip the last step, running “datapatch”, since “opatchauto” had already done that for us. This is in contrast to the readme, which does not mention it.

So that was easy. But unfortunately the production system did not go as smoothly as the test system. “opatchauto” shut down the cluster stack and patched the RDBMS home successfully. But during the patch phase of GI, the logfile told us that there were still processes blocking some files. I checked that and found a handful; one of them was the “ocssd”. The moment I killed all the left-over processes I knew that this had not been the best idea. The server fenced and rebooted straight away. That left my cluster in a fuzzy state. The cluster stack came up again, but “opatchauto -resume” told me that I should proceed with some manual steps. So I applied the patches to the GI home, which had not been done before, and ran the post-patch script, which failed. Starting “opatchauto” in normal mode failed as well, since the cluster was already in “rolling” mode.

So finally I removed all the applied patches manually, put the cluster back into normal mode following MOS Note 1943498.1 and started the whole patching all over. Everything went fine this time.
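The note has the details; in essence, it came down to checking the upgrade state and ending the rolling patch mode (a sketch from memory, verify against the note before running it):

[root@vm101 ~]# crsctl query crs activeversion -f
[root@vm101 ~]# crsctl stop rollingpatch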


Lessons learned:

  1. Think before you act. Killing OCSSD is not a good idea at all.
  2. In contrast to the readme, “datapatch” is executed by “opatchauto” as part of the patching process.
  3. The current cluster state can be checked like this:
[oracle@vm101 ~]$ crsctl query crs activeversion -f
Oracle Clusterware active version on the cluster is []. The cluster upgrade state is [NORMAL]. The cluster active patch level is [3467666221].


Shared Pool Memory Leak in 12c RAC

A couple of minutes ago I wrote about Parse Times in 12c. When investigating this issue we came across another one. The shared pool was growing and growing, stealing a lot of memory from the buffer cache. We stopped that by setting “db_cache_size”, which acts as a minimum size below which the buffer cache cannot shrink. It is not that there wasn’t enough memory, the MEMORY_TARGET was set to 42GB (the magic number). But the shared pool was consuming more than 20GB of that memory. We stopped the buffer cache from shrinking any further at around 8GB. Now the shared pool started to steal its memory from the PGA.
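Putting in that lower limit is a one-liner (8GB here, the point where we stopped the shrinking):

SQL> alter system set db_cache_size=8G scope=both sid='*';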
So we started to identify what area is consuming the memory inside the shared pool:

SQL> select * from (
  2  select name, bytes from v$sgastat where pool ='shared pool' order by 2 desc
  3  )
  4  where rownum <11;

NAME                                      BYTES
-------------------------- --------------------
ges enqueues                      5.981.720.032
ges resource dynamic              5.568.662.280
value block free list   
ges resource dynamic              1.799.279.656
free memory                         983.839.992
gcs resources                       696.864.960
SQLA                                534.089.824
KGH: NO ACCESS                      399.729.056
gcs shadows                         387.147.200
KGLH0                               268.811.064

OK, “ges” things… A quick search in My Oracle Support revealed a bug that exactly matched our findings: ORA-04031 Errors Occurring with High “ges resource dynamic” & “ges enqueues” Memory Usage In The Shared Pool.

The workaround mentioned in the document helped; now the shared pool is quiet again.