Turning a RAC into Data Guard

The headline sounds strange, but this really happened to me some time ago. The reason behind it is that a customer got a new second datacenter a few kilometers away from the current one. That's why they were forced to stretch the existing hardware over both datacenters with minimal investment. So we could either build a stretched RAC or something else. During the discussion we came to the conclusion that a real disaster recovery (DR) solution like Data Guard is a much better fit for this setup than RAC. That created the challenge of turning the existing two-node RAC into Data Guard. And yes, we are aware of the gap in between, when there is no RAC anymore and no Data Guard yet 🙂
In the following post I'll outline the steps that we went through. Most of the command output is in German, as it was a German project, but I think you'll get the idea.

1. What we had

A two-node RAC, version 12.1.0.2. Configuration details:

$ srvctl status database -db proddb
Instanz PRODDB_1 wird auf Knoten server-a ausgeführt
Instanz PRODDB_2 wird auf Knoten server-b ausgeführt
$ srvctl config database -db proddb
Eindeutiger Datenbankname: PRODDB
Datenbankname: PRODDB
Oracle-Standardverzeichnis: /u01/app/oracle/product/12.1.0.2/db
Oracle-Benutzer: oracle
Spfile: +ASMDAT/PRODDB/PARAMETERFILE/spfile.263.895424997
Kennwortdatei: +ASMDAT/PRODDB/PASSWORD/pwdproddb.258.895416527
Domain:
Startoptionen: open
Stoppoptionen: immediate
Datenbankrolle: PRIMARY
Verwaltungs-Policy: AUTOMATIC
Server-Pools: proddb
Datenträgergruppen: ASMDAT,ASMLOG1,ASMLOG2
Mount Point-Pfade:
Services:
Typ: RAC
Parallelität starten:
Parallelität stoppen:
OSDBA-Gruppe: oinstall
OSOPER-Gruppe:
Datenbankinstanzen: PRODDB_1,PRODDB_2
Konfigurierte Knoten:
Datenbank wird von Policy verwaltet

2. Remove node B from the cluster

First we have to stop the database instance on that node:

$ srvctl stop instance -db proddb -node server-b

It is a policy-managed cluster, so we moved the node to the Free pool:

$ srvctl relocate server -n server-b -g Free

Now this is the status of the server pools:

$ srvctl status serverpool  -detail
Server-Pool-Name: Free
Anzahl der aktiven Server: 1
Namen der aktiven Server: server-b
NAME=server-b STATE=ONLINE
Server-Pool-Name: Generic
Anzahl der aktiven Server: 0
Namen der aktiven Server:
Server-Pool-Name: proddb
Anzahl der aktiven Server: 1
Namen der aktiven Server: server-a
NAME=server-a STATE=ONLINE

At this point we can remove node B from the cluster configuration, following the documentation.

Remove node A from the oraInventory on node B

$ /u01/app/grid/12.1.0.2/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES={server-b}" CRS=TRUE -silent -local
Oracle Universal Installer wird gestartet...

Swap-Speicher wird geprüft: muss größer sein als 500 MB.   Tatsächlich 60145 MB    Erfolgreich
The inventory pointer is located at /var/opt/oracle/oraInst.loc

[...]

Once this is done we can deconfigure Oracle Clusterware Stack on node B according to the documentation.

As root:

$ cd $ORACLE_HOME/crs/install/
$ ./rootcrs.sh -deconfig -force
Using configuration parameter file: /u01/app/grid/12.1.0.2/crs/install/crsconfig_params
Netzwerk 1 ist vorhanden
Subnet IPv4: 172.17.3.0/255.255.255.0/ipmpfs0, static
Subnet IPv6:

[...]

2017/09/06 09:46:21 CLSRSC-4006: Removing Oracle Trace File Analyzer (TFA) Collector.

2017/09/06 09:47:05 CLSRSC-4007: Successfully removed Oracle Trace File Analyzer (TFA) Collector.

2017/09/06 09:47:11 CLSRSC-336: Successfully deconfigured Oracle Clusterware stack on this node

Remove node B from the oraInventory on node A

$ cd $ORACLE_HOME/oui/bin/
$ ./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES={server-a}" CRS=TRUE -silent -local
Oracle Universal Installer wird gestartet...

Swap-Speicher wird geprüft: muss größer sein als 500 MB.   Tatsächlich 36559 MB    Erfolgreich
The inventory pointer is located at /var/opt/oracle/oraInst.loc
'UpdateNodeList' war erfolgreich.

Finally, remove node B from Grid Infrastructure (as root)

crsctl delete node -n server-b
CRS-4661: Node server-b successfully deleted.

3. Turn Cluster Grid Infrastructure into Single Instance

Grid Infrastructure (GI) and Single Instance (SI) installations are nearly the same. So instead of
deinstalling GI and reinstalling (and patching) SI, we can simply change the installation type.

To do this, we first deconfigured GI

$ cd $ORACLE_HOME/crs/install
$ ./rootcrs.sh -deconfig -force
Using configuration parameter file: /u01/app/grid/12.1.0.2/crs/install/crsconfig_params
Netzwerk 1 ist vorhanden
Subnet IPv4: 172.17.3.0/255.255.255.0/ipmpfs0, static
Subnet IPv6:
Ziele pingen:
Netzwerk ist aktiviert

[…]

CRS-4133: Oracle High Availability Services has been stopped.
2017/09/19 12:20:22 CLSRSC-4006: Removing Oracle Trace File Analyzer (TFA) Collector.

2017/09/19 12:20:54 CLSRSC-4007: Successfully removed Oracle Trace File Analyzer (TFA) Collector.

2017/09/19 12:21:00 CLSRSC-336: Successfully deconfigured Oracle Clusterware stack on this node

GI uses some special root permissions for its installation, so we needed to revert the permission settings.

$ ./rootcrs.sh -unlock -crshome $ORACLE_HOME -nocrsstop

There is a bug when configuring SI when there are left-over files from a previous GI installation,
so we got rid of these.

$ cd /u01/app/oracle/
$ ls
admin          cfgtoollogs    crsdata        diagsnap       log
audit          checkpoints    diag           server-a       product
$ mv crsdata crsdata_old
$ cd -
/u01/app/grid/12.1.0.2/crs/install

A GI installation is linked with the RAC option enabled. With SI we do not need this, and besides that, configuring SI would fail with it enabled. So we disabled the RAC option.

$ cd $ORACLE_HOME/rdbms/lib
$ make -f ins_rdbms.mk rac_off ioracle

Now everything is prepared to configure SI (as root):

$ /u01/app/grid/12.1.0.2/perl/bin/perl -I/u01/app/grid/12.1.0.2/perl/lib -I/u01/app/grid/12.1.0.2/crs/install /u01/app/grid/12.1.0.2/crs/install/roothas.pl
Using configuration parameter file: /u01/app/grid/12.1.0.2/crs/install/crsconfig_params
2017/09/19 12:29:06 CLSRSC-363: User ignored prerequisites during installation

LOCAL ADD MODE
Creating OCR keys for user 'oracle', privgrp 'oinstall'..
Operation successful.
LOCAL ONLY MODE
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
CRS-4664: Knoten server-b erfolgreich gepinnt.
2017/09/19 12:29:17 CLSRSC-330: Adding Clusterware entries to file '/etc/inittab'


server-b     2017/09/19 12:29:58     /u01/app/grid/12.1.0.2/cdata/server-b/backup_20170919_122958.olr     257602229
CRS-2791: Herunterfahren der von Oracle High Availability Services verwalteten Ressourcen auf "server-b" wird gestartet
CRS-2673: Versuch, "ora.evmd" auf "server-b" zu stoppen
CRS-2677: Stoppen von "ora.evmd" auf "server-b" erfolgreich
CRS-2793: Herunterfahren der von Oracle High Availability Services verwalteten Ressourcen auf "server-b" ist abgeschlossen
CRS-4133: Oracle High Availability Services wurde gestoppt
CRS-4123: Oracle High Availability Services wurde gestartet.
2017/09/19 12:30:36 CLSRSC-327: Successfully configured Oracle Restart for a standalone server

4. Configure SI

Having SI up and running, we need to configure some more things, namely ASM and a listener.

$ srvctl add asm
$ srvctl start asm
$ srvctl add listener
$ srvctl start listener
$ crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       server-b                 STABLE
ora.asm
               ONLINE  ONLINE       server-b                 Started,STABLE
ora.ons 
               OFFLINE OFFLINE      server-b                 STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cssd
      1        ONLINE  ONLINE       server-b                 STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.evmd
      1        ONLINE  ONLINE       server-b                 STABLE
--------------------------------------------------------------------------------

With this setup, ASM comes without any configuration files. That's why we need to create the SPfile and password file on our own.
This is especially important for the following disk setup: we set the discovery string so that ASM is able to find our disks.

Setting up SPfile:

SQL> create spfile from memory;

File created.

$ srvctl modify asm -d '/dev/rdsk/asm*'
$ srvctl config asm
ASM-Standardverzeichnis: <CRS home>
Kennwortdatei:
ASM-Listener: LISTENER
Spfile: ?/dbs/spfile@.ora
ASM-Datenträgergruppe-Discovery-Zeichenfolge: /dev/rdsk/asm*
$ srvctl stop asm
$ srvctl start asm

Setting up ASM password file:

ASMCMD> pwcreate --asm +ASMDAT <SYS password>
ASMCMD> lspwusr
Username sysdba sysoper sysasm
     SYS   TRUE    TRUE  FALSE
ASMCMD>
ASMCMD> orapwusr --add asmsnmp
Enter password: ********
ASMCMD> lspwusr
Username sysdba sysoper sysasm
     SYS   TRUE    TRUE  FALSE
 ASMSNMP  FALSE   FALSE  FALSE
ASMCMD> orapwusr --grant sysasm asmsnmp
ASMCMD> orapwusr --grant sysasm sys
ASMCMD> lspwusr
Username sysdba sysoper sysasm
     SYS   TRUE    TRUE   TRUE
 ASMSNMP  FALSE   FALSE   TRUE

For the moved node B we created new LUNs for the database. We cannot reuse the old LUNs since they still hold the existing database.
The new Data Guard setup requires separate storage for the standby database.
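
If the disk groups do not exist yet on these new LUNs, they are created first. A minimal sketch, with the disk path as a placeholder that matches the discovery string set above:

SQL> create diskgroup ASMDAT external redundancy disk '/dev/rdsk/asmdat*';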

SQL> alter diskgroup asmdat mount;

Diskgroup altered.

SQL> alter diskgroup asmrec mount;

Diskgroup altered.

SQL> alter diskgroup asmlog1 mount;

Diskgroup altered.

SQL> alter diskgroup asmlog2 mount;

Diskgroup altered.


$ asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576    274408    36703                0           36703              0             N  ASMDAT/
MOUNTED  EXTERN  N         512   4096  1048576     10216     2893                0            2893              0             N  ASMLOG1/
MOUNTED  EXTERN  N         512   4096  1048576     10216     2893                0            2893              0             N  ASMLOG2/
MOUNTED  EXTERN  N         512   4096  1048576    102376    97502                0           97502              0             N  ASMREC/

5. Turn database installation into SI

Changing the database installation is pretty easy. We have to change the Oracle Inventory and we have to
disable the RAC option, as we did for the GI installation.

First, we disable the RAC option for the database home:

$ cd $ORACLE_HOME/rdbms/lib/
$ make -f ins_rdbms.mk rac_off ioracle

Then we modify the inventory to get a single server installation:

$ cd $ORACLE_HOME/oui/bin
$ ./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES={server-b}" -silent -local
Oracle Universal Installer wird gestartet...

Swap-Speicher wird geprüft: muss größer sein als 500 MB.   Tatsächlich 61662 MB    Erfolgreich
The inventory pointer is located at /var/opt/oracle/oraInst.loc
'UpdateNodeList' war erfolgreich

6. Setting up Data Guard

I will not cover this in detail since it is a different topic. Just to outline it: we did a "duplicate for standby", registered the duplicate DB with SI, set up Data Guard between the existing DB and the duplicate, and enabled the configuration.
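
Conceptually, the sequence looks roughly like the following sketch. Database names, connect identifiers and paths are placeholders, not the actual commands from the project:

# RMAN: create the standby as a copy of the running primary
RMAN> connect target sys@PRODDB
RMAN> connect auxiliary sys@PRODDB_STB
RMAN> duplicate target database for standby from active database;

# Register the standby database with Oracle Restart on node B
$ srvctl add database -db PRODDB_STB -oraclehome /u01/app/oracle/product/12.1.0.2/db -role PHYSICAL_STANDBY -startoption mount -diskgroup ASMDAT,ASMLOG1,ASMLOG2

# Build and enable the broker configuration
DGMGRL> create configuration 'DG_PRODDB' as primary database is 'PRODDB' connect identifier is PRODDB;
DGMGRL> add database 'PRODDB_STB' as connect identifier is PRODDB_STB;
DGMGRL> enable configuration;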

7. Modify server A

Changing server A from GI to SI requires pretty much the same steps that we applied to server B. We just did a Data Guard switchover
to server B, disabled and dropped the Data Guard configuration, and went through steps 1-6 on server A. That's basically it.
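
The switchover and the teardown of the broker configuration, again as a rough sketch with placeholder names:

$ dgmgrl sys@PRODDB
DGMGRL> switchover to 'PRODDB_STB';
DGMGRL> disable configuration;
DGMGRL> remove configuration;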

8. Conclusion

Finally we ended up with a clean Single Instance installation on both servers, with single-instance databases and a neat Data
Guard setup between them.

DataPatch stuck on RAC – PSU October 2016

Yesterday one of my customers wanted to patch two 2-node clusters with the current PSU October 2016 (161018). Both are running 12.1.0.2 Grid Infrastructure and 12.1.0.2 Database. The servers run SPARC Solaris 10. When applying the patch on the first cluster using “opatchauto” everything went fine until the “trying to apply SQL Patch” part on the 2nd node. So I went to the log directory and found the following:

$ cd $ORACLE_BASE/cfgtoollogs/sqlpatch/sqlpatch_27075_2016_11_30_17_12_08
$ tail sqlpatch_catcon_0.log

SQL> GRANT SELECT ON sys.gv_$instance TO dv_secanalyst
  2  /

At that line it was stuck. Searching My Oracle Support brought up nothing helpful. So I had a look at the database sessions:

SQL> select sid, username, event, state, seconds_in_wait
  2  from v$session where username='SYS';

       SID USERNAME   EVENT                           STATE               SECONDS_IN_WAIT
---------- ---------- ------------------------------- ------------------- ---------------
        13 SYS        SQL*Net message from client     WAITING                         226
        30 SYS        SQL*Net message from client     WAITING                         473
        32 SYS        SQL*Net message to client       WAITED SHORT TIME                 0
       411 SYS        SQL*Net message from client     WAITING                         473
       783 SYS        library cache lock              WAITING                         211
       786 SYS        SQL*Net message from client     WAITING                           4
      1155 SYS        SQL*Net message from client     WAITING                         467

The session is waiting for something dictionary-related. Since the waiting statement was related to RAC, I stopped the other instance, which made sqlpatch continue immediately. So the workaround looked like this:

$ srvctl stop instance -db <dbname> -node <node1>
$ srvctl start instance -db <dbname> -node <node1>

This happened on both clusters, so be aware of that in case you are applying this PSU to RAC databases.
In case you do not stop the 1st instance in time, the GRANT statement will run into a timeout (ORA-4021) and the SQL patch will be marked with "ERROR" in DBA_REGISTRY_SQLPATCH. In that case, just re-run "datapatch" and monitor the logfile.
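
For completeness, a typical datapatch invocation from the database home looks like this:

$ cd $ORACLE_HOME/OPatch
$ ./datapatch -verbose
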
Happy patching.

Update 07-DEC-2016

I was not able to reproduce this issue on a Linux x86-64 system. So there is a chance that the issue is OS-related.

Update 12-DEC-2016

Finally I reproduced this issue on my Linux x86-64 test system. Now I have opened an SR for it.

Update 13-DEC-2016

Thanks to a quick and efficient Oracle Support guy (yes, there are such people!) we found the root cause of the issue. There is a bug in the Enterprise Manager Agent (maybe in DB Express, too): it holds a shared lock on some GV$ views for the whole lifetime of a session. That's why datapatch got stuck. If you just stop the Agent, datapatch will continue immediately. There is no need to stop the whole instance; we just need to get rid of the Agent's sessions.
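
So, for this particular hang, stopping the agent around the datapatch run is enough, roughly like this (the agent home path is a placeholder):

$ <AGENT_HOME>/bin/emctl stop agent
$ $ORACLE_HOME/OPatch/datapatch -verbose
$ <AGENT_HOME>/bin/emctl start agent
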
Thanks a lot to Prakash from Oracle Support for his engagement in investigating this issue.

Using ASM Filter Driver right from the beginning

Preface

If you are running Oracle RAC, then configuring the shared storage is one of the main pre-installation tasks. You need to configure multipathing and make sure that the device name that will be used for ASM is always the same, and you must set permissions and ownership for these devices. On Linux you can use ASMlib for that: it stamps the devices so that it can identify them, provides a unique and consistent name for ASM, and sets proper permissions on the devices. But it is still possible for other processes to write to these devices, using "dd" for instance.
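
Without ASMlib, persistent names and permissions are typically handled with multipathing plus udev rules. A minimal sketch of such a rule, with the WWID, the symlink name and the scsi_id path as placeholders:

# /etc/udev/rules.d/99-oracle-asmdevices.rules (example)
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="<LUN WWID>", SYMLINK+="oracleasm/asmdat1", OWNER="oracle", GROUP="oinstall", MODE="0660"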

Now there is Oracle Grid Infrastructure 12c, which introduces a replacement for ASMlib called ASM Filter Driver (AFD). Basically it does the same things as ASMlib, but in addition it is able to block write operations from processes other than Oracle's own.

So that is a good thing and I wanted to use it for a new cluster that I was about to set up. And that is where the trouble starts. Besides the fact that there were some bugs in the initial versions of AFD, most of which were fixed by the April 2016 PSU, AFD is installed as part of Grid Infrastructure. You can read that in the Automatic Storage Management docs. It states the following:

After installation of Oracle Grid Infrastructure, you can optionally configure Oracle ASMFD for your system.

What? After installation? But I need it right from the beginning to use it for my initial disk group. How about that? There is a MOS note, How to Install ASM Filter Driver in a Linux Environment Without Having Previously Installed ASMLIB (Doc ID 2060259.1), but this whitepaper also assumes that Grid Infrastructure is already installed.

But as you can tell from this blog post's title, there is a way to use AFD from scratch, although it is not really straightforward.

1. Install Grid Infrastructure Software

The first step is to install Grid Infrastructure as a software-only installation. That implies that you have to do it on all nodes that will form the future cluster. I did that on the first node, saved the response file and did a silent install on the other nodes.

[oracle@vm140 ~]$ ./runInstaller -silent -responseFile /home/oracle/stage/grid/grid.rsp -ignorePrereq

At the end of the installation you need to run the "orainstRoot.sh" and "root.sh" scripts; the latter prints two further commands that configure either a cluster or a standalone server:

[root@vm140 ~]# /u01/app/oraInventory/orainstRoot.sh
Changing permissions of /u01/app/oraInventory.
Adding read,write permissions for group.
Removing read,write,execute permissions for world.

Changing groupname of /u01/app/oraInventory to oinstall.
The execution of the script is complete.
[root@vm140 ~]# /u01/app/12.1.0.2/grid/root.sh
Performing root user operation.

The following environment variables are set as:
	ORACLE_OWNER= oracle
	ORACLE_HOME=  /u01/app/12.1.0.2/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
   Copying dbhome to /usr/local/bin ...
   Copying oraenv to /usr/local/bin ...
   Copying coraenv to /usr/local/bin ...


Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.

To configure Grid Infrastructure for a Stand-Alone Server run the following command as the root user:
/u01/app/12.1.0.2/grid/perl/bin/perl -I/u01/app/12.1.0.2/grid/perl/lib -I/u01/app/12.1.0.2/grid/crs/install /u01/app/12.1.0.2/grid/crs/install/roothas.pl


To configure Grid Infrastructure for a Cluster execute the following command as oracle user:
/u01/app/12.1.0.2/grid/crs/config/config.sh
This command launches the Grid Infrastructure Configuration Wizard. The wizard also supports silent operation, and the parameters can be passed through the response file that is available in the installation media.

For the moment, we do not run either of these two configuration commands.

2. Patching Grid Infrastructure software

The next step is to patch the GI software to get the latest version of AFD. Simply update OPatch on all nodes and use "opatchauto" to patch the GI home. You need to specify the ORACLE_HOME path using the "-oh" parameter to patch an unconfigured Grid Infrastructure home.
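
Updating OPatch is just a matter of unzipping the latest OPatch release (patch 6880880) into the home, for example (staging path and zip file name are examples):

[oracle@vm140 ~]$ cd /u01/app/12.1.0.2/grid
[oracle@vm140 grid]$ unzip -o /home/oracle/stage/p6880880_121010_Linux-x86-64.zip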

[root@vm140 ~]# export ORACLE_HOME=/u01/app/12.1.0.2/grid
[root@vm140 ~]# export PATH=$ORACLE_HOME/OPatch:$PATH
[root@vm140 ~]# opatch version
OPatch Version: 12.1.0.1.12

OPatch succeeded.

[root@vm140 ~]# opatchauto apply /home/oracle/stage/22646084 -oh $ORACLE_HOME

[...]

--------------------------------Summary--------------------------------

Patching is completed successfully. Please find the summary as follows:

Host:vm140
CRS Home:/u01/app/12.1.0.2/grid
Summary:

==Following patches were SUCCESSFULLY applied:

Patch: /home/oracle/stage/22646084/21436941
Log: /u01/app/12.1.0.2/grid/cfgtoollogs/opatchauto/core/opatch/opatch2016-06-10_13-54-11PM_1.log

Patch: /home/oracle/stage/22646084/22291127
Log: /u01/app/12.1.0.2/grid/cfgtoollogs/opatchauto/core/opatch/opatch2016-06-10_13-54-11PM_1.log

Patch: /home/oracle/stage/22646084/22502518
Log: /u01/app/12.1.0.2/grid/cfgtoollogs/opatchauto/core/opatch/opatch2016-06-10_13-54-11PM_1.log

Patch: /home/oracle/stage/22646084/22502555
Log: /u01/app/12.1.0.2/grid/cfgtoollogs/opatchauto/core/opatch/opatch2016-06-10_13-54-11PM_1.log


OPatchAuto successful.

You see that with the latest OPatch version there is no need to create an ocm.rsp response file anymore.

3. Configure Restart

Configure Restart? Why? Because it sets up everything we need to use AFD but does not need any shared storage or other cluster-related things like virtual IPs, SCANs and so on.
Therefore you use the command that was printed earlier by the "root.sh" script. Do that on all nodes of the future cluster.

[root@vm140 ~]# /u01/app/12.1.0.2/grid/perl/bin/perl -I/u01/app/12.1.0.2/grid/perl/lib -I/u01/app/12.1.0.2/grid/crs/install /u01/app/12.1.0.2/grid/crs/install/roothas.pl

4. Deconfigure Restart

After Restart was configured, you can deconfigure it right away. Everything that is needed for AFD is kept. The documentation for that is here.

[root@vm140 ~]# cd /u01/app/12.1.0.2/grid/crs/install/
[root@vm140 install]# ./roothas.sh -deconfig -force

5. Configure ASM Filter Driver

Now you can finally start configuring AFD. The whitepaper from the MOS note mentioned at the beginning provides a good overview of what has to be done. Simply connect as “root”, set the environment and run the following:

[root@vm140 install]# $ORACLE_HOME/bin/asmcmd afd_configure
Connected to an idle instance.
AFD-627: AFD distribution files found.
AFD-636: Installing requested AFD software.
AFD-637: Loading installed AFD drivers.
AFD-9321: Creating udev for AFD.
AFD-9323: Creating module dependencies - this may take some time.
AFD-9154: Loading 'oracleafd.ko' driver.
AFD-649: Verifying AFD devices.
AFD-9156: Detecting control device '/dev/oracleafd/admin'.
AFD-638: AFD installation correctness verified.
Modifying resource dependencies - this may take some time.
ASMCMD-9524: AFD configuration failed 'ERROR: OHASD start failed'
[root@vm140 install]# $ORACLE_HOME/bin/asmcmd afd_state
Connected to an idle instance.
ASMCMD-9526: The AFD state is 'LOADED' and filtering is 'DISABLED' on host 'vm140'

Don't worry about the error and the message telling you that the configuration failed. That is simply because there is no cluster at all at the moment.
As a final configuration step you need to set the discovery string for AFD so that it can find the disks you want to use. This is defined inside "/etc/afd.conf":

[root@vm140 install]# cat /etc/afd.conf
afd_diskstring='/dev/xvd*'

The above steps need to be done on all servers of the future cluster.
Now that AFD is configured, you can start labeling your disks. Do this on only one node:

[root@vm140 install]# $ORACLE_HOME/bin/asmcmd afd_label GI /dev/xvdb1
Connected to an idle instance.
[root@vm140 install]# $ORACLE_HOME/bin/asmcmd afd_label DATA /dev/xvdc1
Connected to an idle instance.
[root@vm140 install]# $ORACLE_HOME/bin/asmcmd afd_label FRA /dev/xvdd1
Connected to an idle instance.

[root@vm140 install]# $ORACLE_HOME/bin/asmcmd afd_lsdsk
Connected to an idle instance.
--------------------------------------------------------------------------------
Label                     Filtering   Path
================================================================================
GI                         DISABLED   /dev/xvdb1
DATA                       DISABLED   /dev/xvdc1
FRA                        DISABLED   /dev/xvdd1

On all the other nodes just do a rescan of the disks:

[root@vm141 install]# $ORACLE_HOME/bin/asmcmd afd_scan
Connected to an idle instance.
[root@vm141 install]# $ORACLE_HOME/bin/asmcmd afd_lsdsk
Connected to an idle instance.
--------------------------------------------------------------------------------
Label                     Filtering   Path
================================================================================
GI                         DISABLED   /dev/xvdb1
DATA                       DISABLED   /dev/xvdc1
FRA                        DISABLED   /dev/xvdd1

That’s it.

6. Configure cluster with AFD

Finally, you can start configuring your new cluster and use AFD disks right from the beginning. You can now use the configuration wizard that was mentioned in the "root.sh" output to set up your cluster.

[oracle@vm140 ~]$ /u01/app/12.1.0.2/grid/crs/config/config.sh

Follow the steps and you will see the well-known screens for setting up a cluster. At the point where you define the initial Grid Infrastructure disk group you can now specify the "Discovery String", and, voila, you see the previously labeled disks (screenshots omitted here).
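
If you go the silent route instead, the response file mentioned in the root.sh output takes the same discovery string; for AFD-labeled disks it is 'AFD:*'. An excerpt could look like this (parameter names as in the standard 12.1 grid response file, so treat the exact names as an assumption):

oracle.install.asm.diskGroup.name=GI
oracle.install.asm.diskGroup.diskDiscoveryString=AFD:*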

And after you run the root scripts on all nodes, you’ll get a running cluster:

[root@vm140 bin]# ./crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
			   ONLINE  ONLINE       vm140                    STABLE
			   ONLINE  ONLINE       vm141                    STABLE
			   ONLINE  ONLINE       vm142                    STABLE
			   ONLINE  ONLINE       vm143                    STABLE
ora.GI.dg
			   ONLINE  ONLINE       vm140                    STABLE
			   ONLINE  ONLINE       vm141                    STABLE
			   ONLINE  ONLINE       vm142                    STABLE
			   OFFLINE OFFLINE      vm143                    STABLE
ora.LISTENER.lsnr
			   ONLINE  ONLINE       vm140                    STABLE
			   ONLINE  ONLINE       vm141                    STABLE
			   ONLINE  ONLINE       vm142                    STABLE
			   ONLINE  ONLINE       vm143                    STABLE
ora.net1.network
			   ONLINE  ONLINE       vm140                    STABLE
			   ONLINE  ONLINE       vm141                    STABLE
			   ONLINE  ONLINE       vm142                    STABLE
			   ONLINE  ONLINE       vm143                    STABLE
ora.ons
			   ONLINE  ONLINE       vm140                    STABLE
			   ONLINE  ONLINE       vm141                    STABLE
			   ONLINE  ONLINE       vm142                    STABLE
			   ONLINE  ONLINE       vm143                    STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
	  1        ONLINE  ONLINE       vm140                    STABLE
ora.MGMTLSNR
	  1        ONLINE  ONLINE       vm140                    169.254.231.166 192.
															 168.1.1,STABLE
ora.asm
	  1        ONLINE  ONLINE       vm140                    Started,STABLE
	  2        ONLINE  ONLINE       vm142                    Started,STABLE
	  3        ONLINE  ONLINE       vm141                    Started,STABLE
ora.cvu
	  1        ONLINE  ONLINE       vm140                    STABLE
ora.mgmtdb
	  1        ONLINE  ONLINE       vm140                    Open,STABLE
ora.oc4j
	  1        ONLINE  ONLINE       vm140                    STABLE
ora.scan1.vip
	  1        ONLINE  ONLINE       vm140                    STABLE
ora.vm140.vip
	  1        ONLINE  ONLINE       vm140                    STABLE
ora.vm141.vip
	  1        ONLINE  ONLINE       vm141                    STABLE
ora.vm142.vip
	  1        ONLINE  ONLINE       vm142                    STABLE
ora.vm143.vip
	  1        ONLINE  ONLINE       vm143                    STABLE
--------------------------------------------------------------------------------

And that's it. Nothing more to do, besides creating more disk groups and setting up databases. But that is simple compared to what we have done so far.

opatchauto Odyssey

A couple of days ago a customer asked for assistance in installing the January PSU in their RAC environment. The patch was to be applied to two systems: first the test cluster, then the production cluster. Makes sense so far. So we planned the steps that needed to be done:

  • Download the patch
  • Copy the patch to all nodes and extract it
  • Check the OPatch version
  • Create a response file for OCM and copy it to all nodes
  • Clear the ASM adump directory, since many audit files there may slow down the pre-patch steps
  • Run "opatchauto" on the first node
  • Run "opatchauto" on the second node
  • Run "datapatch" to apply the SQL changes to the databases

The whole procedure went fine without any issues on test. We even skipped the last step, running "datapatch", since "opatchauto" did that for us. This is in contrast to the Readme, which does not mention it.
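
For reference, the core commands of such a run look roughly like this (host names, patch location, response file and home paths are placeholders):

[root@node1 ~]# <GI_HOME>/OPatch/opatchauto apply /u01/stage/<PSU directory> -ocmrf /u01/stage/ocm.rsp
... repeat on the second node, then run datapatch against the databases ...
[oracle@node1 ~]$ $ORACLE_HOME/OPatch/datapatch -verbose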

So that was easy. But unfortunately the production system did not go as smoothly as the test system. "opatchauto" shut down the cluster stack and patched the RDBMS home successfully. But during the patch phase of GI, the logfile told us that there were still processes blocking some files. I checked that and found a handful; one of those processes was the "ocssd". The moment I killed all the left-over processes I knew that this was not the best idea: the server fenced and rebooted straight away. That left my cluster in a fuzzy state. The cluster stack came up again, but "opatchauto -resume" told me that I should proceed with some manual steps. So I manually applied the patches to the GI home, which had not been done before, and ran the post-patch script, which failed. Starting "opatchauto" in normal mode also failed, since the cluster was already in "rolling" mode.

So finally I removed all the applied patches manually, put the cluster back in normal mode following MOS Note 1943498.1 and started the whole patching all over. This time everything went fine.

Conclusion

  1. Think before you act. Killing OCSSD is not a good idea at all.
  2. In contrast to the Readme, "datapatch" is executed by "opatchauto" as part of the patching process.
  3. Checking the current cluster status can be done like this:
[oracle@vm101 ~]$ crsctl query crs activeversion -f
Oracle Clusterware active version on the cluster is [12.1.0.2.0]. The cluster upgrade state is [NORMAL]. The cluster active patch level is [3467666221].

Shared Pool Memory Leak in 12c RAC

A couple of minutes ago I wrote about Parse Times in 12c. When investigating that issue we came across another one: the shared pool was growing and growing, stealing a lot of memory from the buffer cache. It is not that there wasn't enough memory, MEMORY_TARGET was set to 42 GB (the magic number), but the shared pool was consuming more than 20 GB of it. We stopped the buffer cache from shrinking further by setting a minimum size for "db_cache_size" at around 8 GB. From then on, the shared pool started to steal its memory from the PGA instead.
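
Setting db_cache_size while MEMORY_TARGET is in use acts as a floor below which automatic memory management will not shrink the buffer cache; roughly:

SQL> alter system set db_cache_size=8G scope=both sid='*';
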
So we started to identify which area was consuming the memory inside the shared pool:

SQL> select * from (
  2  select name, bytes from v$sgastat where pool ='shared pool' order by 2 desc
  3  )
  4  where rownum <11;

NAME                                      BYTES
-------------------------- --------------------
ges enqueues                      5.981.720.032
ges resource dynamic              5.568.662.280
value block free list             2.283.277.296
ges resource dynamic              1.799.279.656
free memory                         983.839.992
gcs resources                       696.864.960
SQLA                                534.089.824
KGH: NO ACCESS                      399.729.056
gcs shadows                         387.147.200
KGLH0                               268.811.064

Ok, "ges" things... A quick search in My Oracle Support revealed a bug that exactly matched our findings: "ORA-04031 Errors Occurring with High 'ges resource dynamic' & 'ges enqueues' Memory Usage In The Shared Pool".

The workaround mentioned in the document helped, now the shared pool is quiet again.