ora.storage hangs after node reboot

A couple of days ago I experienced a node reboot in a Windows RAC cluster at a customer's site. Beside the fact that we could not identify any root cause for the reboot (there were no messages in the cluster's alert.log, the ocssd.log or the crsd.log), the clusterware did not start properly after the node came up again.
The clusterware is 12.1.0.2 with the April 2017 Bundle Patch installed.
When checking the progress of the startup via “crsctl stat res -t -init” we could see that it was stuck starting the “ora.storage” resource. After 10 minutes of waiting it simply timed out.
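
For reference, this is how that check looks; you can also query just that single resource:

C:\>crsctl stat res -t -init
C:\>crsctl stat res ora.storage -init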

So I checked the cluster alert.log first:

2017-12-05 14:17:52.417 [ORAROOTAGENT(1960)]CRS-5019: All OCR locations are on ASM disk groups [OCR], and none of these disk groups are mounted. Details are at "(:CLSN00140:)" in "D:\app\oracle\diag\crs\server001\crs\trace\ohasd_orarootagent_system.trc".
2017-12-05 14:27:40.137000 +01:00
2017-12-05 14:27:40.137 [ORAROOTAGENT(3960)]CRS-5818: Aborted command 'start' for resource 'ora.storage'. Details at (:CRSAGF00113:) {0:9:4} in D:\app\oracle\diag\crs\server001\crs\trace\ohasd_orarootagent_system.trc.
2017-12-05 14:27:40.823 [ORAROOTAGENT(1960)]CRS-5017: The resource action "ora.storage start" encountered the following error:
2017-12-05 14:27:40.823+Storage agent start action aborted. For details refer to "(:CLSN00107:)" in "D:\app\oracle\diag\crs\server001\crs\trace\ohasd_orarootagent_system.trc".
2017-12-05 14:27:44.158000 +01:00
2017-12-05 14:27:44.158 [OHASD(1448)]CRS-2757: Command 'Start' timed out waiting for response from the resource 'ora.storage'. Details at (:CRSPE00163:) {0:9:4} in D:\app\oracle\diag\crs\server001\crs\trace\ohasd.trc.
2017-12-05 14:28:53.847000 +01:00

Next, I went to the mentioned “ohasd_orarootagent_system.trc” and found many, many lines like these:

2017-12-05 14:26:48.471105*:kgfn.c@6356: kgfnGetNodeType: flags=0x10
2017-12-05 14:26:48.471105*:kgfn.c@6369: kgfnGetNodeType: ntyp=1
2017-12-05 14:26:48.471105*:kgfn.c@4644: kgfnConnect2: kgfnGetBeqData failed
2017-12-05 14:26:48.471105*:kgfn.c@4680: kgfnConnect: srvr valid
2017-12-05 14:26:48.471105*:kgfn.c@5972: kgfnConnect2Int: sysasm=0 envflags=0x10 srvrflags=0x1 unam=crsuser__asm_001 password is NOT NULL pstr=_ocr
2017-12-05 14:26:48.471105*:kgfn.c@6121: kgfnConnect2Int: hosts=1
2017-12-05 14:26:48.471105*:kgfn.c@6134: kgfnConnect2Int: cstr=(DESCRIPTION=(TRANSPORT_CONNECT_TIMEOUT=60)(EXPIRE_TIME=1)(LOAD_BALANCE=ON)(ADDRESS_LIST=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.50.35)(PORT=1522)))(CONNECT_DATA=(SERVICE_NAME=+ASM)))
2017-12-05 14:26:48.471105*:kgfn.c@6144: kgfnConnect2Int: OCIServerAttach failed
2017-12-05 14:26:48.471105*:kgfn.c@1602: kgfnRecordErrPriv: status=-1  at kgfn.c:6300
2017-12-05 14:26:48.471105*:kgfn.c@1648: kgfnRecordErrPriv: 12541 error=ORA-12541: TNS:no listener

Looking at these lines, the process is attempting to contact the ASM instance of the surviving node: 192.168.50.35 is the IP of the interconnect network of the surviving node, while the interconnect IP of the rebooted node is 192.168.50.33. So the connection string is correct at first glance. But why does it return “ORA-12541: TNS:no listener”? I checked the ASM listener on the other node:

C:\>lsnrctl status ASMNET1LSNR_ASM

LSNRCTL for 64-bit Windows: Version 12.1.0.2.0 - Production on 05-DEC-2017 14:32:47

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=ASMNET1LSNR_ASM)))
STATUS of the LISTENER
------------------------
Alias                     ASMNET1LSNR_ASM
Version                   TNSLSNR for 64-bit Windows: Version 12.1.0.2.0 - Production
Start Date                28-OCT-2017 12:25:58
Uptime                    38 days 3 hr. 6 min. 53 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   D:\app\grid\12.1.0.2\network\admin\listener.ora
Listener Log File         D:\app\oracle\diag\tnslsnr\server002\asmnet1lsnr_asm\alert\log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=\\.\pipe\ASMNET1LSNR_ASMipc)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.50.35)(PORT=1522)))
Services Summary...
Service "+ASM" has 2 instance(s).
  Instance "+asm1", status READY, has 1 handler(s) for this service...
  Instance "+asm2", status READY, has 2 handler(s) for this service...
The command completed successfully

As you see, the listener's address and port are correct. But why are there 2 instances? There should be only one, for “+asm2”, which is the instance of the surviving node. Let's double-check that:

C:\>lsnrctl services ASMNET1LSNR_ASM

LSNRCTL for 64-bit Windows: Version 12.1.0.2.0 - Production on 05-DEC-2017 14:33:15

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=ASMNET1LSNR_ASM)))
Services Summary...
Service "+ASM" has 2 instance(s).
  Instance "+asm1", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:473 refused:0 state:ready
         REMOTE SERVER
         (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.50.33)(PORT=1522)))
  Instance "+asm2", status READY, has 2 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0 state:ready
         LOCAL SERVER
      "DEDICATED" established:0 refused:0 state:ready
         REMOTE SERVER
         (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.50.35)(PORT=1522)))
The command completed successfully

There is still a service handler for the “+asm1” instance of the rebooted node, pointing to its interconnect IP where the ASM listener was running half an hour ago. So obviously the ASM listener redirects the connect request to that service handler, which is not running any more. That explains the “no listener” error message in the log.

My workaround was to simply reload the ASM listener configuration on the surviving node:

C:\>lsnrctl reload ASMNET1LSNR_ASM

LSNRCTL for 64-bit Windows: Version 12.1.0.2.0 - Production on 05-DEC-2017 14:34:04

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=ASMNET1LSNR_ASM)))
The command completed successfully

C:\Users\syszwiorap\Desktop>lsnrctl services ASMNET1LSNR_ASM

LSNRCTL for 64-bit Windows: Version 12.1.0.2.0 - Production on 05-DEC-2017 14:38:41

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=ASMNET1LSNR_ASM)))
Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+asm2", status READY, has 2 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0 state:ready
         REMOTE SERVER
         (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.50.35)(PORT=1522)))
      "DEDICATED" established:5 refused:0 state:ready
         LOCAL SERVER
The command completed successfully

And that was basically it. The “ora.storage” resource started right away.

I assume this is unexpected behaviour. The listener should drop the service handler after a specific timeout when it does not receive service updates any more, to prevent such situations.
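
Until that changes, a quick check after a node failure may save you the ten-minute timeout; a minimal sketch, using the failed node's interconnect IP (here 192.168.50.33): if the surviving node's ASM listener still shows a service handler for that IP, reload it.

C:\>lsnrctl services ASMNET1LSNR_ASM | findstr 192.168.50.33
C:\>lsnrctl reload ASMNET1LSNR_ASM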

Turning a RAC into Data Guard

The headline sounds strange, but this really happened to me some time ago. The reason behind it is that a customer got a new second datacenter some kilometers away from the current one. That's why they were forced to stretch the existing hardware over both datacenters with minimal investment. So we could either build a stretched RAC or something else. During the discussion we came to the conclusion that a real disaster recovery (DR) solution like Data Guard is a much better fit for this setup than RAC. That created the challenge of turning the existing 2-node RAC into Data Guard. And yes, we are aware of the gap in between when there is no RAC anymore and no Data Guard yet 🙂
In the following post I'll outline the steps that we went through. Most of the command output will be in German as it was a German project, but I think you'll get the idea.

1. What we had

2-node RAC Version 12.1.0.2, configuration details

$ srvctl status database -db proddb
Instanz PRODDB_1 wird auf Knoten server-a ausgefĂĽhrt
Instanz PRODDB_2 wird auf Knoten server-b ausgefĂĽhrt
$ srvctl config database -db proddb
Eindeutiger Datenbankname: PRODDB
Datenbankname: PRODDB
Oracle-Standardverzeichnis: /u01/app/oracle/product/12.1.0.2/db
Oracle-Benutzer: oracle
Spfile: +ASMDAT/PRODDB/PARAMETERFILE/spfile.263.895424997
Kennwortdatei: +ASMDAT/PRODDB/PASSWORD/pwdproddb.258.895416527
Domain:
Startoptionen: open
Stoppoptionen: immediate
Datenbankrolle: PRIMARY
Verwaltungs-Policy: AUTOMATIC
Server-Pools: proddb
Datenträgergruppen: ASMDAT,ASMLOG1,ASMLOG2
Mount Point-Pfade:
Services:
Typ: RAC
Parallelität starten:
Parallelität stoppen:
OSDBA-Gruppe: oinstall
OSOPER-Gruppe:
Datenbankinstanzen: PRODDB_1,PRODDB_2
Konfigurierte Knoten:
Datenbank wird von Policy verwaltet

2. Remove node B from the cluster

First we have to stop the database instance on that node:

$ srvctl stop instance -db proddb -node server-b

It is a policy-managed cluster, so we moved the node to the Free pool:

$ srvctl relocate server -n server-b -g Free

This is now the status of the server pools:

$ srvctl status serverpool  -detail
Server-Pool-Name: Free
Anzahl der aktiven Server: 1
Namen der aktiven Server: server-b
NAME=server-b STATE=ONLINE
Server-Pool-Name: Generic
Anzahl der aktiven Server: 0
Namen der aktiven Server:
Server-Pool-Name: proddb
Anzahl der aktiven Server: 1
Namen der aktiven Server: server-a
NAME=server-a STATE=ONLINE

At this point we can remove node B from the cluster configuration following the documentation

Remove node A from the oraInventory on node B

$ /u01/app/grid/12.1.0.2/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES={server-b}" CRS=TRUE -silent -local
Oracle Universal Installer wird gestartet...

Swap-Speicher wird geprüft: muss größer sein als 500 MB.   Tatsächlich 60145 MB    Erfolgreich
The inventory pointer is located at /var/opt/oracle/oraInst.loc

[...]

Once this is done we can deconfigure Oracle Clusterware Stack on node B according to the documentation.

As root:

$ cd $ORACLE_HOME/crs/install/
$ ./rootcrs.sh -deconfig -force
Using configuration parameter file: /u01/app/grid/12.1.0.2/crs/install/crsconfig_params
Netzwerk 1 ist vorhanden
Subnet IPv4: 172.17.3.0/255.255.255.0/ipmpfs0, static
Subnet IPv6:

[...]

2017/09/06 09:46:21 CLSRSC-4006: Removing Oracle Trace File Analyzer (TFA) Collector.

2017/09/06 09:47:05 CLSRSC-4007: Successfully removed Oracle Trace File Analyzer (TFA) Collector.

2017/09/06 09:47:11 CLSRSC-336: Successfully deconfigured Oracle Clusterware stack on this node

Remove node B from the oraInventory on node A

cd $ORACLE_HOME/oui/bin/
./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES={server-a}" CRS=TRUE -silent -local
Oracle Universal Installer wird gestartet...

Swap-Speicher wird geprüft: muss größer sein als 500 MB.   Tatsächlich 36559 MB    Erfolgreich
The inventory pointer is located at /var/opt/oracle/oraInst.loc
'UpdateNodeList' war erfolgreich.

Finally, remove node B from Grid Infrastructure (as root)

crsctl delete node -n server-b
CRS-4661: Node server-b successfully deleted.

3. Turn Cluster Grid Infrastructure into Single Instance

Grid Infrastructure (GI) and Single Instance (SI) installations are nearly the same. So instead of
deinstalling GI and reinstalling (and patching) SI, we can simply change the installation type.

To do this, we first deconfigured GI

$ cd $ORACLE_HOME/crs/install
$ ./rootcrs.sh -deconfig -force
Using configuration parameter file: /u01/app/grid/12.1.0.2/crs/install/crsconfig_params
Netzwerk 1 ist vorhanden
Subnet IPv4: 172.17.3.0/255.255.255.0/ipmpfs0, static
Subnet IPv6:
Ziele pingen:
Netzwerk ist aktiviert

[…]

CRS-4133: Oracle High Availability Services has been stopped.
2017/09/19 12:20:22 CLSRSC-4006: Removing Oracle Trace File Analyzer (TFA) Collector.

2017/09/19 12:20:54 CLSRSC-4007: Successfully removed Oracle Trace File Analyzer (TFA) Collector.

2017/09/19 12:21:00 CLSRSC-336: Successfully deconfigured Oracle Clusterware stack on this node

GI uses some special root permissions for its installation, so we needed to revert the permission settings.

$ ./rootcrs.sh -unlock -crshome $ORACLE_HOME -nocrsstop

There is a bug when configuring SI when there are left-over files from a previous GI installation,
so we got rid of these.

$ cd /u01/app/oracle/
$ ls
admin          cfgtoollogs    crsdata        diagsnap       log
audit          checkpoints    diag           server-a       product
$ mv crsdata crsdata_old
$ cd -
/u01/app/grid/12.1.0.2/crs/install

A GI installation is linked with the RAC option enabled. With SI we do not need this, and beside that, configuring SI would fail with it enabled. So we disabled the RAC option.

$ cd $ORACLE_HOME/rdbms/lib
$ make -f ins_rdbms.mk rac_off ioracle

Now everything is prepared to configure SI (as root):

$ /u01/app/grid/12.1.0.2/perl/bin/perl -I/u01/app/grid/12.1.0.2/perl/lib -I/u01/app/grid/12.1.0.2/crs/install /u01/app/grid/12.1.0.2/crs/install/roothas.pl
Using configuration parameter file: /u01/app/grid/12.1.0.2/crs/install/crsconfig_params
2017/09/19 12:29:06 CLSRSC-363: User ignored prerequisites during installation

LOCAL ADD MODE
Creating OCR keys for user 'oracle', privgrp 'oinstall'..
Operation successful.
LOCAL ONLY MODE
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
CRS-4664: Knoten server-b erfolgreich gepinnt.
2017/09/19 12:29:17 CLSRSC-330: Adding Clusterware entries to file '/etc/inittab'


server-b     2017/09/19 12:29:58     /u01/app/grid/12.1.0.2/cdata/server-b/backup_20170919_122958.olr     257602229
CRS-2791: Herunterfahren der von Oracle High Availability Services verwalteten Ressourcen auf "server-b" wird gestartet
CRS-2673: Versuch, "ora.evmd" auf "server-b" zu stoppen
CRS-2677: Stoppen von "ora.evmd" auf "server-b" erfolgreich
CRS-2793: Herunterfahren der von Oracle High Availability Services verwalteten Ressourcen auf "server-b" ist abgeschlossen
CRS-4133: Oracle High Availability Services wurde gestoppt
CRS-4123: Oracle High Availability Services wurde gestartet.
2017/09/19 12:30:36 CLSRSC-327: Successfully configured Oracle Restart for a standalone server

4. Configure SI

Having SI up and running, we need to configure some more things, namely ASM and a listener.

$ srvctl add asm
$ srvctl start asm
$ srvctl add listener
$ srvctl start listener
$ crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       server-b                 STABLE
ora.asm
               ONLINE  ONLINE       server-b                 Started,STABLE
ora.ons 
               OFFLINE OFFLINE      server-b                 STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cssd
      1        ONLINE  ONLINE       server-b                 STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.evmd
      1        ONLINE  ONLINE       server-b                 STABLE
--------------------------------------------------------------------------------

As per this setup, ASM comes without any configuration files. That's why we need to create the SPfile and password file on our own. This is especially important for the following disk setup: we set the discovery string so that ASM is able to find our disks.

Setting up SPfile:

SQL> create spfile from memory;

File created.

$ srvctl modify asm -d '/dev/rdsk/asm*'
$ srvctl config asm
ASM-Standardverzeichnis: <CRS home>
Kennwortdatei:
ASM-Listener: LISTENER
Spfile: ?/dbs/spfile@.ora
ASM-Datenträgergruppe-Discovery-Zeichenfolge: /dev/rdsk/asm*
$ srvctl stop asm
$ srvctl start asm

Setting up ASM password file:

ASMCMD> pwcreate --asm +ASMDAT <SYS password>
ASMCMD> lspwusr
Username sysdba sysoper sysasm
     SYS   TRUE    TRUE  FALSE
ASMCMD>
ASMCMD> orapwusr --add asmsnmp
Enter password: ********
ASMCMD> lspwusr
Username sysdba sysoper sysasm
     SYS   TRUE    TRUE  FALSE
 ASMSNMP  FALSE   FALSE  FALSE
ASMCMD> orapwusr --grant sysasm asmsnmp
ASMCMD> orapwusr --grant sysasm sys
ASMCMD> lspwusr
Username sysdba sysoper sysasm
     SYS   TRUE    TRUE   TRUE
 ASMSNMP  FALSE   FALSE   TRUE

For the moved node B we created new LUNs for the database. We cannot use the old LUNs since they hold the existing database, and the new Data Guard setup requires separate storage for its copy of the database.
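
The diskgroups on those new LUNs had already been created, which I do not show here. Roughly, assuming the discovery string from above and made-up disk paths, it was something like this:

SQL> create diskgroup asmdat external redundancy disk '/dev/rdsk/asmdat*';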

SQL> alter diskgroup asmdat mount;

Diskgroup altered.

SQL> alter diskgroup asmrec mount;

Diskgroup altered.

SQL> alter diskgroup asmlog1 mount;

Diskgroup altered.

SQL> alter diskgroup asmlog2 mount;

Diskgroup altered.


$ asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576    274408    36703                0           36703              0             N  ASMDAT/
MOUNTED  EXTERN  N         512   4096  1048576     10216     2893                0            2893              0             N  ASMLOG1/
MOUNTED  EXTERN  N         512   4096  1048576     10216     2893                0            2893              0             N  ASMLOG2/
MOUNTED  EXTERN  N         512   4096  1048576    102376    97502                0           97502              0             N  ASMREC/

5. Turn database installation into SI

Changing the database installation is pretty easy. We have to change the Oracle Inventory and we have to disable the RAC option, just as we did for the GI installation.

First, we disable the RAC option for the database home:

$ cd $ORACLE_HOME/rdbms/lib/
$ make -f ins_rdbms.mk rac_off ioracle

Then we modify the inventory to get a single server installation:

$ cd $ORACLE_HOME/oui/bin
$ ./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES={server-b}" -silent -local
Oracle Universal Installer wird gestartet...

Swap-Speicher wird geprüft: muss größer sein als 500 MB.   Tatsächlich 61662 MB    Erfolgreich
The inventory pointer is located at /var/opt/oracle/oraInst.loc
'UpdateNodeList' war erfolgreich

6. Setting up Data Guard

I will not cover this in detail since it is a different topic. Just to outline it: we did a “duplicate for standby”, registered the duplicate DB with SI, set up Data Guard between the existing DB and the duplicate, and enabled the configuration.
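
Just to give an idea of the shape of it, here is a minimal sketch with made-up names and connect identifiers; treat it as an outline, not a recipe:

RMAN> connect target sys@proddb
RMAN> connect auxiliary sys@proddb_stby
RMAN> duplicate target database for standby from active database dorecover;

$ srvctl add database -db proddb_stby -oraclehome /u01/app/oracle/product/12.1.0.2/db -role PHYSICAL_STANDBY

DGMGRL> create configuration dgconf as primary database is proddb connect identifier is proddb;
DGMGRL> add database proddb_stby as connect identifier is proddb_stby maintained as physical;
DGMGRL> enable configuration;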

7. Modify server A

Changing server A from GI to SI requires pretty much the same steps that we applied to server B. We just did a Data Guard switchover to server B, disabled and dropped the Data Guard configuration, and went through steps 1-6 on server A. That's basically it.

8. Conclusion

Finally we ended up with a clean Single Instance installation on both servers, with single-instance databases and a neat Data Guard setup between them.

Oracle Grid Infrastructure 12.2 – HASMB

Oracle Grid Infrastructure has had a feature for quite some time that makes an NFS share highly available. It simply uses an additional virtual IP address (VIP) and makes the NFS server use this VIP. The client connects to the NFS share using this VIP, so the share is always available regardless of which node it is currently running on. This feature is restricted to ACFS: we can create NFS exports only for an ACFS filesystem, which kind of makes sense. This whole thing is also known as HANFS.
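
For comparison, an HANFS export uses the same resource types that you will see below for SMB. A minimal sketch with made-up names:

# srvctl add havip -id myexport -address nfsvip.example.com
# srvctl add exportfs -name myexportfs -path /u01/app/oracle/acfs/myvolume -id myexport
# srvctl start havip -id myexport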

New in Grid Infrastructure 12.2 is the ability to create highly available SMB shares, consequently called HASMB. Let's see if and how this works. Some steps need to be done as “root” while others can be done as “oracle”; see the command prompt to identify which user I actually used.

First, we need to install the samba server if not already done.

[root@vm151 ~]# yum install samba samba-winbind

Of course this needs to be done on all servers in the cluster.

Now, we add a line to /etc/samba/smb.conf to allow guest access:

[global]
        workgroup = SAMBA
        security = user
        map to guest = Bad Password   

Finally, we start the samba server and enable it for autostart:

[root@vm151 ~]# systemctl start smb
[root@vm151 ~]# systemctl enable smb
Created symlink from /etc/systemd/system/multi-user.target.wants/smb.service to /usr/lib/systemd/system/smb.service.

For high availability we need to create a VIP:

[root@vm151 ~]# srvctl add havip -id mmiexport -address vm156.support.robotron.de

Next step is to create an ACFS filesystem that we will export later on:

[oracle@vm151 ~]$ asmcmd volcreate -G data -s 1G mmivolume
[oracle@vm151 ~]$ asmcmd volinfo -G data mmivolume
Diskgroup Name: DATA

         Volume Name: MMIVOLUME
         Volume Device: /dev/asm/mmivolume-1
         State: ENABLED
         Size (MB): 1024
         Resize Unit (MB): 64
         Redundancy: UNPROT
         Stripe Columns: 8
         Stripe Width (K): 1024
         Usage:
         Mountpath:

[oracle@vm151 ~]$ /sbin/mkfs -t acfs /dev/asm/mmivolume-1
mkfs.acfs: version                   = 12.2.0.1.0
mkfs.acfs: on-disk version           = 46.0
mkfs.acfs: volume                    = /dev/asm/mmivolume-1
mkfs.acfs: volume size               = 1073741824  (   1.00 GB )
mkfs.acfs: Format complete.

Ok, now we configure Grid Infrastructure resources for this filesystem:

[oracle@vm151 ~]$ mkdir -p /u01/app/oracle/acfs/mmivolume
[oracle@vm151 ~]$ ssh vm152 mkdir -p /u01/app/oracle/acfs/mmivolume



[root@vm151 ~]# srvctl add filesystem -volume mmivolume -diskgroup data -path /u01/app/oracle/acfs/mmivolume -user oracle
[root@vm151 ~]# srvctl start filesystem -volume mmivolume -diskgroup data
[root@vm151 ~]# srvctl status filesystem -volume mmivolume -diskgroup data
ACFS file system /u01/app/oracle/acfs/mmivolume is mounted on nodes vm151,vm152
[root@vm151 ~]# ll /u01/app/oracle/acfs/mmivolume
total 64
drwx------ 2 root root 65536 Jul 27 11:23 lost+found
[root@vm151 ~]# df -h /u01/app/oracle/acfs/mmivolume
Filesystem            Size  Used Avail Use% Mounted on
/dev/asm/mmivolume-1  1.0G  487M  538M  48% /u01/app/oracle/acfs/mmivolume

This is the moment when we can create the export:

[root@vm151 ~]# srvctl add exportfs -name mmiexportfs -path /u01/app/oracle/acfs/mmivolume -id mmiexport -type SMB
[oracle@vm151 ~]$ srvctl start havip -id mmiexport
[oracle@vm151 ~]$ srvctl status exportfs
export file system mmiexportfs is enabled
export file system mmiexportfs is exported on node vm151

Let’s see what it did to the samba config files:

[root@vm152 ~]# cat /etc/samba/smb.conf
...
# ACFS HASMB include
  include = /etc/samba/acfs/acfsinc.conf
...

[root@vm151 ~]# cat /etc/samba/acfs/acfsinc.conf
include=/etc/samba/acfs/mmiexportfs.conf

[root@vm151 ~]# cat /etc/samba/acfs/mmiexportfs.conf
[mmiexportfs]
path=/u01/app/oracle/acfs/mmivolume
read only = Yes
browsable = yes

It created an include for all the ACFS-related things, which in turn consists of additional includes, one for each export that we defined and that is currently active on the specific node.

So let’s try:

[oracle@vm152 ~]$ echo "Hallo" > /u01/app/oracle/acfs/mmivolume/mein.txt
C:\>dir \\vm156.support.robotron.de\mmiexportfs
Zugriff verweigert

Makes sense somehow. As we can see above, there is no option that allows guest access to the newly created share. So let’s add that:

[root@vm151 ~]# srvctl modify exportfs -name mmiexportfs -options "read only=no,browsable=yes,guest ok=yes,writeable=yes,public=yes"
[root@vm151 ~]# cat /etc/samba/acfs/mmiexportfs.conf
[mmiexportfs]
path=/u01/app/oracle/acfs/mmivolume
read only = Yes
browsable = yes

Ok, it does not apply the new options immediately. It requires a restart:

[root@vm151 ~]# srvctl stop exportfs -name mmiexportfs -force
[root@vm151 ~]# srvctl start havip -id mmiexport -node vm151
[root@vm151 ~]# cat /etc/samba/acfs/mmiexportfs.conf
[mmiexportfs]
path=/u01/app/oracle/acfs/mmivolume
read only = no
browsable = yes
guest ok = yes
writeable = yes
public = yes

Let’s check again:

C:\>dir \\vm156.support.robotron.de\mmiexportfs
 Datenträger in Laufwerk \\vm156.support.robotron.de\mmiexportfs: ist mmiexportfs
 Volumeseriennummer: 6691-6876

 Verzeichnis von \\vm156.support.robotron.de\mmiexportfs

27.07.2017  12:51    <DIR>          .
27.07.2017  11:16    <DIR>          ..
27.07.2017  11:23    <DIR>          lost+found
27.07.2017  12:45                 6 mein.txt
               2 Datei(en),             15 Bytes
               3 Verzeichnis(se),    496.132.096 Bytes frei

C:\>type \\vm156.support.robotron.de\mmiexportfs\mein.txt
Hallo

How about creating files from the Windows end of the line?

C:\>echo Cheers > \\vm156.support.robotron.de\mmiexportfs\windows.txt
Zugriff verweigert

The reason for this is the filesystem permissions on the Linux side; we need to change these:

[root@vm152 ~]# ls -la /u01/app/oracle/acfs/mmivolume
total 100
drwxrwxr-x 4 oracle oinstall 32768 Jul 27 12:45 .
drwxr-xr-x 3 oracle oinstall    22 Jul 27 11:16 ..
drwx------ 2 root   root     65536 Jul 27 11:23 lost+found
-rw-r--r-- 1 oracle oinstall     6 Jul 27 12:45 mein.txt
[root@vm152 ~]# chmod 777 /u01/app/oracle/acfs/mmivolume
[root@vm152 ~]# ls -la /u01/app/oracle/acfs/mmivolume
total 100
drwxrwxrwx 4 oracle oinstall 32768 Jul 27 12:45 .
drwxr-xr-x 3 oracle oinstall    22 Jul 27 11:16 ..
drwx------ 2 root   root     65536 Jul 27 11:23 lost+found
-rw-r--r-- 1 oracle oinstall     6 Jul 27 12:45 mein.txt

And voila, now it works:

C:\>echo Cheers > \\vm156.support.robotron.de\mmiexportfs\windows.txt

C:\>type \\vm156.support.robotron.de\mmiexportfs\windows.txt
Cheers

And from the other end too:

[oracle@vm151 ~]$ cat /u01/app/oracle/acfs/mmivolume/windows.txt
Cheers

The last step is to try a relocate of the export:

[oracle@vm152 ~]$ srvctl status exportfs -id mmiexport
export file system mmiexportfs is enabled
export file system mmiexportfs is exported on node vm151
[oracle@vm152 ~]$ srvctl relocate havip -id mmiexport -force
HAVIP was relocated successfully
[oracle@vm152 ~]$ srvctl status exportfs -id mmiexport
export file system mmiexportfs is enabled
export file system mmiexportfs is exported on node vm152

During the relocate operation, which is pretty fast by the way, I tried a “dir” operation from my Windows client. It simply hangs for a couple of seconds and returns results as soon as the share becomes available again. Pretty straightforward.
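
The test was nothing fancy, just a loop of “dir” commands against the share, roughly like this:

C:\>for /l %i in (1,1,100) do @(dir \\vm156.support.robotron.de\mmiexportfs >nul && echo ok %i)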

So all in all this is a nice extension to include the Windows world without having to use NFS on Windows.
Of course my example is not very bullet-proof in terms of security. This can (and should) be changed for production purposes.
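
As a starting point, one could at least disable the guest access again and restrict the share to named users. A sketch using the options mechanism from above (the user must exist as a Samba account, and remember that a restart of the exportfs resource is needed):

[root@vm151 ~]# srvctl modify exportfs -name mmiexportfs -options "read only=no,browsable=yes,guest ok=no,valid users=oracle"
[root@vm151 ~]# srvctl stop exportfs -name mmiexportfs -force
[root@vm151 ~]# srvctl start havip -id mmiexport -node vm151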

AFD label operation throws ORA-15031

Some time ago I wrote about setting up ASM Filter Driver (AFD). There I labeled the disks and then set up the cluster. But now the cluster is live, and due to space pressure I needed to add another disk to the system. I thought this must be straightforward: just configure multipathing, label the new device and add it to the diskgroup. But it was not as easy as I thought. These were my devices:

# ll /dev/dm*
brw-rw---- 1 root disk 253, 0 Oct 21 13:24 /dev/dm-0
brw-rw---- 1 root disk 253, 1 Oct 21 13:24 /dev/dm-1
brw-rw---- 1 root disk 253, 2 Oct 21 13:24 /dev/dm-2
brw-rw---- 1 root disk 253, 3 Oct 21 13:24 /dev/dm-3
brw-rw---- 1 root disk 253, 4 Oct 21 13:24 /dev/dm-4
brw-rw---- 1 root disk 253, 5 Jun  3 15:11 /dev/dm-5
brw-rw---- 1 root disk 253, 6 Oct 21 13:28 /dev/dm-6

The “dm-6” was my new device, so I tried to label it:

# asmcmd afd_label DATA03 /dev/dm-6
ORA-15227: could not perform label set/clear operation
ORA-15031: disk specification '/dev/dm-6' matches no disks (DBD ERROR: OCIStmtExecute)
ASMCMD-9513: ASM disk label set operation failed.

What? I tried to read from and write to that device using dd, which went fine. Then I checked the discovery string:

# asmcmd afd_dsget 
AFD discovery string: /dev/dm* 

Looked fine too. Next, I checked the “afd.conf” file:

# cat /etc/afd.conf
afd_diskstring='/dev/dm*'
afd_filtering=enable

No issues there.
Finally I checked the $ORACLE_HOME/bin directory for files that start with “afd”. What I found was an executable called “afdtool” which looked promising:

# afdtool 
Usage:
afdtool -add [-f] <devpath1, [devpath2,..]>  <labelname>
afdtool -delete [-f]  <devicepath | labelname>
afdtool -getdevlist [label] [-nohdr] [-nopath]
afdtool -filter <enable | disable>  <devicepath>
afdtool -rescan [discstr1, discstr2, ...]
afdtool -stop
afdtool -log  [-d <path>][-l <log_level>][-c <log_context>][-s <buf_size>]
              [-m <max_file_sz>][-n <max_file_num>] [-q] [-t] [-h]
afdtool -di <enable | disable | query>

So I gave it a try and it worked!

# afdtool -add /dev/dm-6 DATA03
Device /dev/dm-6 labeled with DATA03
# afdtool -getdevlist
--------------------------------------------------------------------------------
Label                     Path
================================================================================
OCR                       /dev/dm-0
ARCH01                    /dev/dm-1
DATA01                    /dev/dm-2
GIMR                      /dev/dm-3
DATA02                    /dev/dm-4
DATA03                    /dev/dm-6

“asmcmd afd_lsdsk” returns the same, of course:

# asmcmd afd_lsdsk
--------------------------------------------------------------------------------
Label                     Filtering   Path
================================================================================
OCR                         ENABLED   /dev/dm-0
ARCH01                      ENABLED   /dev/dm-1
DATA01                      ENABLED   /dev/dm-2
GIMR                        ENABLED   /dev/dm-3
DATA02                      ENABLED   /dev/dm-4
DATA03                      ENABLED   /dev/dm-6

So the message is: when the cluster stack is up and running, you have to use “afdtool”; when the stack is down, “asmcmd afd_*” is the right choice. Try to find that in the docs.

What is interesting is that the new AFD device is still owned by “root”:

# ls -l /dev/oracleafd/disks/
total 24
-rw-r--r-- 1 oracle dba  10 Jun  3 15:11 ARCH01
-rw-r--r-- 1 oracle dba  10 Jun  3 15:11 DATA01
-rw-r--r-- 1 oracle dba  10 Jun  7 21:40 DATA02
-rw-r--r-- 1 root   root 10 Oct 25 11:02 DATA03
-rw-r--r-- 1 oracle dba  10 Jun  3 15:11 GIMR
-rw-r--r-- 1 oracle dba  10 Jun  3 15:11 OCR

But you can already use that disk:

SQL> alter diskgroup data add disk  'AFD:DATA03';

Diskgroup altered.

SQL> select name, header_status, path from v$asm_disk;

NAME                           HEADER_STATU PATH
------------------------------ ------------ --------------------
OCR                            MEMBER       AFD:OCR
ARCH01                         MEMBER       AFD:ARCH01
DATA01                         MEMBER       AFD:DATA01
GIMR                           MEMBER       AFD:GIMR
DATA02                         MEMBER       AFD:DATA02
DATA03                         MEMBER       AFD:DATA03

I don’t know how that works, but it does. The ownership of the device will be set correctly after the next reboot, so take your time 🙂

Losing the ASM password file

Some time ago I wrote about recovering from a lost Grid Infrastructure diskgroup. There I described the steps to re-create the OCR, voting files, ASM SPfile and the Management DB. But one thing was missing: the ASM password file. This becomes very important in case you are using Flex ASM.

What it looks like initially

Let’s check what’s inside the password file when everything runs fine.

[oracle@vm140 ~]$ asmcmd pwget --asm
+GI/orapwASM
[oracle@vm140 ~]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Tue Sep 13 10:13:05 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> select * from v$pwfile_users;

USERNAME                       SYSDB SYSOP SYSAS SYSBA SYSDG SYSKM     CON_ID
------------------------------ ----- ----- ----- ----- ----- ----- ----------
SYS                            TRUE  TRUE  TRUE  FALSE FALSE FALSE          0
CRSUSER__ASM_001               TRUE  FALSE TRUE  FALSE FALSE FALSE          0
ASMSNMP                        TRUE  FALSE FALSE FALSE FALSE FALSE          0

As you can see, the password file is inside ASM and there is not only the SYS user, but also a user named CRSUSER__ASM_001. This one is used to connect to remote ASM instances.

Lose the ASM password file

Losing the ASM password file is quite simple:

[oracle@vm140 ~]$ asmcmd rm +GI/orapwASM

Now, let’s check what happens. First, stop the clusterware on all nodes:

[root@vm140 ~]# crsctl stop crs
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'vm140'
...
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'vm140' has completed
CRS-4133: Oracle High Availability Services has been stopped.

Starting the CRS stack on first node

Once all nodes are down, I start the CRS stack on one node.

[root@vm140 ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.

This actually brings up the whole cluster stack, beside the fact that there are some errors in the cluster alert.log:

[root@vm140 ~]# tail -f /u01/app/oracle/diag/crs/vm140/crs/trace/alert.log
2016-09-13 10:28:51.634 [CSSDAGENT(2839)]CRS-8500: Oracle Clusterware CSSDAGENT process is starting with operating system process ID 2839
2016-09-13 10:28:51.983 [OCSSD(2850)]CRS-8500: Oracle Clusterware OCSSD process is starting with operating system process ID 2850
2016-09-13 10:28:53.082 [OCSSD(2850)]CRS-1713: CSSD daemon is started in hub mode
2016-09-13 10:28:58.726 [OCSSD(2850)]CRS-1707: Lease acquisition for node vm140 number 1 completed
2016-09-13 10:28:59.829 [OCSSD(2850)]CRS-1605: CSSD voting file is online: AFD:GI; details in /u01/app/oracle/diag/crs/vm140/crs/trace/ocssd.trc.
2016-09-13 10:28:59.874 [OCSSD(2850)]CRS-1672: The number of voting files currently available 1 has fallen to the minimum number of voting files required 1.
2016-09-13 10:29:08.985 [OCSSD(2850)]CRS-1601: CSSD Reconfiguration complete. Active nodes are vm140 .
2016-09-13 10:29:11.181 [OCTSSD(2984)]CRS-8500: Oracle Clusterware OCTSSD process is starting with operating system process ID 2984
2016-09-13 10:29:12.290 [OCTSSD(2984)]CRS-2407: The new Cluster Time Synchronization Service reference node is host vm140.
2016-09-13 10:29:12.291 [OCTSSD(2984)]CRS-2401: The Cluster Time Synchronization Service started on host vm140.
2016-09-13 10:29:20.185 [ORAAGENT(2589)]CRS-5011: Check of resource "ora.asm" failed: details at "(:CLSN00006:)" in "/u01/app/oracle/diag/crs/vm140/crs/trace/ohasd_oraagent_oracle.trc"
2016-09-13 10:29:46.560 [ORAROOTAGENT(2424)]CRS-5019: All OCR locations are on ASM disk groups [GI], and none of these disk groups are mounted. Details are at "(:CLSN00140:)" in "/u01/app/oracle/diag/crs/vm140/crs/trace/ohasd_orarootagent_root.trc".
2016-09-13 10:30:06.471 [OSYSMOND(3312)]CRS-8500: Oracle Clusterware OSYSMOND process is starting with operating system process ID 3312
2016-09-13 10:30:07.960 [CRSD(3319)]CRS-8500: Oracle Clusterware CRSD process is starting with operating system process ID 3319
2016-09-13 10:30:10.250 [CRSD(3319)]CRS-1012: The OCR service started on node vm140.
2016-09-13 10:30:11.031 [CRSD(3319)]CRS-1201: CRSD started on node vm140.
2016-09-13 10:30:12.216 [ORAAGENT(3414)]CRS-8500: Oracle Clusterware ORAAGENT process is starting with operating system process ID 3414
2016-09-13 10:30:12.459 [ORAROOTAGENT(3418)]CRS-8500: Oracle Clusterware ORAROOTAGENT process is starting with operating system process ID 3418
2016-09-13 10:30:12.886 [OLOGGERD(3477)]CRS-8500: Oracle Clusterware OLOGGERD process is starting with operating system process ID 3477
2016-09-13 10:30:13.190 [ORAAGENT(3414)]CRS-5011: Check of resource "_mgmtdb" failed: details at "(:CLSN00007:)" in "/u01/app/oracle/diag/crs/vm140/crs/trace/crsd_oraagent_oracle.trc"
2016-09-13 10:30:13.218 [ORAAGENT(3414)]CRS-5011: Check of resource "ora.proxy_advm" failed: details at "(:CLSN00006:)" in "/u01/app/oracle/diag/crs/vm140/crs/trace/crsd_oraagent_oracle.trc"

The mentioned trace files contain some errors:

2016-09-13 10:29:46.046220 :    GPNP:1319061248: clsgpnp_dbmsGetItem_profile: [at clsgpnp_dbms.c:345] Result: (0) CLSGPNP_OK. (:GPNP00401:)got ASM-Profile.Mode='remote'
2016-09-13 10:29:46.051333 : default:1319061248: Inited LSF context: 0x7f36182b5220
2016-09-13 10:29:46.057196 : CLSCRED:1319061248: clsCredCommonInit: Inited singleton credctx.
2016-09-13 10:29:46.057226 : CLSCRED:1319061248: (:CLSCRED0101:)clsCredDomInitRootDom: Using user given storage context for repository access.
2016-09-13 10:29:46.192178 : USRTHRD:1319061248: {0:0:2} 6425 Error 4 querying length of attr ASM_DISCOVERY_ADDRESS

2016-09-13 10:29:46.209400 : USRTHRD:1319061248: {0:0:2} 6425 Error 4 querying length of attr ASM_STATIC_DISCOVERY_ADDRESS

2016-09-13 10:29:46.424315 : CLSCRED:1319061248: (:CLSCRED1079:)clsCredOcrKeyExists: Obj dom : SYSTEM.credentials.domains.root.ASM.Self.b50a6df0745b7fb4bfc0880a73d8f455.root not found
2016-09-13 10:29:46.424494 : USRTHRD:1319061248: {0:0:2} 6210 Error 4 opening dom root in 0x7f36181de990

2016-09-13 10:29:46.424494*:kgfn.c@6356: kgfnGetNodeType: flags=0x10
2016-09-13 10:29:46.424494*:kgfn.c@6369: kgfnGetNodeType: ntyp=1
2016-09-13 10:29:46.424494*:kgfn.c@4644: kgfnConnect2: kgfnGetBeqData failed
2016-09-13 10:29:46.483454 : default:1319061248: clsCredDomClose: Credctx deleted 0x7f36182dae20
2016-09-13 10:29:46.483454*:kgfn.c@4868: kgfnConnect2: failed to connect
2016-09-13 10:29:46.483454*:kgfn.c@4887: kgfnConnect: conn=(nil)
2016-09-13 10:29:46.483454*:kgfp.c@669: kgfpInitComplete2 hdl=0x7f36180be4f8 conn=0x7f36180be510 ok=0
2016-09-13 10:29:46.483454*:kgfo.c@947: kgfo_kge2slos error stack at kgfoAl06: ORA-15077: could not locate ASM instance serving a required diskgroup

2016-09-13 10:29:46.483454*:kgfo.c@1058: kgfoSaveError: ctx=0x7f36180e7300 hdl=(nil) gph=0x7f3618076c98 ose=0x7f364e9eae20 at kgfo.c:1006
2016-09-13 10:29:46.483454*:kgfo.c@1115: kgfoSaveError: ignoring existing error:
ORA-15077: could not locate ASM instance serving a required diskgroup

But in the end, the connection to the ASM instance works because it is using a local BEQ connection:

2016-09-13 10:30:05.806944*:kgfo.c@698: kgfoAllocHandle cached conn=0x7f36180a6a50 magic=0xd31f gp=0x7f3618290378 env_only=0
2016-09-13 10:30:05.806944*:kgfp.c@651: kgfpInitComplete2 hdl=0x7f36180a6a38 magic=0xd31f rmt=0 flags=0x5
2016-09-13 10:30:05.806944*:kgfn.c@4432: kgfnConnect: inst=(null) srvc=+ASM clnt=3 cflags=0x10
2016-09-13 10:30:05.806944*:kgfn.c@6338: kgfnRemoteASM: remote=0
2016-09-13 10:30:05.806944*:kgfn.c@6379: kgfnGetClusType: flags=0x10
2016-09-13 10:30:05.841109 :    GPNP:1319061248: clsgpnp_dbmsGetItem_profile: [at clsgpnp_dbms.c:345] Result: (0) CLSGPNP_OK. (:GPNP00401:)got ASM-Profile.Mode='remote'
2016-09-13 10:30:05.841109*:kgfn.c@6392: kgfnGetClusType: ctyp=3
2016-09-13 10:30:05.841109*:kgfn.c@4504: kgfnConnect: cluster type 3
2016-09-13 10:30:05.841109*:kgfn.c@6356: kgfnGetNodeType: flags=0x10
2016-09-13 10:30:05.841109*:kgfn.c@6369: kgfnGetNodeType: ntyp=1
2016-09-13 10:30:05.841109*:kgfn.c@5266: kgfnGetBeqData: ios=0 inst=NULL flex=1 line 4539
2016-09-13 10:30:05.841109*:kgfn.c@2044: kgfnTgtInit: sid=(null) flags=0x6000
2016-09-13 10:30:05.841109*:kgfn.c@1205: kgfnFindLocalNode: sid=(null) skgp=(nil) flags=0x6000
2016-09-13 10:30:05.841109*:kgfn.c@1018: kgfn_find_node_sid sid=(null) mbrcnt=1 flex=1
2016-09-13 10:30:05.841109*:kgfn.c@1037: kgfn_find_node_side: nodenum_local=1, mbrs=1 max=256
2016-09-13 10:30:05.841109*:kgfn.c@1095: kgfn_find_node_sid: checking node=1 (+ASM1)
  processed=1 memnum=0 buflen=84
2016-09-13 10:30:05.841109*:kgfn.c@1115: kgfn_find_node_sid LOCAL sid=+ASM1 mbr=0
2016-09-13 10:30:05.841109*:kgfn.c@1148: kgfn_find_node_sid sid=(null) ret=1 lclnode=0x1
2016-09-13 10:30:05.841109*:kgfn.c@2207: kgfnTgtDestroy: sid=+ASM1 host=(null) port=0
cstr=(null) asminst=(null) flags=0x100
2016-09-13 10:30:05.841109*:kgfn.c@5327: kgfnGetBeqData: found a local instance
2016-09-13 10:30:05.841109*:kgfn.c@4680: kgfnConnect: srvr valid
2016-09-13 10:30:05.841109*:kgfn.c@4686: kgfnConnect: bequeath connection
2016-09-13 10:30:05.841109*:kgfn.c@5972: kgfnConnect2Int: sysasm=0 envflags=0x10 srvrflags=0x3 unam=NULL password is NULL pstr=_ocr
2016-09-13 10:30:05.841109*:kgfn.c@6134: kgfnConnect2Int: cstr=(DESCRIPTION=(ADDRESS=(PROTOCOL=beq)(PROGRAM=/u01/app/12.1.0.2/grid/bin/oracle)(ARGV0=oracle+ASM1_ocr)(ENVS='ORACLE_HOME=/u01/app/12.1.0.2/grid,ORACLE_SID=+ASM1')(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))')(PRIVS=(USER=oracle)(GROUP=oinstall)))(enable=setuser))
2016-09-13 10:30:05.841109*:kgfn.c@4887: kgfnConnect: conn=0x7f36180a6a50
2016-09-13 10:30:05.841109*:kgfp.c@669: kgfpInitComplete2 hdl=0x7f36180a6a38 conn=0x7f36180a6a50 ok=1
2016-09-13 10:30:06.085577 :kgfn.c@3680: kgfnStmtSingle res=0 [MOUNTED]

Starting the CRS stack on second node

Ok, now start the CRS stack on the second node. It gets stuck starting the “ora.storage” resource. The cluster alert.log looks similar to the one on the first node:

[root@vm141 ~]# tail -f /u01/app/oracle/diag/crs/vm141/crs/trace/alert.log
2016-09-13 10:35:55.511 [CSSDAGENT(2867)]CRS-8500: Oracle Clusterware CSSDAGENT process is starting with operating system process ID 2867
2016-09-13 10:35:55.843 [OCSSD(2882)]CRS-8500: Oracle Clusterware OCSSD process is starting with operating system process ID 2882
2016-09-13 10:35:56.952 [OCSSD(2882)]CRS-1713: CSSD daemon is started in hub mode
2016-09-13 10:36:02.779 [OCSSD(2882)]CRS-1707: Lease acquisition for node vm141 number 4 completed
2016-09-13 10:36:03.913 [OCSSD(2882)]CRS-1605: CSSD voting file is online: AFD:GI; details in /u01/app/oracle/diag/crs/vm141/crs/trace/ocssd.trc.
2016-09-13 10:36:03.956 [OCSSD(2882)]CRS-1672: The number of voting files currently available 1 has fallen to the minimum number of voting files required 1.
2016-09-13 10:36:05.624 [OCSSD(2882)]CRS-1601: CSSD Reconfiguration complete. Active nodes are vm140 vm141 .
2016-09-13 10:36:07.958 [OCTSSD(3014)]CRS-8500: Oracle Clusterware OCTSSD process is starting with operating system process ID 3014
2016-09-13 10:36:09.066 [OCTSSD(3014)]CRS-2401: The Cluster Time Synchronization Service started on host vm141.
2016-09-13 10:36:09.066 [OCTSSD(3014)]CRS-2407: The new Cluster Time Synchronization Service reference node is host vm140.
2016-09-13 10:36:31.042 [ORAROOTAGENT(2569)]CRS-5019: All OCR locations are on ASM disk groups [GI], and none of these disk groups are mounted. Details are at "(:CLSN00140:)" in "/u01/app/oracle/diag/crs/vm141/crs/trace/ohasd_orarootagent_root.trc".

But the mentioned trace file looks different this time:

2016-09-13 10:36:30.996133*:kgfo.c@1058: kgfoSaveError: ctx=0x7f83ac121700 hdl=(nil) gph=0x7f83ac0ae9e8 ose=0x7f83c4c63df0 at kgfo.c:1006
2016-09-13 10:36:30.996133*:kgfo.c@1115: kgfoSaveError: ignoring existing error:
ORA-01017: invalid username/password; logon denied
ORA-17503: ksfdopn:2 Failed to open file +GI/orapwasm
ORA-15173: entry 'orapwasm' does not exist in directory '/'
ORA-06512: at line 4
ORA-15077: could not locate ASM instance serving a required diskgroup

2016-09-13 10:36:30.996133*:kgfo.c@817: kgfoFreeHandle ctx=0x7f83ac121700 hdl=0x7f83ac0f8918 conn=0x7f83ac0f8950 disconnect=0
2016-09-13 10:36:30.996133*:kgfo.c@846:   disconnect hdl 0x7f83ac0f8918 (recycling)
2016-09-13 10:36:30.996133*:kgfo.c@2757: Handle Alloc failed - kgfoCheckMount Reconnecting
2016-09-13 10:36:30.996133*:kgfo.c@2846: kgfoCheckMount dg=GI ok=0
2016-09-13 10:36:30.996467 : USRTHRD:3301365504: {0:9:3} -- trace dump on error exit --

2016-09-13 10:36:30.996497 : USRTHRD:3301365504: {0:9:3} Error [kgfoAl06] in [kgfokge] at kgfo.c:2850

2016-09-13 10:36:30.996520 : USRTHRD:3301365504: {0:9:3} ORA-01017: invalid username/password; logon denied
ORA-17503: ksfdopn:2 Failed to open file +GI/orapwasm
ORA-15173: entry 'orapwasm' does not exist in directory

That’s obvious: I deleted the password file, hence it cannot be located and the ASM startup fails.

Creating a new ASM password file

So let’s go back to the first node where everything is running fine and create a new password file:

[oracle@vm140 ~]$ orapwd file=+GI/orapwASM asm=y

Enter password for SYS:
[oracle@vm140 ~]$ asmcmd ls -l +GI/orapwASM
Type      Redund  Striped  Time             Sys  Name
PASSWORD  UNPROT  COARSE   SEP 13 10:00:00  N    orapwASM => +GI/ASM/P

[oracle@vm140 ~]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Tue Sep 13 10:43:39 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> select * from v$pwfile_users;

USERNAME                       SYSDB SYSOP SYSAS SYSBA SYSDG SYSKM     CON_ID
------------------------------ ----- ----- ----- ----- ----- ----- ----------
SYS                            TRUE  TRUE  FALSE FALSE FALSE FALSE          0

But how do I get this CRS user back in? Like this (and be careful, there is a double underscore between CRSUSER and ASM_001):

[oracle@vm140 ~]$ asmcmd lspwusr
Username sysdba sysoper sysasm
     SYS   TRUE    TRUE  FALSE

[oracle@vm140 ~]$ asmcmd orapwusr --add CRSUSER__ASM_001
Enter password: ********

[oracle@vm140 ~]$ asmcmd orapwusr --grant sysasm CRSUSER__ASM_001
[oracle@vm140 ~]$ asmcmd orapwusr --grant sysdba CRSUSER__ASM_001
[oracle@vm140 ~]$ asmcmd lspwusr
        Username sysdba sysoper sysasm
             SYS   TRUE    TRUE  FALSE
CRSUSER__ASM_001   TRUE   FALSE   TRUE

Ok, here we go, the CRS user is back again. You may use SQL*Plus or asmcmd to grant privileges or query the password file contents; I used both methods as you can see.
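
Just for completeness, the pure SQL*Plus route (connected to the ASM instance as SYSASM) would look roughly like this; the password is obviously made up:

SQL> create user CRSUSER__ASM_001 identified by "SomeSecret01";
SQL> grant sysdba to CRSUSER__ASM_001;
SQL> grant sysasm to CRSUSER__ASM_001;
SQL> select * from v$pwfile_users;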

Starting the CRS stack on second node, again

Now that I have my ASM password file back again, I restart the CRS stack on the second node:

[root@vm141 ~]# crsctl stop crs -f

[root@vm141 ~]# crsctl start crs 

And check the alert.log:

[root@vm141 ~]# tail -f /u01/app/oracle/diag/crs/vm141/crs/trace/alert.log
2016-09-13 10:53:41.694 [OCTSSD(10211)]CRS-2407: The new Cluster Time Synchronization Service reference node is host vm140.
2016-09-13 10:53:41.700 [OCTSSD(10211)]CRS-2401: The Cluster Time Synchronization Service started on host vm141.
2016-09-13 10:53:57.194 [ORAROOTAGENT(9887)]CRS-5019: All OCR locations are on ASM disk groups [GI], and none of these disk groups are mounted. Details are at "(:CLSN00140:)" in "/u01/app/oracle/diag/crs/vm141/crs/trace/ohasd_orarootagent_root.trc".

The error is still there, but the trace file now tells a different story:

2016-09-13 10:53:54.781688 :   CLSNS:1479268096: clsns_SetTraceLevel:trace level set to 1.
2016-09-13 10:53:54.818703 :    GPNP:1479268096: clsgpnp_dbmsGetItem_profile: [at clsgpnp_dbms.c:345] Result: (0) CLSGPNP_OK. (:GPNP00401:)got ASM-Profile.Mode='remote'
2016-09-13 10:53:54.825001 : default:1479268096: Inited LSF context: 0x7f9f30285b60
2016-09-13 10:53:54.831784 : CLSCRED:1479268096: clsCredCommonInit: Inited singleton credctx.
2016-09-13 10:53:54.831857 : CLSCRED:1479268096: (:CLSCRED0101:)clsCredDomInitRootDom: Using user given storage context for repository access.
2016-09-13 10:53:54.932529 : USRTHRD:1479268096: {0:9:3} 6425 Error 4 querying length of attr ASM_DISCOVERY_ADDRESS

2016-09-13 10:53:54.942595 : USRTHRD:1479268096: {0:9:3} 6425 Error 4 querying length of attr ASM_STATIC_DISCOVERY_ADDRESS

2016-09-13 10:53:55.046462 : CLSCRED:1479268096: (:CLSCRED1079:)clsCredOcrKeyExists: Obj dom : SYSTEM.credentials.domains.root.ASM.Self.b50a6df0745b7fb4bfc0880a73d8f455.root not found
2016-09-13 10:53:55.046657 : USRTHRD:1479268096: {0:9:3} 6210 Error 4 opening dom root in 0x7f9f302d65e0

2016-09-13 10:53:55.046657*:kgfn.c@6356: kgfnGetNodeType: flags=0x10
2016-09-13 10:53:55.046657*:kgfn.c@6369: kgfnGetNodeType: ntyp=1
2016-09-13 10:53:55.046657*:kgfn.c@4644: kgfnConnect2: kgfnGetBeqData failed
2016-09-13 10:53:55.046657*:kgfn.c@4680: kgfnConnect: srvr valid
2016-09-13 10:53:55.046657*:kgfn.c@5972: kgfnConnect2Int: sysasm=0 envflags=0x10 srvrflags=0x1 unam=crsuser__asm_001 password is NOT NULL pstr=_ocr
2016-09-13 10:53:55.046657*:kgfn.c@6121: kgfnConnect2Int: hosts=1
2016-09-13 10:53:55.046657*:kgfn.c@6134: kgfnConnect2Int: cstr=(DESCRIPTION=(TRANSPORT_CONNECT_TIMEOUT=60)(EXPIRE_TIME=1)(LOAD_BALANCE=ON)(ADDRESS_LIST=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.1)(PORT=1526)))(CONNECT_DATA=(SERVICE_NAME=+ASM)))
2016-09-13 10:53:55.046657*:kgfn.c@6200: kgfnConnect2Int: OCISessionBegin failed
2016-09-13 10:53:55.046657*:kgfn.c@1602: kgfnRecordErrPriv: status=-1  at kgfn.c:6284
2016-09-13 10:53:55.046657*:kgfn.c@1648: kgfnRecordErrPriv: 1017 error=ORA-01017: invalid username/password; logon denied

2016-09-13 10:53:55.046657*:kgfn.c@1684: kgfnRecordErrPriv: rec=1
2016-09-13 10:53:57.155070 : default:1479268096: clsCredDomClose: Credctx deleted 0x7f9f302c33b0
2016-09-13 10:53:57.155070*:kgfn.c@4868: kgfnConnect2: failed to connect
2016-09-13 10:53:57.155070*:kgfn.c@4887: kgfnConnect: conn=(nil)
2016-09-13 10:53:57.155070*:kgfp.c@669: kgfpInitComplete2 hdl=0x7f9f300beec8 conn=0x7f9f300beee0 ok=0
2016-09-13 10:53:57.155070*:kgfo.c@947: kgfo_kge2slos error stack at kgfoAl06: ORA-01017: invalid username/password; logon denied
ORA-15077: could not locate ASM instance serving a required diskgroup

Obviously the password I gave the CRSUSER is not correct. When I checked My Oracle Support for those messages, I found “How to Restore ASM Password File if Lost ( ORA-01017 ORA-15077 )” (Doc ID 1644005.1). But this note only describes the process of backing up and restoring the ASM password file. That is something I should have done in the first place, and something that you and I should do at the very beginning of a cluster installation, before going into production.
So I investigated further and found an ODA-related note, “ODA: CRS Could Not Start on Second ODA Node Due to Invalid ASM Credentials for The ‘crsuser__asm_001’ Clusterware User” (Doc ID 2139591.1). This one describes how to recover the lost password. It is still there, stored in the clusterware wallets.

Recovering the CRSUSER password

Go back to the running node and do all the steps from there. First, query the path where the ASM password is stored:

[oracle@vm140 ~]$ crsctl query credmaint -path ASM/Self -credtype userpass
Path                                           Credtype   ID   Attrs

/ASM/Self/b50a6df0745b7fb4bfc0880a73d8f455     userpass   0    create_time=2016
                                                               -06-10 15:04:13,
                                                               modify_time=2016
                                                               -06-10 15:04:13,
                                                               expiration_time=
                                                               NEVER,bootstrap=
                                                               FALSE

I can use this path to check for the right user and query its password:

[oracle@vm140 ~]$ crsctl get credmaint -path /ASM/Self/b50a6df0745b7fb4bfc0880a73d8f455 -credtype userpass -id 0 -attr user -local 
crsuser__asm_001
[oracle@vm140 ~]$ crsctl get credmaint -path /ASM/Self/b50a6df0745b7fb4bfc0880a73d8f455 -credtype userpass -id 0 -attr passwd -local
B50T01O3wZydcz8nIeydae3qRZhUU

Now that I have the password, I can use it to set the proper password for my CRSUSER__ASM_001:

[oracle@vm140 ~]$ asmcmd orapwusr --modify CRSUSER__ASM_001
Enter password: *****************************

Start CRS stack on second node, again and again

Finally, stop and start the CRS stack again:

[root@vm141 ~]# crsctl stop crs -f

[root@vm141 ~]# crsctl start crs

Happily it is successful this time.

Remarks

Be sure to have a current backup of your ASM password file (beside all the other CRS-related files) to ensure recoverability.
Note that this effect may also happen when starting the first node. In that case, start ASM manually and then perform the steps to recover the password file.
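
A simple way to take such a backup is to copy the password file out of ASM to a local filesystem (the target directory is made up):

[oracle@vm140 ~]$ asmcmd cp +GI/orapwASM /u01/app/oracle/backup/orapwASM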

ASM Filter Driver CPU load – don’t care

Some time ago I wrote about the ASM Filter Driver installation. If you are using AFD, then you might notice a permanent CPU load of 1.0 as we did. There was nothing else running; we stopped the Oracle Clusterware and the Cloud Control Agent, and still saw a load of 1.0, even after disabling the whole cluster stack and rebooting. But there were no processes visible that consumed any CPU.

[root@vm101 ~]# crsctl disable crs
CRS-4621: Oracle High Availability Services autostart is disabled.

That’s what “top” said after reboot:

top - 20:44:51 up 4 min,  1 user,  load average: 0.00, 0.03, 0.02
Tasks: 108 total,   2 running, 106 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.2 sy,  0.0 ni, 99.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  5889124 total,  5664764 free,   118360 used,   106000 buff/cache
KiB Swap:  6143996 total,  6143996 free,        0 used.  5718728 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2291 root      20   0  129880   1676   1196 R   0.3  0.0   0:00.06 top
    1 root      20   0   54256   3968   2320 S   0.0  0.1   0:02.13 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kthreadd
    3 root      20   0       0      0      0 S   0.0  0.0   0:00.00 ksoftirqd/0

But when we loaded the AFD kernel driver, the load immediately went up to 1.0.

[root@vm101 ~]# lsmod |grep afd
[root@vm101 ~]# modprobe oracleafd
[root@vm101 ~]# lsmod |grep afd
oracleafd             205593  0

That’s what “top” told us now:

top - 20:52:20 up 12 min,  1 user,  load average: 1.06, 0.76, 0.37
Tasks: 104 total,   2 running, 102 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.1 us,  0.1 sy,  0.0 ni, 99.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  5889124 total,  5647164 free,   129004 used,   112956 buff/cache
KiB Swap:  6143996 total,  6143996 free,        0 used.  5707868 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
    1 root      20   0   54256   3972   2320 S   0.0  0.1   0:02.14 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kthreadd

So an SR with Oracle was opened to clarify this. The explanation is quite simple: Linux counts processes in an uninterruptible wait towards the load average, even though these processes don’t use any CPU. Those processes can be identified by the “D” state in “ps -l” output.

Let’s check the “D” state as Oracle Support mentioned:

[root@vm101 ~]# ps -efl | grep -E '^. D'
1 D root      2322     2  0  80   0 -     0 AfdgWa 20:45 ?        00:00:00 [afd_log]

You can see the “D” state in the second column. So that’s it. Don’t care about the load. It does not tell the truth.

And many thanks to my colleague for investigating this issue.

Using ASM Filter Driver right from the beginning

Preface

If you are running Oracle RAC, then configuring the shared storage is one of the main preinstallation tasks that need to be done. You need to configure multipathing and make sure that the device name that will be used for ASM is always the same. And you must set permissions and ownership for these devices. On Linux you can use ASMLib for that: it stamps the devices so that it can identify them, provides a unique and consistent name for ASM, and sets proper permissions for the devices. But it is still possible for other processes to write to these devices, using “dd” for instance.

Now there is Oracle Grid Infrastructure 12c, which introduces a replacement for ASMLib called ASM Filter Driver (AFD). Basically it does the same things as ASMLib, but in addition it is able to block write operations coming from processes other than Oracle’s own.

So that is a good thing, and I wanted to use it for a new cluster I was about to set up. And that is where the trouble starts. Beside the fact that there were some bugs in the initial versions of AFD, most of which got fixed by the April 2016 PSU, AFD is installed as part of Grid Infrastructure. You can read that in the Automatic Storage Management docs. They state the following:

After installation of Oracle Grid Infrastructure, you can optionally configure Oracle ASMFD for your system.

What? After installation? But I need it right from the beginning to use it for my initial disk group. How about that? There is a MOS note, How to Install ASM Filter Driver in a Linux Environment Without Having Previously Installed ASMLIB (Doc ID 2060259.1), but this whitepaper also assumes that Grid Infrastructure is already installed.

But as you can tell from this blog post’s title, there is a way to use AFD from scratch, though it is not really straightforward.

1. Install Grid Infrastructure Software

The first step is to install Grid Infrastructure as a software-only installation. That implies that you have to do it on all nodes that will form the future cluster. I did that on the first node, saved the response file and did a silent install on the other nodes.

[oracle@vm140 ~]$ ./runInstaller -silent -responseFile /home/oracle/stage/grid/grid.rsp -ignorePrereq
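
For reference, a software-only installation needs only a handful of entries in the response file; roughly these (an illustrative excerpt from my lab, based on the 12.1 grid.rsp template):

oracle.install.option=CRS_SWONLY
INVENTORY_LOCATION=/u01/app/oraInventory
ORACLE_BASE=/u01/app/oracle
ORACLE_HOME=/u01/app/12.1.0.2/grid
oracle.install.asm.OSDBA=dba
oracle.install.asm.OSASM=dba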

At the end of the installation you need to run the “orainstRoot.sh” and “root.sh” scripts; the latter prints two other commands which configure either a cluster or a standalone server:

[root@vm140 ~]# /u01/app/oraInventory/orainstRoot.sh
Changing permissions of /u01/app/oraInventory.
Adding read,write permissions for group.
Removing read,write,execute permissions for world.

Changing groupname of /u01/app/oraInventory to oinstall.
The execution of the script is complete.
[root@vm140 ~]# /u01/app/12.1.0.2/grid/root.sh
Performing root user operation.

The following environment variables are set as:
	ORACLE_OWNER= oracle
	ORACLE_HOME=  /u01/app/12.1.0.2/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
   Copying dbhome to /usr/local/bin ...
   Copying oraenv to /usr/local/bin ...
   Copying coraenv to /usr/local/bin ...


Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.

To configure Grid Infrastructure for a Stand-Alone Server run the following command as the root user:
/u01/app/12.1.0.2/grid/perl/bin/perl -I/u01/app/12.1.0.2/grid/perl/lib -I/u01/app/12.1.0.2/grid/crs/install /u01/app/12.1.0.2/grid/crs/install/roothas.pl


To configure Grid Infrastructure for a Cluster execute the following command as oracle user:
/u01/app/12.1.0.2/grid/crs/config/config.sh
This command launches the Grid Infrastructure Configuration Wizard. The wizard also supports silent operation, and the parameters can be passed through the response file that is available in the installation media.

For the moment, we do not run either of these commands.

2. Patch Grid Infrastructure Software

The next step is to patch the GI software to get the latest version of AFD. Simply update OPatch on all nodes and use “opatchauto” to patch the GI home. You need to specify the ORACLE_HOME path using the “-oh” parameter to patch an unconfigured Grid Infrastructure home.
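
Updating OPatch simply means replacing the OPatch directory in the grid home with the contents of patch 6880880; a sketch, with the staging path and zip name as assumptions:

[root@vm140 ~]# mv $ORACLE_HOME/OPatch $ORACLE_HOME/OPatch.bak
[root@vm140 ~]# unzip -q /home/oracle/stage/p6880880_121010_Linux-x86-64.zip -d $ORACLE_HOME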

[root@vm140 ~]# export ORACLE_HOME=/u01/app/12.1.0.2/grid
[root@vm140 ~]# export PATH=$ORACLE_HOME/OPatch:$PATH
[root@vm140 ~]# opatch version
OPatch Version: 12.1.0.1.12

OPatch succeeded.

[root@vm140 ~]# opatchauto apply /home/oracle/stage/22646084 -oh $ORACLE_HOME

[...]

--------------------------------Summary--------------------------------

Patching is completed successfully. Please find the summary as follows:

Host:vm140
CRS Home:/u01/app/12.1.0.2/grid
Summary:

==Following patches were SUCCESSFULLY applied:

Patch: /home/oracle/stage/22646084/21436941
Log: /u01/app/12.1.0.2/grid/cfgtoollogs/opatchauto/core/opatch/opatch2016-06-10_13-54-11PM_1.log

Patch: /home/oracle/stage/22646084/22291127
Log: /u01/app/12.1.0.2/grid/cfgtoollogs/opatchauto/core/opatch/opatch2016-06-10_13-54-11PM_1.log

Patch: /home/oracle/stage/22646084/22502518
Log: /u01/app/12.1.0.2/grid/cfgtoollogs/opatchauto/core/opatch/opatch2016-06-10_13-54-11PM_1.log

Patch: /home/oracle/stage/22646084/22502555
Log: /u01/app/12.1.0.2/grid/cfgtoollogs/opatchauto/core/opatch/opatch2016-06-10_13-54-11PM_1.log


OPatchAuto successful.

You can see that with the latest OPatch version there is no need to create an “ocm.rsp” response file anymore.

3. Configure Restart

Configure Restart? Why? Because it sets up everything we need for AFD but does not require any shared storage or other cluster-related things like virtual IPs, SCANs and so on.
Therefore you use the “roothas.pl” command that “root.sh” printed earlier. Do that on all nodes of the future cluster.

[root@vm140 ~]# /u01/app/12.1.0.2/grid/perl/bin/perl -I/u01/app/12.1.0.2/grid/perl/lib -I/u01/app/12.1.0.2/grid/crs/install /u01/app/12.1.0.2/grid/crs/install/roothas.pl

4. Deconfigure Restart

After Restart was configured, you can deconfigure it right away; everything that is needed for AFD is kept. This procedure is also covered in the Oracle documentation.

[root@vm140 ~]# cd /u01/app/12.1.0.2/grid/crs/install/
[root@vm140 install]# ./roothas.sh -deconfig -force

5. Configure ASM Filter Driver

Now you can finally start configuring AFD. The whitepaper from the MOS note mentioned at the beginning provides a good overview of what has to be done. Simply connect as “root”, set the environment and run the following:

[root@vm140 install]# $ORACLE_HOME/bin/asmcmd afd_configure
Connected to an idle instance.
AFD-627: AFD distribution files found.
AFD-636: Installing requested AFD software.
AFD-637: Loading installed AFD drivers.
AFD-9321: Creating udev for AFD.
AFD-9323: Creating module dependencies - this may take some time.
AFD-9154: Loading 'oracleafd.ko' driver.
AFD-649: Verifying AFD devices.
AFD-9156: Detecting control device '/dev/oracleafd/admin'.
AFD-638: AFD installation correctness verified.
Modifying resource dependencies - this may take some time.
ASMCMD-9524: AFD configuration failed 'ERROR: OHASD start failed'
[root@vm140 install]# $ORACLE_HOME/bin/asmcmd afd_state
Connected to an idle instance.
ASMCMD-9526: The AFD state is 'LOADED' and filtering is 'DISABLED' on host 'vm140'

Don’t worry about the error and the message telling you the configuration failed; that happens simply because there is no cluster at all at the moment.
As a final configuration step you need to set the discovery string for AFD so that it can find the disks you want to use. This is defined in “/etc/afd.conf”:

[root@vm140 install]# cat /etc/afd.conf
afd_diskstring='/dev/xvd*'
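
Alternatively, the discovery string can be managed through asmcmd rather than by editing the file directly; a sketch using the “afd_dsset” and “afd_dsget” commands (the output line is illustrative):

[root@vm140 install]# $ORACLE_HOME/bin/asmcmd afd_dsset '/dev/xvd*'
[root@vm140 install]# $ORACLE_HOME/bin/asmcmd afd_dsget
AFD discovery string: /dev/xvd*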

The above steps need to be done on all servers of the future cluster.
Now that AFD is configured, you can start labeling your disks. Do this on only one node:

[root@vm140 install]# $ORACLE_HOME/bin/asmcmd afd_label GI /dev/xvdb1
Connected to an idle instance.
[root@vm140 install]# $ORACLE_HOME/bin/asmcmd afd_label DATA /dev/xvdc1
Connected to an idle instance.
[root@vm140 install]# $ORACLE_HOME/bin/asmcmd afd_label FRA /dev/xvdd1
Connected to an idle instance.

[root@vm140 install]# $ORACLE_HOME/bin/asmcmd afd_lsdsk
Connected to an idle instance.
--------------------------------------------------------------------------------
Label                     Filtering   Path
================================================================================
GI                         DISABLED   /dev/xvdb1
DATA                       DISABLED   /dev/xvdc1
FRA                        DISABLED   /dev/xvdd1

On all the other nodes just do a rescan of the disks:

[root@vm141 install]# $ORACLE_HOME/bin/asmcmd afd_scan
Connected to an idle instance.
[root@vm141 install]# $ORACLE_HOME/bin/asmcmd afd_lsdsk
Connected to an idle instance.
--------------------------------------------------------------------------------
Label                     Filtering   Path
================================================================================
GI                         DISABLED   /dev/xvdb1
DATA                       DISABLED   /dev/xvdc1
FRA                        DISABLED   /dev/xvdd1

That’s it.
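
Note that the lsdsk output still shows filtering as DISABLED for the labeled disks. Once the stack is up you can check the state again and, if needed, toggle filtering per disk with “afd_filter”; a sketch, assuming the documented asmcmd syntax (disk path from my lab):

[root@vm140 install]# $ORACLE_HOME/bin/asmcmd afd_state
[root@vm140 install]# $ORACLE_HOME/bin/asmcmd afd_filter -e /dev/xvdb1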

6. Configure Cluster with AFD

Finally, you can start configuring your new cluster and use AFD disks right from the beginning. You can now use the configuration wizard (“config.sh”) that “root.sh” pointed to earlier to set up your cluster.

[oracle@vm140 ~]$ /u01/app/12.1.0.2/grid/crs/config/config.sh

Follow the steps and you will see the well-known screens for setting up a cluster. At the point where you define the initial Grid Infrastructure disk group you can now specify the AFD “Discovery String” and, voilà, you see the previously labeled disks. After you run the root scripts on all nodes, you’ll get a running cluster:

[root@vm140 bin]# ./crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
			   ONLINE  ONLINE       vm140                    STABLE
			   ONLINE  ONLINE       vm141                    STABLE
			   ONLINE  ONLINE       vm142                    STABLE
			   ONLINE  ONLINE       vm143                    STABLE
ora.GI.dg
			   ONLINE  ONLINE       vm140                    STABLE
			   ONLINE  ONLINE       vm141                    STABLE
			   ONLINE  ONLINE       vm142                    STABLE
			   OFFLINE OFFLINE      vm143                    STABLE
ora.LISTENER.lsnr
			   ONLINE  ONLINE       vm140                    STABLE
			   ONLINE  ONLINE       vm141                    STABLE
			   ONLINE  ONLINE       vm142                    STABLE
			   ONLINE  ONLINE       vm143                    STABLE
ora.net1.network
			   ONLINE  ONLINE       vm140                    STABLE
			   ONLINE  ONLINE       vm141                    STABLE
			   ONLINE  ONLINE       vm142                    STABLE
			   ONLINE  ONLINE       vm143                    STABLE
ora.ons
			   ONLINE  ONLINE       vm140                    STABLE
			   ONLINE  ONLINE       vm141                    STABLE
			   ONLINE  ONLINE       vm142                    STABLE
			   ONLINE  ONLINE       vm143                    STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
	  1        ONLINE  ONLINE       vm140                    STABLE
ora.MGMTLSNR
	  1        ONLINE  ONLINE       vm140                    169.254.231.166 192.
															 168.1.1,STABLE
ora.asm
	  1        ONLINE  ONLINE       vm140                    Started,STABLE
	  2        ONLINE  ONLINE       vm142                    Started,STABLE
	  3        ONLINE  ONLINE       vm141                    Started,STABLE
ora.cvu
	  1        ONLINE  ONLINE       vm140                    STABLE
ora.mgmtdb
	  1        ONLINE  ONLINE       vm140                    Open,STABLE
ora.oc4j
	  1        ONLINE  ONLINE       vm140                    STABLE
ora.scan1.vip
	  1        ONLINE  ONLINE       vm140                    STABLE
ora.vm140.vip
	  1        ONLINE  ONLINE       vm140                    STABLE
ora.vm141.vip
	  1        ONLINE  ONLINE       vm141                    STABLE
ora.vm142.vip
	  1        ONLINE  ONLINE       vm142                    STABLE
ora.vm143.vip
	  1        ONLINE  ONLINE       vm143                    STABLE
--------------------------------------------------------------------------------

And that’s it. Nothing more to do, besides creating more disk groups and setting up databases. But that is simple compared to what we’ve done so far.
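
For completeness, creating the remaining disk groups is then just a matter of referencing the AFD labels, which appear with the “AFD:” prefix. A minimal sketch (redundancy level and attribute are assumptions, adapt them to your needs):

[oracle@vm140 ~]$ sqlplus / as sysasm

SQL> CREATE DISKGROUP DATA EXTERNAL REDUNDANCY
  2  DISK 'AFD:DATA'
  3  ATTRIBUTE 'compatible.asm' = '12.1';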