ora.storage hangs after node reboot

A couple of days ago I experienced a node reboot in a Windows RAC cluster at a customers site. Beside the fact, that we could not identify any root cause for the reboot, there were no messages in the clusters alert.log, the ocssd.log or crsd.log, the clusterware did not start properly after the node came up again.
The clusterware is 12.1.0.2 with the April 2017 BundlePatch installed.
When checking the progress of the startup via “crsctl stat res -t -init” we could see that it was stuck at starting “ora.storage” resource. After 10 minutes of waiting it simply timed out.

So I checked the cluster alert.log first:

2017-12-05 14:17:52.417 [ORAROOTAGENT(1960)]CRS-5019: All OCR locations are on ASM disk groups [OCR], and none of these disk groups are mounted. Details are at "(:CLSN00140:)" in "D:\app\oracle\diag\crs\server001\crs\trace\ohasd_orarootagent_system.trc".
2017-12-05 14:27:40.137000 +01:00
2017-12-05 14:27:40.137 [ORAROOTAGENT(3960)]CRS-5818: Aborted command 'start' for resource 'ora.storage'. Details at (:CRSAGF00113:) {0:9:4} in D:\app\oracle\diag\crs\server001\crs\trace\ohasd_orarootagent_system.trc.
2017-12-05 14:27:40.823 [ORAROOTAGENT(1960)]CRS-5017: The resource action "ora.storage start" encountered the following error:
2017-12-05 14:27:40.823+Storage agent start action aborted. For details refer to "(:CLSN00107:)" in "D:\app\oracle\diag\crs\server001\crs\trace\ohasd_orarootagent_system.trc".
2017-12-05 14:27:44.158000 +01:00
2017-12-05 14:27:44.158 [OHASD(1448)]CRS-2757: Command 'Start' timed out waiting for response from the resource 'ora.storage'. Details at (:CRSPE00163:) {0:9:4} in D:\app\oracle\diag\crs\server001\crs\trace\ohasd.trc.
2017-12-05 14:28:53.847000 +01:00

Next, I went to the mentioned “ohasd_orarootagent_system.trc” and found many many lines like these:

2017-12-05 14:26:48.471105*:kgfn.c@6356: kgfnGetNodeType: flags=0x10
2017-12-05 14:26:48.471105*:kgfn.c@6369: kgfnGetNodeType: ntyp=1
2017-12-05 14:26:48.471105*:kgfn.c@4644: kgfnConnect2: kgfnGetBeqData failed
2017-12-05 14:26:48.471105*:kgfn.c@4680: kgfnConnect: srvr valid
2017-12-05 14:26:48.471105*:kgfn.c@5972: kgfnConnect2Int: sysasm=0 envflags=0x10 srvrflags=0x1 unam=crsuser__asm_001 password is NOT NULL pstr=_ocr
2017-12-05 14:26:48.471105*:kgfn.c@6121: kgfnConnect2Int: hosts=1
2017-12-05 14:26:48.471105*:kgfn.c@6134: kgfnConnect2Int: cstr=(DESCRIPTION=(TRANSPORT_CONNECT_TIMEOUT=60)(EXPIRE_TIME=1)(LOAD_BALANCE=ON)(ADDRESS_LIST=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.50.35)(PORT=1522)))(CONNECT_DATA=(SERVICE_NAME=+ASM)))
2017-12-05 14:26:48.471105*:kgfn.c@6144: kgfnConnect2Int: OCIServerAttach failed
2017-12-05 14:26:48.471105*:kgfn.c@1602: kgfnRecordErrPriv: status=-1  at kgfn.c:6300
2017-12-05 14:26:48.471105*:kgfn.c@1648: kgfnRecordErrPriv: 12541 error=ORA-12541: TNS:no listener

Look at the highlighted lines, the process ist attempting to contact the ASM instance of the surviving node. 192.168.50.35 is the IP of the interconnect network of the surving node. The interconnect IP of the rebooting node is 192.168.50.33. So the connection string is correct at the first glance. But why does it return “TNS-12541: No listener”? I checked the ASM listener on the other node:

C:\>lsnrctl status ASMNET1LSNR_ASM

LSNRCTL for 64-bit Windows: Version 12.1.0.2.0 - Production on 05-DEC-2017 14:32:47

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=ASMNET1LSNR_ASM)))
STATUS of the LISTENER
------------------------
Alias                     ASMNET1LSNR_ASM
Version                   TNSLSNR for 64-bit Windows: Version 12.1.0.2.0 - Production
Start Date                28-OCT-2017 12:25:58
Uptime                    38 days 3 hr. 6 min. 53 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   D:\app\grid\12.1.0.2\network\admin\listener.ora
Listener Log File         D:\app\oracle\diag\tnslsnr\server002\asmnet1lsnr_asm\alert\log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=\\.\pipe\ASMNET1LSNR_ASMipc)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.50.35)(PORT=1522)))
Services Summary...
Service "+ASM" has 2 instance(s).
  Instance "+asm1", status READY, has 1 handler(s) for this service...
  Instance "+asm2", status READY, has 2 handler(s) for this service...
The command completed successfully

As you see, the listeners address and port are correct. But why are there 2 instances? There should be only 1 for “+asm2” which is the instance of the surving node. Let’s double-check that:

C:\>lsnrctl services ASMNET1LSNR_ASM

LSNRCTL for 64-bit Windows: Version 12.1.0.2.0 - Production on 05-DEC-2017 14:33:15

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=ASMNET1LSNR_ASM)))
Services Summary...
Service "+ASM" has 2 instance(s).
  Instance "+asm1", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:473 refused:0 state:ready
         REMOTE SERVER
         (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.50.33)(PORT=1522)))       Instance "+asm2", status READY, has 2 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0 state:ready
         LOCAL SERVER
      "DEDICATED" established:0 refused:0 state:ready
         REMOTE SERVER
         (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.50.35)(PORT=1522)))
The command completed successfully

There is still a service handler for “+asm1” instance on the rebooted node pointing its interconnect IP where the ASM listener was running half an hour ago. So obviously the ASM listener redirects the connect request to that service handler which is not running any more. That explains the “No listener” error message in the log.

My workaround was to simply reload the ASM listener configuration on the surving node:

C:\>lsnrctl reload ASMNET1LSNR_ASM

LSNRCTL for 64-bit Windows: Version 12.1.0.2.0 - Production on 05-DEC-2017 14:34:04

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=ASMNET1LSNR_ASM)))
The command completed successfully

C:\Users\syszwiorap\Desktop>lsnrctl services ASMNET1LSNR_ASM

LSNRCTL for 64-bit Windows: Version 12.1.0.2.0 - Production on 05-DEC-2017 14:38:41

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=ASMNET1LSNR_ASM)))
Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+asm2", status READY, has 2 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0 state:ready
         REMOTE SERVER
         (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.50.35)(PORT=1522)))
      "DEDICATED" established:5 refused:0 state:ready
         LOCAL SERVER
The command completed successfully

And that was basically it. The “ora.storage” resource started right away.

I asume, this is an unexpected bahviour. The listener should drop the service handler after a specific timeout when it does not get service updates anymore to prevent such situations.

Advertisements

NLS_LANG, character sets & more – Part 2

Some days ago I wrote about NLS_LANG, character sets & more. Today I saw another neat effect. I was trying to do it right, but my messages from SQL*Plus still looked scrambled:

12-win-1252

The thing is, that the characters I need are simply not present in the default font. When I change the font, the same screen now looks somehow different:

13-win-1252

Maybe I should switch from German to English and ASCII 7 completely…

Oracle Grid Infrastructure 12.2 – HASMB

Oracle Grid Infrastructure used to have a feature to make a NFS share highly available. It simply uses an additional virtual IP address (VIP) and controls the NFS server to use this VIP. The client simply connects the NFS share using this VIP. So the NFS share is always available regardless on which node it currently running. This feature is restricted to ACFS, we can create NFS exports for an ACFS only which makes kind of sense. This whole thing is also known as HANFS.

New in Grid Infrastructure 12.2 is the ability to create highly available SMB shares, consequently called HASMB. Let’s see if and how this works. Some steps needs to be done as “root” while other steps can be done as “oracle”, see the command prompt to identify which user I actually used.

First, we need to install the samba server if not already done.

[root@vm151 ~]# yum install samba samba-winbind

Of couse this needs to be done on all servers in the cluster.

Now, we add a line to /etc/samba/smb.conf to allow guest access:

[global highlight="5"]
        workgroup = SAMBA
        security = user
        map to guest = Bad Password   

Finally, we start the samba server and enable it for autostart:

[root@vm151 ~]# systemctl start smb
[root@vm151 ~]# systemctl enable smb
Created symlink from /etc/systemd/system/multi-user.target.wants/smb.service to /usr/lib/systemd/system/smb.service.

For high availability we need to create a VIP:

[root@vm151 ~]# srvctl add havip -id mmiexport -address vm156.support.robotron.de

Next step is to create an ACFS filesystem that we will export later on:

[oracle@vm151 ~]$ asmcmd volcreate -G data -s 1G mmivolume
[oracle@vm151 ~]$ asmcmd volinfo -G data mmivolume
Diskgroup Name: DATA

         Volume Name: MMIVOLUME
         Volume Device: /dev/asm/mmivolume-1
         State: ENABLED
         Size (MB): 1024
         Resize Unit (MB): 64
         Redundancy: UNPROT
         Stripe Columns: 8
         Stripe Width (K): 1024
         Usage:
         Mountpath:

[oracle@vm151 ~]$ /sbin/mkfs -t acfs /dev/asm/mmivolume-1
mkfs.acfs: version                   = 12.2.0.1.0
mkfs.acfs: on-disk version           = 46.0
mkfs.acfs: volume                    = /dev/asm/mmivolume-1
mkfs.acfs: volume size               = 1073741824  (   1.00 GB )
mkfs.acfs: Format complete.

Ok, now we configure Grid Infrastructure resources for this filesystem:

[oracle@vm151 ~]$ mkdir -p /u01/app/oracle/acfs/mmivolume
[oracle@vm151 ~]$ ssh vm152 mkdir -p /u01/app/oracle/acfs/mmivolume



[root@vm151 ~]# srvctl add filesystem -volume mmivolume -diskgroup data -path /u01/app/oracle/acfs/mmivolume -user oracle
[root@vm151 ~]# srvctl start filesystem -volume mmivolume -diskgroup data
[root@vm151 ~]# srvctl status filesystem -volume mmivolume -diskgroup data
ACFS file system /u01/app/oracle/acfs/mmivolume is mounted on nodes vm151,vm152
[root@vm151 ~]# ll /u01/app/oracle/acfs/mmivolume
total 64
drwx------ 2 root root 65536 Jul 27 11:23 lost+found
[root@vm151 ~]# df -h /u01/app/oracle/acfs/mmivolume
Filesystem            Size  Used Avail Use% Mounted on
/dev/asm/mmivolume-1  1.0G  487M  538M  48% /u01/app/oracle/acfs/mmivolume

This is the moment when we can create the export:

[root@vm151 ~]# srvctl add exportfs -name mmiexportfs -path /u01/app/oracle/acfs/mmivolume -id mmiexport -type SMB
[oracle@vm151 ~]$ srvctl start havip -id mmiexport
[oracle@vm151 ~]$ srvctl status exportfs
export file system mmiexportfs is enabled
export file system mmiexportfs is exported on node vm151

Let’s see what it did to the samba config files:

[root@vm152 ~]# cat /etc/samba/smb.conf
...
# ACFS HASMB include
  include = /etc/samba/acfs/acfsinc.conf
...

[root@vm151 ~]# cat /etc/samba/acfs/acfsinc.conf
include=/etc/samba/acfs/mmiexportfs.conf

[root@vm151 ~]# cat /etc/samba/acfs/mmiexportfs.conf
[mmiexportfs]
path=/u01/app/oracle/acfs/mmivolume
read only = Yes
browsable = yes

It created an include for all the ACFS related things which in turn consists of additional includes for all the exports that we defined and that are currently active on the specific node.

So let’s try:

[oracle@vm152 ~]$ echo "Hallo" > /u01/app/oracle/acfs/mmivolume/mein.txt
C:\>dir \\vm156.support.robotron.de\mmiexportfs
Zugriff verweigert

Makes sense somehow. As we can see above, there is no option that allows guest access to the newly created share. So let’s add that:

[root@vm151 ~]# srvctl modify exportfs -name mmiexportfs -options "read only=no,browsable=yes,guest ok=yes,writeable=yes,public=yes"
[root@vm151 ~]# cat /etc/samba/acfs/mmiexportfs.conf
[mmiexportfs]
path=/u01/app/oracle/acfs/mmivolume
read only = Yes
browsable = yes

Ok, it does not apply the new options immediatly. It requires a restart:

[root@vm151 ~]# srvctl stop exportfs -name mmiexportfs -force
[root@vm151 ~]# srvctl start havip -id mmiexport -node vm151
[root@vm151 ~]# cat /etc/samba/acfs/mmiexportfs.conf
[mmiexportfs]
path=/u01/app/oracle/acfs/mmivolume
read only = no
browsable = yes
guest ok = yes
writeable = yes
public = yes

Let’s check again:

C:\>dir \\vm156.support.robotron.de\mmiexportfs
 Datenträger in Laufwerk \\vm156.support.robotron.de\mmiexportfs: ist mmiexportfs
 Volumeseriennummer: 6691-6876

 Verzeichnis von \\vm156.support.robotron.de\mmiexportfs

27.07.2017  12:51    <DIR>          .
27.07.2017  11:16    <DIR>          ..
27.07.2017  11:23    <DIR>          lost+found
27.07.2017  12:45                 6 mein.txt
               2 Datei(en),             15 Bytes
               3 Verzeichnis(se),    496.132.096 Bytes frei

C:\>type \\vm156.support.robotron.de\mmiexportfs\mein.txt
Hallo

How about creating files from the Windows end of the line?

C:\>echo Cheers > \\vm156.support.robotron.de\mmiexportfs\windows.txt
Zugriff verweigert

The reason for this are the filesystem permissions on the Linux side, we need to change these:

[root@vm152 ~]# ls -la /u01/app/oracle/acfs/mmivolume
total 100
drwxrwxr-x 4 oracle oinstall 32768 Jul 27 12:45 .
drwxr-xr-x 3 oracle oinstall    22 Jul 27 11:16 ..
drwx------ 2 root   root     65536 Jul 27 11:23 lost+found
-rw-r--r-- 1 oracle oinstall     6 Jul 27 12:45 mein.txt
[root@vm152 ~]# chmod 777 /u01/app/oracle/acfs/mmivolume
[root@vm152 ~]# ls -la /u01/app/oracle/acfs/mmivolume
total 100
drwxrwxrwx 4 oracle oinstall 32768 Jul 27 12:45 .
drwxr-xr-x 3 oracle oinstall    22 Jul 27 11:16 ..
drwx------ 2 root   root     65536 Jul 27 11:23 lost+found
-rw-r--r-- 1 oracle oinstall     6 Jul 27 12:45 mein.txt

And voila, now it works:

C:\>echo Cheers > \\vm156.support.robotron.de\mmiexportfs\windows.txt

C:\>type \\vm156.support.robotron.de\mmiexportfs\windows.txt
Cheers

And from the other end too:

[oracle@vm151 ~]# cat /u01/app/oracle/acfs/mmivolume/windows.txt
Cheers

The last step is to try a relocate of the export:

[oracle@vm152 ~]$ srvctl status exportfs -id mmiexport
export file system mmiexportfs is enabled
export file system mmiexportfs is exported on node vm151
[oracle@vm152 ~]$ srvctl relocate havip -id mmiexport -force
HAVIP was relocated successfully
[oracle@vm152 ~]$ srvctl status exportfs -id mmiexport
export file system mmiexportfs is enabled
export file system mmiexportfs is exported on node vm152

During the relocate operation, which is pretty fast by the way, I tried a “dir” opertion from my windows client. It simply “hangs” for a couple of seconds and returns results as soon as the share becomes available again. Pretty straight forward.

So all in all this is a nice expansion to include the Windows world without having to use NFS on Windows.
Of cause my example is not very bullet-proof in terms of security. This can (and should) be changed for production purposes.

Patching an OMS 13.2 on Windows

Yesterday I had to patch an Enterprise Manager 13.2 that is running on a Windows box. It was never patched before so it was clear, that it would require some activities beforehand. But there were some other unforeseen things which I will describe in this post. Sorry for the german output in some of the snippets, it was a german system….

First, I needed to update “OPatch” and “OMSPatcher”. Refreshing the latter one was easy since I just needed to replace the “OMSPatcher” directory in the Middleware Home with the new one. For refreshing “OPatch” it used to be the same procedure, but I learned that it has changed. Reading My Oracle Support docs is really helpful sometimes. Here is how it works:

D:\CloudControl_cc13r2\Update\opatch_13.9.1.3.0\6880880>D:\oracle\product\mw13cR2\oracle_common\jdk\bin\java.exe -jar .\opatch_generic.jar -silent ORACLE_HOME=%ORACLE_HOME%
Launcher-Logdatei ist C:\Users\XXX\AppData\Local\Temp\2\OraInstall2017-04-06_11-15-14AM\launcher2017-04-06_11-15-14AM.log.
Installationsprogramm wird extrahiert... . . Fertig
Es wird geprüft, ob CPU-Geschwindigkeit über 300 MHz liegt   Tatsächlich 3500    Erfolgreich
Swap-Bereich wird geprüft: muss größer sein als 512 MB    Erfolgreich
Es wird geprüft, ob diese Plattform eine 64-Bit-JVM erfordert   Tatsächlich 64    Erfolgreich (64-Bit nicht erforderlich)
Temporärer Speicherplatz wird geprüft: muss größer sein als 300 MB   Tatsächlich 46996 MB    Erfolgreich


Vorbereitung für das Starten von Oracle Universal Installer aus C:\Users\XXX\AppData\Local\Temp\2\OraInstall2017-04-06_11-15-14AM
Installationszusammenfassung


Speicherplatz: erforderlich 27 MB, verf³gbar 397.477 MB
Zu installierende Featuresets:
        Next Generation Install Core 13.9.1.0.1
        OPatch 13.9.1.3.0
        OPatch Auto OPlan 13.9.1.0.0
Sessionlogdatei ist C:\Users\XXX\AppData\Local\Temp\2\OraInstall2017-04-06_11-15-14AM\install2017-04-06_11-15-14AM.log

Die Produktliste wird geladen. Warten.
 1%

[...]

Die Logs finden Sie hier: C:\Users\XXX\AppData\Local\Temp\2\OraInstall2017-04-06_11-15-14AM.

Drücken Sie zum Beenden die Eingabetaste

Now I wanted to apply the OMS side patch using omspatchter:

D:\CloudControl_cc13r2\Update\25501489>omspatcher apply 
OMSPatcher Automation Tool
Copyright (c) 2017, Oracle Corporation.  All rights reserved.


OMSPatcher version : 13.8.0.0.2
OUI version        : 13.9.1.0.0
Running from       : d:\oracle\product\mw13cR2
Log file location  : d:\oracle\product\mw13cR2\cfgtoollogs\omspatcher\opatch2017-04-06_11-22-19AM_1.log

OMSPatcher log file: d:\oracle\product\mw13cR2\cfgtoollogs\omspatcher\25501489\omspatcher_2017-04-06_11-22-22AM_deploy.log

Please enter OMS weblogic admin server URL(t3s://omshost.acme.com:7101):>
Please enter OMS weblogic admin server username(weblogic):>
Please enter OMS weblogic admin server password:>


OMSPatcher could not read installed OMS owner from OUI inventory by itself.
Please add OMSPatcher.OMS_USER=<OMS installed user> to command line and try again.


[ Error during Get Central Inventory Information Phase]. Detail: OMSPatcher was not able to read OUI inventory to retrieve installed user & system details.
OMSPatcher failed: OMSPatcher could not read installed OMS owner from OUI inventory by itself.
Please add OMSPatcher.OMS_USER=<OMS installed user> to command line and try again.

Log file location: d:\oracle\product\mw13cR2\cfgtoollogs\omspatcher\25501489\omspatcher_2017-04-06_11-22-22AM_deploy.log

Recommended actions: Please check if OUI inventory is locked by some other processes. Please check if OUI inventory is readable.

OMSPatcher failed with error code = 236

…and failed. But the required action is stated quite clearly, so I added the owner of the OMS home:

D:\CloudControl_cc13r2\Update\25501489>omspatcher apply OMSPatcher.OMS_USER=adkaiser
OMSPatcher Automation Tool
Copyright (c) 2017, Oracle Corporation.  All rights reserved.


OMSPatcher version : 13.8.0.0.2
OUI version        : 13.9.1.0.0
Running from       : d:\oracle\product\mw13cR2
Log file location  : d:\oracle\product\mw13cR2\cfgtoollogs\omspatcher\opatch2017-04-06_11-25-23AM_1.log

OMSPatcher log file: d:\oracle\product\mw13cR2\cfgtoollogs\omspatcher\25501489\omspatcher_2017-04-06_11-25-29AM_deploy.log

Please enter OMS weblogic admin server URL(t3s://omshost.acme.com:7101):>
Please enter OMS weblogic admin server username(weblogic):>
Please enter OMS weblogic admin server password:>


OMSPatcher could not read installed OMS owner from OUI inventory by itself.
Please add OMSPatcher.OMS_USER=<OMS installed user> to command line and try again.


[ Error during Get Central Inventory Information Phase]. Detail: OMSPatcher was not able to read OUI inventory to retrieve installed user & system details.
OMSPatcher failed: OMSPatcher could not read installed OMS owner from OUI inventory by itself.
Please add OMSPatcher.OMS_USER=<OMS installed user> to command line and try again.

Log file location: d:\oracle\product\mw13cR2\cfgtoollogs\omspatcher\25501489\omspatcher_2017-04-06_11-25-29AM_deploy.log

Recommended actions: Please check if OUI inventory is locked by some other processes. Please check if OUI inventory is readable.

OMSPatcher failed with error code = 236

The logfile is not very helpful either:

D:\CloudControl_cc13r2\Update\25501489>type d:\oracle\product\mw13cR2\cfgtoollogs\omspatcher\25501489\omspatcher_2017-04-06_11-25-29AM_deploy.log
[06.04.2017 11:25:29]        OMSPatcher has successfully verified min_patching_tool_version check
[06.04.2017 11:25:47]        [ Error during Get Central Inventory Information phase ] Detail:                              OMSPatcher was not able to read OUI inventory to retrieve installed user & system details.
OMSPatcher could not read installed OMS owner from OUI inventory by itself.
                             Please add OMSPatcher.OMS_USER=<OMS installed user> to command line and try again.

So I started researching and found EM13c: OMSPatcher Analyze Command For 13c OMS Fails With Error “OMSPatcher failed with error code 236” (Doc ID 2136840.1) which says that the parameter needs to be specified inside a properties file. There is another doc EM 13c: How to Apply a Patch to the Enterprise Manager 13c Cloud Control OMS Oracle Home (Doc ID 2091619.1) which describes how to create this properties file.

First step is to create the WebLogic encrypted configuration and key files.

D:\CloudControl_cc13r2\Update>%ORACLE_HOME%\OMSpatcher\wlskeys\createkeys.cmd -oh %ORACLE_HOME% -location D:\CloudControl_cc13r2\Update

[...]

Your environment has been set.
Please enter weblogic admin server username::> weblogic
Please enter weblogic admin server password::>
CreateKeys Weblogic API executed successfully.

User configuration file created: D:\CloudControl_cc13r2\Update\config
User key file created: D:\CloudControl_cc13r2\Update\key
'createkeys.bat' succeeded.

These two lines can I now use in my properties file along with the OMS owner. This is the content of my properties file:

AdminServerURL=t3s://omshost.acme.com:7101
AdminConfigFile=D:\\CloudControl_cc13r2\\Update\\config
AdminKeyFile=D:\\CloudControl_cc13r2\\Update\\key
OPatchAuto.OMS_USER=administrator

Be aware of the double backslashes, otherwise it won’t work.

Now finally I am able to patch my OMS using this properties file.

D:\CloudControl_cc13r2\Update\25501489>omspatcher apply -property_file D:\CloudControl_cc13r2\Update\properties.txt
OMSPatcher Automation Tool
Copyright (c) 2017, Oracle Corporation.  All rights reserved.


OMSPatcher version : 13.8.0.0.2
OUI version        : 13.9.1.0.0
Running from       : d:\oracle\product\mw13cR2
Log file location  : d:\oracle\product\mw13cR2\cfgtoollogs\omspatcher\opatch2017-04-06_13-08-43PM_1.log

OMSPatcher log file: d:\oracle\product\mw13cR2\cfgtoollogs\omspatcher\25501489\omspatcher_2017-04-06_13-08-53PM_analyze.log



WARNING: Could not apply the patch "25414328" because the "oracle.sysman.vi.oms.plugin with version 13.2.1.0.0" core component of the OMS or the plug-in for which the patch is intended is either not deployed or deployed with another version in your Enterprise Manager system.
WARNING: Could not apply the patch "25414306" because the "oracle.sysman.emfa.oms.plugin with version 13.2.1.0.0" core component of the OMS or the plug-in for which the patch is intended is either not deployed or deployed with another version in your Enterprise Manager system.
WARNING: Could not apply the patch "25118889" because the "oracle.sysman.vt.oms.plugin with version 13.2.1.0.0" core component of the OMS or the plug-in for which the patch is intended is either not deployed or deployed with another version in your Enterprise Manager system.
WARNING: Could not apply the patch "25414263" because the "oracle.sysman.csm.oms.plugin with version 13.2.1.0.0" core component of the OMS or the plug-in for which the patch is intended is either not deployed or deployed with another version in your Enterprise Manager system.
WARNING: Could not apply the patch "25414255" because the "oracle.sysman.ssa.oms.plugin with version 13.2.1.0.0" core component of the OMS or the plug-in for which the patch is intended is either not deployed or deployed with another version in your Enterprise Manager system.
WARNING: Could not apply the patch "25414356" because the "oracle.sysman.smf.oms.plugin with version 13.2.1.0.0" core component of the OMS or the plug-in for which the patch is intended is either not deployed or deployed with another version in your Enterprise Manager system.

Configuration Validation: Success


Running apply prerequisite checks for sub-patch(es) "25414294,25414339,25414245,25414317,25414281" and Oracle Home "d:\oracle\product\mw13cR2"...
Sub-patch(es) "25414294,25414339,25414245,25414317,25414281" are successfully analyzed for Oracle Home "d:\oracle\product\mw13cR2"

[...]

ORA-28545 when querying HS gateway in Oracle 12c

Since our customers are forced to move to Oracle Database 12.1, I am in the role to support their upgrades. Recently I upgraded a Windows environment that used the Heterogeneous Services to connect to an external ODBC datasource via a database link. The configuration is a little tricky, but pretty straight forward once you get the idea.

First, there must be an ODBC datasource. That is a simple System DSN which was already in place.

Second, you have to choose a kind of “virtual” SID for your connection to this ODBC datasource.

Third, create an init.ora in the $ORACLE_HOME/hs/admin directory and put at least one parameter into it:

HS_FDS_CONNECT_INFO="<Name of ODBC System DSN>"

Fourth, add static SID to the listener  by modifying the listener.ora file:

SID_LIST_LISTENER_STD =
(SID_LIST =
(SID_DESC=
(SID_NAME=<virtual SID>)
(ORACLE_HOME=c:\oracle\product\12.1.0.2\dbprod)
(PROGRAM=dg4odbc)
)
)

Then reload the listener configuration to make it active.

Fifth, create a tnsnames.ora entry that points to the listener and the virtual SID. Make sure, you specify “HS=OK”.

MYODBCDS =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = tcp)(HOST = <listener host>)(PORT = 1521))
(CONNECT_DATA =
(SID = <virtual SID>)
)
(HS = OK)
)

Sixth and finally, create a database link that points to the tnsnames entry:

SQL> create database link myodbcds connect to <user> identified by <pwd> using "MYODBCDS";

Database link created.

All those steps were done years ago in the old 11.2 environment. Everything worked fine there. So I simply copied all the configuration details to the new 12.1 Oracle home. Then I moved the listener to the new 12.1 home and upgraded the database. But when testing the HS connection, the following happened:

SQL*Plus: Release 12.1.0.2.0 Production on Tue Mar 28 11:30:45 2017

Copyright (c) 1982, 2016, Oracle.  All rights reserved.

Last Successful login time: Tue Mar 28 2017 11:06:02 +02:00

Connected to:
Oracle Database 12c Standard Edition Release 12.1.0.2.0 - 64bit Production

SQL> select * from dual@myodbcds ;
select * from dual@myodbcds
*
ERROR at line 1:
ORA-28545: error diagnosed by Net8 when connecting to an agent
Unable to retrieve text of NETWORK/NCR message 65535
ORA-02063: preceding 2 lines from MYODBCDS

I checked the listener services, the connection seemed to be established successfully.

C:\>lsnrctl services listener

LSNRCTL for 64-bit Windows: Version 12.1.0.2.0 - Production on 28-MAR-2017 11:39:05

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=<hostname>)(PORT=1521)))
Services Summary...
Service "<virtual SID>" has 1 instance(s).
Instance "<virtual SID>", status UNKNOWN, has 1 handler(s) for this service...
Handler(s):
"DEDICATED" established:1 refused:0
LOCAL SERVER
The command completed successfully

So I created a new listener in the old 11.2 home that used another port to test things. The 11.2 listener worked fine. So what is the difference between both listeners?
After a little research I stumbled upon the Windows services. The database service was not running as LocalSystem but as a domain user. This is because the database writes it’s backups to an UNC share. Both, 11.2 and 12.1, listeners used the LocalSystem account. So I changed the 12.1 listener to use the domain account too and this solved the problem.

SQL*Plus: Release 12.1.0.2.0 Production on Tue Mar 28 11:39:10 2017

Copyright (c) 1982, 2016, Oracle.  All rights reserved.

Last Successful login time: Tue Mar 28 2017 11:36:53 +02:00

Connected to:
Oracle Database 12c Standard Edition Release 12.1.0.2.0 - 64bit Production

SQL> select * from dual@myodbcds;

D
-
X

I have no idea why the 11.2 listener can run as LocalSystem whilst the 12.1 listener cannot. But generally I think it is a good practice to run listener and database services with the same user credentials.

Repeating CRS-8503 in Grid Infrastructure Log

During my odyssey of applying the current Bundlepatch to a 12.1.0.2 Grid Infrastructure installation on Windows I stumbled upon a repeating error in the cluster alert.log.

PS C:\> get-content %ORACLE_BASE%/diag/crs/<hostname>/crs/trace/alert.log -tail 4 -wait
Tue Dec 15 02:28:58 2015
Errors in file D:\app\oracle\diag\crs\<hostname>\crs\trace\ocrcheck_8216.trc  (incident=81):
CRS-8503 [] [] [] [] [] [] [] [] [] [] [] []
Incident details in: D:\app\oracle\diag\crs\<hostname>\crs\incident\incdir_81\ocrcheck_8216_i81.trc

This error occurs over and over again with an interval of several hours. The My Oracle Support research revealed Doc ID 1997476.1 which can be worked around easily by setting the “asm_diskstring” parameter explicitly.

D:\>set ORACLE_SID=+ASM1

D:\>sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Tue Dec 15 09:07:53 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> show parameter asm

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskgroups                       string      OCR, GIMGMT, DATA, FRA, SSD
asm_diskstring                       string
asm_power_limit                      integer     1
asm_preferred_read_failure_groups    string
SQL> alter system set asm_diskstring='\\.\ORCLDISK*' scope=both sid='*';

System altered.

SQL> show parameter asm

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskgroups                       string      OCR, GIMGMT, DATA, FRA, SSD
asm_diskstring                       string      \\.\ORCLDISK*
asm_power_limit                      integer     1
asm_preferred_read_failure_groups    string

After that the errors did not occur anymore in the alert.log. And that’s it finally. The cluster may now go live… May the force be with me 🙂

Network not started after applying Bundlepatch to Grid Infrastructure

In the previous post I described how to cope with issues before start applying the bundlepatch. Now this post is about an issue I run into after applying this bundlepatch.
During execution of “rootcrs.pl -postpatch” there were errors starting several resources. All of these issues were caused by the network resource which could not be started after the patch. The output, or clusterware alert.log, showed something like this:

2015-12-15 11:24:10.375 [ORAROOTAGENT(7672)]CRS-5017: The resource action "ora.net1.network start" encountered the following error:
2015-12-15 11:24:10.375+CRS-5008: Invalid attribute value: <Network name> for the network interface. For details refer to "(:CLSN00107:)" in "D:\app\oracle\diag\crs\<hostname>\crs\trace\crsd_orarootagent_system.trc".

Nobody changed the network interface names, no one modified the clusters network configuration. So what?
After a couple of hours of rebooting, investigating and tearing hair I stumbled upon a MOS note Windows: gpnpd fails to start as “oifcfg iflist -p -n” does not show any network interface (Doc ID 1967213.1) which included the final hint. In my case the GPNPD was up by the way. But the GI was no longer able to read the network configuration:

D:\>oifcfg iflist

D:\>

Not a single interface available. As stated in the MOS note, the “wimgmt” service was not running. I disabled that service because it was using some of GI’s files and thus I could not run “opatch apply -local”. My mistake…. Re-enabling the service fixed this issue. Everything runs fine again.