Repeating CRS-8503 in Grid Infrastructure Log

During my odyssey of applying the current Bundlepatch to a 12.1.0.2 Grid Infrastructure installation on Windows I stumbled upon a repeating error in the cluster alert.log.

PS C:\> get-content %ORACLE_BASE%/diag/crs/<hostname>/crs/trace/alert.log -tail 4 -wait
Tue Dec 15 02:28:58 2015
Errors in file D:\app\oracle\diag\crs\<hostname>\crs\trace\ocrcheck_8216.trc  (incident=81):
CRS-8503 [] [] [] [] [] [] [] [] [] [] [] []
Incident details in: D:\app\oracle\diag\crs\<hostname>\crs\incident\incdir_81\ocrcheck_8216_i81.trc

This error occurs over and over again with an interval of several hours. The My Oracle Support research revealed Doc ID 1997476.1 which can be worked around easily by setting the “asm_diskstring” parameter explicitly.

D:\>set ORACLE_SID=+ASM1

D:\>sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Tue Dec 15 09:07:53 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> show parameter asm

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskgroups                       string      OCR, GIMGMT, DATA, FRA, SSD
asm_diskstring                       string
asm_power_limit                      integer     1
asm_preferred_read_failure_groups    string
SQL> alter system set asm_diskstring='\\.\ORCLDISK*' scope=both sid='*';

System altered.

SQL> show parameter asm

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskgroups                       string      OCR, GIMGMT, DATA, FRA, SSD
asm_diskstring                       string      \\.\ORCLDISK*
asm_power_limit                      integer     1
asm_preferred_read_failure_groups    string

After that the errors did not occur anymore in the alert.log. And that’s it finally. The cluster may now go live… May the force be with me šŸ™‚

Network not started after applying Bundlepatch to Grid Infrastructure

InĀ the previous postĀ I described how to cope with issues before start applying the bundlepatch. Now this post is about an issue I run into after applying this bundlepatch.
During execution of “rootcrs.pl -postpatch” there were errors starting several resources. All of these issues were caused by the network resource which could not be started after the patch. The output, or clusterware alert.log, showed something like this:

2015-12-15 11:24:10.375 [ORAROOTAGENT(7672)]CRS-5017: The resource action "ora.net1.network start" encountered the following error:
2015-12-15 11:24:10.375+CRS-5008: Invalid attribute value: <Network name> for the network interface. For details refer to "(:CLSN00107:)" in "D:\app\oracle\diag\crs\<hostname>\crs\trace\crsd_orarootagent_system.trc".

Nobody changed the network interface names, no one modified the clusters network configuration. So what?
After a couple of hours of rebooting, investigating and tearing hair I stumbled upon a MOS note Windows: gpnpd fails to start as “oifcfg iflist -p -n” does not show any network interface (Doc ID 1967213.1) which included the final hint. In my case the GPNPD was up by the way. But the GI was no longer able to read the network configuration:

D:\>oifcfg iflist

D:\>

Not a single interface available. As stated in the MOS note, the “wimgmt” service was not running. I disabled that service because it was using some of GI’s files and thus I could not run “opatch apply -local”. My mistake…. Re-enabling the service fixed this issue. Everything runs fine again.

“rootcrs.pl -prepatch” fails when applying Windows Bundlepatch to GI

When applying Bundlepatches to a Grid Infrastructure installation on Windows, one needs to execute the “rootcrs.pl -prepatch” srcipt in order to stop and unlock the clusterware stack. Sometimes this script ends up with the following error:

C:\Windows\system32>%ORACLE_HOME%\perl\bin\perl -I%ORACLE_HOME%\perl\lib -I%ORACLE_HOME%\crs\install %ORACLE_HOME%\crs\install\rootcrs.pl -prepatch
Using configuration parameter file: D:\app\grid\12.1.0.2\crs\install\crsconfig_params
2015/12/14 13:39:12 CLSRSC-546: The Grid Infrastructure processes identified by the following PIDs (9596) are still active.

Died at d:\app\grid\12.1.0.2\crs\install/crspatch.pm line 732.

This in turns prevents the patch procedure to complete successfully because the “rootcrs.pl -postpatch” execution will find out that the unlock did not complete without error.
Obviously one should only proceed when the prepatch completed without any issues. Looking for the PID mentioned in the error message using either Windows Taskmanager or ProcessExplorer did not return any result. There is no process with that PID. It is kind of a orphaned process. Let’s find out where the PID comes from by looking into the logfile under

%ORACLE_HOME%\cfgtoollogs\crsconfig\crspatch_<servername>_<timestamp>.log

Search the file for the PID showes something like this:

2015-12-14 13:33:21: PID in <D:\app\oracle\crsdata\<servername>\output\crsd_scriptagent_system.pid> is: 9596
2015-12-14 13:33:21: The GI process identified by PID<9596> is running

Now one can replace the PID in that file, 9596 in this case, with a simple 0. Do leave the *.pid file empty since this would lead to another error:

C:\Windows\system32>%ORACLE_HOME%\perl\bin\perl -I%ORACLE_HOME%\perl\lib -I%ORACLE_HOME%\crs\install %ORACLE_HOME%\crs\install\rootcrs.pl -prepatch
Using configuration parameter file: D:\app\grid\12.1.0.2\crs\install\crsconfig_params
Can't kill a non-numeric process ID at d:\app\grid\12.1.0.2\crs\install/crsutils.pm line 12324.

Thrust me, use a 0 and the “rootcrs.pl -prepatch” will complete successfully.
There are a some more things I run into which I will explain in upcoming blog posts.
So long, thanks for reading.

ODA: Performance Impact due to Flashback Database

Here and there I am planning and actually doing migrations to the Oracle Database Appliance (ODA). I like this appliance because of it’s relative simplicity and ease of management. In most cases we connect two ODAs in different locations using DataGuard with the goal of maximizing availability. That means we have a primary 2-node-RAC on ODA-1 and another 2-node-RAC on ODA-2 which operates as standby.
In order to fulfill the requirements of availability and fault tolerance the databases are configured as “Maximum Availability”:

  • sufficient Standby Redologs configured – on the dedicated Redo SSDs
  • Force Logging mode
  • Flashback Database configured and activated – located in FRA per definition

In terms of availability and manageability this is quite good, but in terms of performance this turned out to be really bad sometimes. Look at this AWR snippets:

              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:     88808 01-Sep-15 17:30:06       480       2.7
  End Snap:     88855 02-Sep-15 17:00:10       478       6.8
   Elapsed:            1,410.06 (mins)
   DB Time:            3,311.41 (mins)
Wait Classes by Total Wait Time
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                        Avg             Avg
                                        Total Wait     Wait   % DB   Active
Wait Class                  Waits       Time (sec)     (ms)   time Sessions
---------------- ---------------- ---------------- -------- ------ --------
System I/O             11,522,327          161,304       14   81.2      1.9
DB CPU                                     101,562            51.1      1.2
User I/O               19,956,840           88,663        4   44.6      1.0
Background Wait Events                 DB/Inst: XXX/xxx  Snaps: 88808-88855
-> ordered by wait time desc, waits desc (idle events last)
-> Only events with Total Wait Time (s) >= .001 are shown
-> %Timeouts: value of 0 indicates value was > .5%.  Value of null is truly 0

                                                             Avg                
                                        %Time Total Wait    wait    Waits   % bg
Event                             Waits -outs   Time (s)    (ms)     /txn   time
-------------------------- ------------ ----- ---------- ------- -------- ------
db file async I/O submit        969,898     0     90,151      93      0.3   46.5
db file parallel write        5,015,605     0     33,520       7      1.3   17.3
flashback log file write        617,118     0     30,295      49      0.2   15.6
Backup: MML write backup p   10,608,828     0     11,727       1      2.7    6.0
RMAN backup &amp; recovery I/O      288,934     0      3,508      12      0.1    1.8

Remember, the flashback logs are stored inside FRA. And where does ODA place the FRA? On the slower part of the spinning disks. The system can not be faster than the slowest part in the chain. And in this configuration the user I/Os are influenced by flashback I/Os making things even worse.
So turn off flashback database in order to improve performance.

              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:     89528 16-Sep-15 17:30:45       393       5.0
  End Snap:     89575 17-Sep-15 17:00:44       576       5.2
   Elapsed:            1,409.98 (mins)
   DB Time:            2,792.69 (mins)
Wait Classes by Total Wait Time
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                        Avg             Avg
                                        Total Wait     Wait   % DB   Active
Wait Class                  Waits       Time (sec)     (ms)   time Sessions
---------------- ---------------- ---------------- -------- ------ --------
System I/O              7,424,187           97,029       13   57.9      1.1
DB CPU                                      96,595            57.6      1.1
User I/O                8,191,268           63,769        8   38.1      0.8

Only half the System I/O than with flashback database enabled.

Background Wait Events                 DB/Inst: XXX/xxx  Snaps: 89528-89575
-> ordered by wait time desc, waits desc (idle events last)
-> Only events with Total Wait Time (s) >= .001 are shown
-> %Timeouts: value of 0 indicates value was > .5%.  Value of null is truly 0

                                                             Avg                
                                        %Time Total Wait    wait    Waits   % bg
Event                             Waits -outs   Time (s)    (ms)     /txn   time
-------------------------- ------------ ----- ---------- ------- -------- ------
db file async I/O submit        531,487     0     73,471     138      0.1   57.5
db file parallel write        2,384,365     0     15,169       6      0.7   11.9
Backup: MML write backup p   10,121,315     0     11,769       1      2.8    9.2
RMAN backup &amp; recovery I/O      197,080     0      4,906      25      0.1    3.8

The background wait for “flashback log file write” completely gone.

Keep that in mind when using Flashback Database. This will also apply to other hardware than ODA. Be careful which disks you use for FRA/flashback logs.

tnsnames.ora Secrets

I learn something new every day. The todays lession I want to share with you.
When reading the headline, I assume you wonder what secrets could be hidden inside tnsnames.ora file. It is nothing magic, maybe some magic when HA features like multiple addresses, description lists or backup aliases for preconnect come into play. But that is not the topic of this post.
You all know the entries in tnsnames.ora which look like this:

db12c =
  (DESCRIPTION =
    (ADDRESS =
      (PROTOCOL = TCP)(HOST = 192.168.56.101)(PORT = 1521)
    )
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = db12c)
    )
  )

It works as espected:

C:\Users\marco.mischke> tnsping db12c

TNS Ping Utility for 64-bit Windows: Version 11.2.0.4.0 - Production on 02-DEC-2015 20:06:46

Copyright (c) 1997, 2013, Oracle. All rights reserved.

Used parameter files:
C:\oracle\product\11.2.0.4\client\network\admin\sqlnet.ora

Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.56.101)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = db12c)))
OK (10 msec)

But did you know, that you can define multiple aliases for the same description? I didn’t.

myalias1, myalias2, whatever, db12c =
  (DESCRIPTION =
    (ADDRESS =
      (PROTOCOL = TCP)(HOST = 192.168.56.101)(PORT = 1521)
    )
    (CONNECT_DATA =
       (SERVER = DEDICATED)
       (SERVICE_NAME = db12c)
    )
  )

Now you can use all the aliases:

C:\Users\marco.mischke> tnsping db12c

TNS Ping Utility for 64-bit Windows: Version 11.2.0.4.0 - Production on 02-DEC-2015 20:10:51

Copyright (c) 1997, 2013, Oracle. All rights reserved.

Used parameter files:
C:\oracle\product\11.2.0.4\client\network\admin\sqlnet.ora

Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.56.101)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = db12c)))
OK (10 msec)
C:\Users\marco.mischke> tnsping myalias1

TNS Ping Utility for 64-bit Windows: Version 11.2.0.4.0 - Production on 02-DEC-2015 20:10:56

Copyright (c) 1997, 2013, Oracle. All rights reserved.

Used parameter files:
C:\oracle\product\11.2.0.4\client\network\admin\sqlnet.ora

Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.56.101)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = db12c)))
OK (10 msec)
C:\Users\marco.mischke> tnsping myalias2

TNS Ping Utility for 64-bit Windows: Version 11.2.0.4.0 - Production on 02-DEC-2015 20:10:57

Copyright (c) 1997, 2013, Oracle. All rights reserved.

Used parameter files:
C:\oracle\product\11.2.0.4\client\network\admin\sqlnet.ora

Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.56.101)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = db12c)))
OK (10 msec)
C:\Users\marco.mischke> tnsping whatever

TNS Ping Utility for 64-bit Windows: Version 11.2.0.4.0 - Production on 02-DEC-2015 20:11:05

Copyright (c) 1997, 2013, Oracle. All rights reserved.

Used parameter files:
C:\oracle\product\11.2.0.4\client\network\admin\sqlnet.ora

Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.56.101)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = db12c)))
OK (10 msec)

A quick check in theĀ Net Services Reference returned no further information about that. If you have information about this, please let me know.
Otherwise, maybe that helps some of you to consolidate their tnsnames.ora file. You no longer need to maintain several aliases with the same description pointing to a single database.

Environment Variables, SQL*Net and Windows

From time to time I stumble upon questions regarding environmentĀ settings for Oracle installations on Windows, either Server or Client. The question are always similar. What variables are set? Where are they set? What about registry entries? How do I get the values in use? This post will hopefully clarify some things about that.

Where do I set environment variables?

There are several places to define environment variables. This is the list of possibilities in order of precedence.

  1. set directly in the current shell (CMD, set command)
  2. user variable set in System properties
  3. system variable set in System properties
  4. Registry value in HKLM\software\oracle\<Home name>

Which SQL*Net configuration files are used?

The search order for SQL*Net files like sqlnet.ora and tnsnames.ora is like this:

  1. Current directory
  2. Path defined by TNS_ADMIN variable (see above for specifying variables)
  3. %ORACLE_HOME%\network\admin

Which values are actually in use?

Since the “set” command displays only variables and values that are defined as environment variables and ignores any values defined inside the Registry, we need another way to get these values. We can use SQL*Plus for this:

C:\Users\marco.mischke>set NLS_LANG=american_america.we8pc850

C:\Users\marco.mischke>sqlplus /nolog

SQL*Plus: Release 11.2.0.4.0 Production on Tue Dec 1 10:40:56 2015

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

SQL> @[%TNS_ADMIN%]
SP2-0310: unable to open file "[from registry].sql"
SQL> @[%NLS_LANG%]
SP2-0310: unable to open file "[american_america.we8pc850]"
SQL> @[%NLS_DATE_FORMAT%]
SP2-0310: unable to open file "[%NLS_DATE_FORMAT%]"

You see, SQL*Plus expands the variables to the final value, even if the value comes from the Registry. Wben there is no value anywhere defined, it simply displays the variable name.