Patching an OMS 13.2 on Windows

Yesterday I had to patch an Enterprise Manager 13.2 that is running on a Windows box. It was never patched before so it was clear, that it would require some activities beforehand. But there were some other unforeseen things which I will describe in this post. Sorry for the german output in some of the snippets, it was a german system….

First, I needed to update “OPatch” and “OMSPatcher”. Refreshing the latter one was easy since I just needed to replace the “OMSPatcher” directory in the Middleware Home with the new one. For refreshing “OPatch” it used to be the same procedure, but I learned that it has changed. Reading My Oracle Support docs is really helpful sometimes. Here is how it works:

D:\CloudControl_cc13r2\Update\opatch_13.9.1.3.0\6880880>D:\oracle\product\mw13cR2\oracle_common\jdk\bin\java.exe -jar .\opatch_generic.jar -silent ORACLE_HOME=%ORACLE_HOME%
Launcher-Logdatei ist C:\Users\XXX\AppData\Local\Temp\2\OraInstall2017-04-06_11-15-14AM\launcher2017-04-06_11-15-14AM.log.
Installationsprogramm wird extrahiert... . . Fertig
Es wird geprüft, ob CPU-Geschwindigkeit über 300 MHz liegt   Tatsächlich 3500    Erfolgreich
Swap-Bereich wird geprüft: muss größer sein als 512 MB    Erfolgreich
Es wird geprüft, ob diese Plattform eine 64-Bit-JVM erfordert   Tatsächlich 64    Erfolgreich (64-Bit nicht erforderlich)
Temporärer Speicherplatz wird geprüft: muss größer sein als 300 MB   Tatsächlich 46996 MB    Erfolgreich


Vorbereitung für das Starten von Oracle Universal Installer aus C:\Users\XXX\AppData\Local\Temp\2\OraInstall2017-04-06_11-15-14AM
Installationszusammenfassung


Speicherplatz: erforderlich 27 MB, verf³gbar 397.477 MB
Zu installierende Featuresets:
        Next Generation Install Core 13.9.1.0.1
        OPatch 13.9.1.3.0
        OPatch Auto OPlan 13.9.1.0.0
Sessionlogdatei ist C:\Users\XXX\AppData\Local\Temp\2\OraInstall2017-04-06_11-15-14AM\install2017-04-06_11-15-14AM.log

Die Produktliste wird geladen. Warten.
 1%

[...]

Die Logs finden Sie hier: C:\Users\XXX\AppData\Local\Temp\2\OraInstall2017-04-06_11-15-14AM.

Drücken Sie zum Beenden die Eingabetaste

Now I wanted to apply the OMS side patch using omspatchter:

D:\CloudControl_cc13r2\Update\25501489>omspatcher apply 
OMSPatcher Automation Tool
Copyright (c) 2017, Oracle Corporation.  All rights reserved.


OMSPatcher version : 13.8.0.0.2
OUI version        : 13.9.1.0.0
Running from       : d:\oracle\product\mw13cR2
Log file location  : d:\oracle\product\mw13cR2\cfgtoollogs\omspatcher\opatch2017-04-06_11-22-19AM_1.log

OMSPatcher log file: d:\oracle\product\mw13cR2\cfgtoollogs\omspatcher\25501489\omspatcher_2017-04-06_11-22-22AM_deploy.log

Please enter OMS weblogic admin server URL(t3s://omshost.acme.com:7101):>
Please enter OMS weblogic admin server username(weblogic):>
Please enter OMS weblogic admin server password:>


OMSPatcher could not read installed OMS owner from OUI inventory by itself.
Please add OMSPatcher.OMS_USER=<OMS installed user> to command line and try again.


[ Error during Get Central Inventory Information Phase]. Detail: OMSPatcher was not able to read OUI inventory to retrieve installed user & system details.
OMSPatcher failed: OMSPatcher could not read installed OMS owner from OUI inventory by itself.
Please add OMSPatcher.OMS_USER=<OMS installed user> to command line and try again.

Log file location: d:\oracle\product\mw13cR2\cfgtoollogs\omspatcher\25501489\omspatcher_2017-04-06_11-22-22AM_deploy.log

Recommended actions: Please check if OUI inventory is locked by some other processes. Please check if OUI inventory is readable.

OMSPatcher failed with error code = 236

…and failed. But the required action is stated quite clearly, so I added the owner of the OMS home:

D:\CloudControl_cc13r2\Update\25501489>omspatcher apply OMSPatcher.OMS_USER=adkaiser
OMSPatcher Automation Tool
Copyright (c) 2017, Oracle Corporation.  All rights reserved.


OMSPatcher version : 13.8.0.0.2
OUI version        : 13.9.1.0.0
Running from       : d:\oracle\product\mw13cR2
Log file location  : d:\oracle\product\mw13cR2\cfgtoollogs\omspatcher\opatch2017-04-06_11-25-23AM_1.log

OMSPatcher log file: d:\oracle\product\mw13cR2\cfgtoollogs\omspatcher\25501489\omspatcher_2017-04-06_11-25-29AM_deploy.log

Please enter OMS weblogic admin server URL(t3s://omshost.acme.com:7101):>
Please enter OMS weblogic admin server username(weblogic):>
Please enter OMS weblogic admin server password:>


OMSPatcher could not read installed OMS owner from OUI inventory by itself.
Please add OMSPatcher.OMS_USER=<OMS installed user> to command line and try again.


[ Error during Get Central Inventory Information Phase]. Detail: OMSPatcher was not able to read OUI inventory to retrieve installed user & system details.
OMSPatcher failed: OMSPatcher could not read installed OMS owner from OUI inventory by itself.
Please add OMSPatcher.OMS_USER=<OMS installed user> to command line and try again.

Log file location: d:\oracle\product\mw13cR2\cfgtoollogs\omspatcher\25501489\omspatcher_2017-04-06_11-25-29AM_deploy.log

Recommended actions: Please check if OUI inventory is locked by some other processes. Please check if OUI inventory is readable.

OMSPatcher failed with error code = 236

The logfile is not very helpful either:

D:\CloudControl_cc13r2\Update\25501489>type d:\oracle\product\mw13cR2\cfgtoollogs\omspatcher\25501489\omspatcher_2017-04-06_11-25-29AM_deploy.log
[06.04.2017 11:25:29]        OMSPatcher has successfully verified min_patching_tool_version check
[06.04.2017 11:25:47]        [ Error during Get Central Inventory Information phase ] Detail:                              OMSPatcher was not able to read OUI inventory to retrieve installed user & system details.
OMSPatcher could not read installed OMS owner from OUI inventory by itself.
                             Please add OMSPatcher.OMS_USER=<OMS installed user> to command line and try again.

So I started researching and found EM13c: OMSPatcher Analyze Command For 13c OMS Fails With Error “OMSPatcher failed with error code 236” (Doc ID 2136840.1) which says that the parameter needs to be specified inside a properties file. There is another doc EM 13c: How to Apply a Patch to the Enterprise Manager 13c Cloud Control OMS Oracle Home (Doc ID 2091619.1) which describes how to create this properties file.

First step is to create the WebLogic encrypted configuration and key files.

D:\CloudControl_cc13r2\Update>%ORACLE_HOME%\OMSpatcher\wlskeys\createkeys.cmd -oh %ORACLE_HOME% -location D:\CloudControl_cc13r2\Update

[...]

Your environment has been set.
Please enter weblogic admin server username::> weblogic
Please enter weblogic admin server password::>
CreateKeys Weblogic API executed successfully.

User configuration file created: D:\CloudControl_cc13r2\Update\config
User key file created: D:\CloudControl_cc13r2\Update\key
'createkeys.bat' succeeded.

These two lines can I now use in my properties file along with the OMS owner. This is the content of my properties file:

AdminServerURL=t3s://omshost.acme.com:7101
AdminConfigFile=D:\\CloudControl_cc13r2\\Update\\config
AdminKeyFile=D:\\CloudControl_cc13r2\\Update\\key
OPatchAuto.OMS_USER=administrator

Be aware of the double backslashes, otherwise it won’t work.

Now finally I am able to patch my OMS using this properties file.

D:\CloudControl_cc13r2\Update\25501489>omspatcher apply -property_file D:\CloudControl_cc13r2\Update\properties.txt
OMSPatcher Automation Tool
Copyright (c) 2017, Oracle Corporation.  All rights reserved.


OMSPatcher version : 13.8.0.0.2
OUI version        : 13.9.1.0.0
Running from       : d:\oracle\product\mw13cR2
Log file location  : d:\oracle\product\mw13cR2\cfgtoollogs\omspatcher\opatch2017-04-06_13-08-43PM_1.log

OMSPatcher log file: d:\oracle\product\mw13cR2\cfgtoollogs\omspatcher\25501489\omspatcher_2017-04-06_13-08-53PM_analyze.log



WARNING: Could not apply the patch "25414328" because the "oracle.sysman.vi.oms.plugin with version 13.2.1.0.0" core component of the OMS or the plug-in for which the patch is intended is either not deployed or deployed with another version in your Enterprise Manager system.
WARNING: Could not apply the patch "25414306" because the "oracle.sysman.emfa.oms.plugin with version 13.2.1.0.0" core component of the OMS or the plug-in for which the patch is intended is either not deployed or deployed with another version in your Enterprise Manager system.
WARNING: Could not apply the patch "25118889" because the "oracle.sysman.vt.oms.plugin with version 13.2.1.0.0" core component of the OMS or the plug-in for which the patch is intended is either not deployed or deployed with another version in your Enterprise Manager system.
WARNING: Could not apply the patch "25414263" because the "oracle.sysman.csm.oms.plugin with version 13.2.1.0.0" core component of the OMS or the plug-in for which the patch is intended is either not deployed or deployed with another version in your Enterprise Manager system.
WARNING: Could not apply the patch "25414255" because the "oracle.sysman.ssa.oms.plugin with version 13.2.1.0.0" core component of the OMS or the plug-in for which the patch is intended is either not deployed or deployed with another version in your Enterprise Manager system.
WARNING: Could not apply the patch "25414356" because the "oracle.sysman.smf.oms.plugin with version 13.2.1.0.0" core component of the OMS or the plug-in for which the patch is intended is either not deployed or deployed with another version in your Enterprise Manager system.

Configuration Validation: Success


Running apply prerequisite checks for sub-patch(es) "25414294,25414339,25414245,25414317,25414281" and Oracle Home "d:\oracle\product\mw13cR2"...
Sub-patch(es) "25414294,25414339,25414245,25414317,25414281" are successfully analyzed for Oracle Home "d:\oracle\product\mw13cR2"

[...]

Advertisements

ORA-28545 when querying HS gateway in Oracle 12c

Since our customers are forced to move to Oracle Database 12.1, I am in the role to support their upgrades. Recently I upgraded a Windows environment that used the Heterogeneous Services to connect to an external ODBC datasource via a database link. The configuration is a little tricky, but pretty straight forward once you get the idea.

First, there must be an ODBC datasource. That is a simple System DSN which was already in place.

Second, you have to choose a kind of “virtual” SID for your connection to this ODBC datasource.

Third, create an init.ora in the $ORACLE_HOME/hs/admin directory and put at least one parameter into it:

HS_FDS_CONNECT_INFO="<Name of ODBC System DSN>"

Fourth, add static SID to the listener  by modifying the listener.ora file:

SID_LIST_LISTENER_STD =
(SID_LIST =
(SID_DESC=
(SID_NAME=<virtual SID>)
(ORACLE_HOME=c:\oracle\product\12.1.0.2\dbprod)
(PROGRAM=dg4odbc)
)
)

Then reload the listener configuration to make it active.

Fifth, create a tnsnames.ora entry that points to the listener and the virtual SID. Make sure, you specify “HS=OK”.

MYODBCDS =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = tcp)(HOST = <listener host>)(PORT = 1521))
(CONNECT_DATA =
(SID = <virtual SID>)
)
(HS = OK)
)

Sixth and finally, create a database link that points to the tnsnames entry:

SQL> create database link myodbcds connect to <user> identified by <pwd> using "MYODBCDS";

Database link created.

All those steps were done years ago in the old 11.2 environment. Everything worked fine there. So I simply copied all the configuration details to the new 12.1 Oracle home. Then I moved the listener to the new 12.1 home and upgraded the database. But when testing the HS connection, the following happened:

SQL*Plus: Release 12.1.0.2.0 Production on Tue Mar 28 11:30:45 2017

Copyright (c) 1982, 2016, Oracle.  All rights reserved.

Last Successful login time: Tue Mar 28 2017 11:06:02 +02:00

Connected to:
Oracle Database 12c Standard Edition Release 12.1.0.2.0 - 64bit Production

SQL> select * from dual@myodbcds ;
select * from dual@myodbcds
*
ERROR at line 1:
ORA-28545: error diagnosed by Net8 when connecting to an agent
Unable to retrieve text of NETWORK/NCR message 65535
ORA-02063: preceding 2 lines from MYODBCDS

I checked the listener services, the connection seemed to be established successfully.

C:\>lsnrctl services listener

LSNRCTL for 64-bit Windows: Version 12.1.0.2.0 - Production on 28-MAR-2017 11:39:05

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=<hostname>)(PORT=1521)))
Services Summary...
Service "<virtual SID>" has 1 instance(s).
Instance "<virtual SID>", status UNKNOWN, has 1 handler(s) for this service...
Handler(s):
"DEDICATED" established:1 refused:0
LOCAL SERVER
The command completed successfully

So I created a new listener in the old 11.2 home that used another port to test things. The 11.2 listener worked fine. So what is the difference between both listeners?
After a little research I stumbled upon the Windows services. The database service was not running as LocalSystem but as a domain user. This is because the database writes it’s backups to an UNC share. Both, 11.2 and 12.1, listeners used the LocalSystem account. So I changed the 12.1 listener to use the domain account too and this solved the problem.

SQL*Plus: Release 12.1.0.2.0 Production on Tue Mar 28 11:39:10 2017

Copyright (c) 1982, 2016, Oracle.  All rights reserved.

Last Successful login time: Tue Mar 28 2017 11:36:53 +02:00

Connected to:
Oracle Database 12c Standard Edition Release 12.1.0.2.0 - 64bit Production

SQL> select * from dual@myodbcds;

D
-
X

I have no idea why the 11.2 listener can run as LocalSystem whilst the 12.1 listener cannot. But generally I think it is a good practice to run listener and database services with the same user credentials.

Managed Redo Apply using all CPU

Again and again I’m forced to setup Data Guard on Windows platforms. Windows is not my favorite platform but that does not matter sometimes. On the other hand, the Oracle command line tools are identical on every platform.
So this is the inital setup that we have:

  • 2 Servers, 2x 10 Core CPU and sufficient RAM with some TB local disks
  • servers in separate compute centers
  • Windows 2012 R2
  • Oracle Database Enterprise Edition 12.1.0.2.160119 non-CDB

Pretty obvious, that Data Guard would be a good option to decrease the unplanned outages for databases on those servers. Everything from installation to creating databases and setting up Data Guard went fine. Until we first tested a switchover. Sounds simple and in fact it is. At least in terms of syntax. But after the switchover the new standby database server was heavily loaded and did hardly respond to anything we were trying. After a while, 30 minutes or so, it was all over and the standby server was working fine again with no load at all. At this point in time there was no data nor any application load at all in the database. Just an freshly created empty database.
We then did another switchover to make the standby database primary again. And again, the new standby server was heavily loaded for quite some time. When investigating this behaviour we found the following in the alert.log:

2016-04-18 13:29:47.131000 +02:00
Started logmerger process
Managed Standby Recovery starting Real Time Apply
Parallel Media Recovery started with 40 slaves

The automatic parallel recovery which is enabled by default decided to use 40 processes. That makes sense when we look at the hardware specification. But my assumption was, that there must be something wrong with that. I had no proof that the recovery slaves where causing the high CPU load since the system was so unresponsive. We simply modified the Data Guard configuration to reduce the parallel degree.

DGMGRL> edit database testdb_a set property ApplyParallel = 8;
DGMGRL> edit database testdb_b set property ApplyParallel = 8;

Again we tried another switchover and monitored the new standby server. This time it was not loaded at all. I still don’t know why the 40 processes where causing such a heavy load, but for the time being I am happy with what we have now.

So be careful when setting up Data Guard. Do tests and monitor the systems. And if you ever experience heavy load when there should be no load at all, remember my posing.

Repeating CRS-8503 in Grid Infrastructure Log

During my odyssey of applying the current Bundlepatch to a 12.1.0.2 Grid Infrastructure installation on Windows I stumbled upon a repeating error in the cluster alert.log.

PS C:\> get-content %ORACLE_BASE%/diag/crs/<hostname>/crs/trace/alert.log -tail 4 -wait
Tue Dec 15 02:28:58 2015
Errors in file D:\app\oracle\diag\crs\<hostname>\crs\trace\ocrcheck_8216.trc  (incident=81):
CRS-8503 [] [] [] [] [] [] [] [] [] [] [] []
Incident details in: D:\app\oracle\diag\crs\<hostname>\crs\incident\incdir_81\ocrcheck_8216_i81.trc

This error occurs over and over again with an interval of several hours. The My Oracle Support research revealed Doc ID 1997476.1 which can be worked around easily by setting the “asm_diskstring” parameter explicitly.

D:\>set ORACLE_SID=+ASM1

D:\>sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Tue Dec 15 09:07:53 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> show parameter asm

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskgroups                       string      OCR, GIMGMT, DATA, FRA, SSD
asm_diskstring                       string
asm_power_limit                      integer     1
asm_preferred_read_failure_groups    string
SQL> alter system set asm_diskstring='\\.\ORCLDISK*' scope=both sid='*';

System altered.

SQL> show parameter asm

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskgroups                       string      OCR, GIMGMT, DATA, FRA, SSD
asm_diskstring                       string      \\.\ORCLDISK*
asm_power_limit                      integer     1
asm_preferred_read_failure_groups    string

After that the errors did not occur anymore in the alert.log. And that’s it finally. The cluster may now go live… May the force be with me 🙂

Network not started after applying Bundlepatch to Grid Infrastructure

In the previous post I described how to cope with issues before start applying the bundlepatch. Now this post is about an issue I run into after applying this bundlepatch.
During execution of “rootcrs.pl -postpatch” there were errors starting several resources. All of these issues were caused by the network resource which could not be started after the patch. The output, or clusterware alert.log, showed something like this:

2015-12-15 11:24:10.375 [ORAROOTAGENT(7672)]CRS-5017: The resource action "ora.net1.network start" encountered the following error:
2015-12-15 11:24:10.375+CRS-5008: Invalid attribute value: <Network name> for the network interface. For details refer to "(:CLSN00107:)" in "D:\app\oracle\diag\crs\<hostname>\crs\trace\crsd_orarootagent_system.trc".

Nobody changed the network interface names, no one modified the clusters network configuration. So what?
After a couple of hours of rebooting, investigating and tearing hair I stumbled upon a MOS note Windows: gpnpd fails to start as “oifcfg iflist -p -n” does not show any network interface (Doc ID 1967213.1) which included the final hint. In my case the GPNPD was up by the way. But the GI was no longer able to read the network configuration:

D:\>oifcfg iflist

D:\>

Not a single interface available. As stated in the MOS note, the “wimgmt” service was not running. I disabled that service because it was using some of GI’s files and thus I could not run “opatch apply -local”. My mistake…. Re-enabling the service fixed this issue. Everything runs fine again.

“rootcrs.pl -prepatch” fails when applying Windows Bundlepatch to GI

When applying Bundlepatches to a Grid Infrastructure installation on Windows, one needs to execute the “rootcrs.pl -prepatch” srcipt in order to stop and unlock the clusterware stack. Sometimes this script ends up with the following error:

C:\Windows\system32>%ORACLE_HOME%\perl\bin\perl -I%ORACLE_HOME%\perl\lib -I%ORACLE_HOME%\crs\install %ORACLE_HOME%\crs\install\rootcrs.pl -prepatch
Using configuration parameter file: D:\app\grid\12.1.0.2\crs\install\crsconfig_params
2015/12/14 13:39:12 CLSRSC-546: The Grid Infrastructure processes identified by the following PIDs (9596) are still active.

Died at d:\app\grid\12.1.0.2\crs\install/crspatch.pm line 732.

This in turns prevents the patch procedure to complete successfully because the “rootcrs.pl -postpatch” execution will find out that the unlock did not complete without error.
Obviously one should only proceed when the prepatch completed without any issues. Looking for the PID mentioned in the error message using either Windows Taskmanager or ProcessExplorer did not return any result. There is no process with that PID. It is kind of a orphaned process. Let’s find out where the PID comes from by looking into the logfile under

%ORACLE_HOME%\cfgtoollogs\crsconfig\crspatch_<servername>_<timestamp>.log

Search the file for the PID showes something like this:

2015-12-14 13:33:21: PID in <D:\app\oracle\crsdata\<servername>\output\crsd_scriptagent_system.pid> is: 9596
2015-12-14 13:33:21: The GI process identified by PID<9596> is running

Now one can replace the PID in that file, 9596 in this case, with a simple 0. Do leave the *.pid file empty since this would lead to another error:

C:\Windows\system32>%ORACLE_HOME%\perl\bin\perl -I%ORACLE_HOME%\perl\lib -I%ORACLE_HOME%\crs\install %ORACLE_HOME%\crs\install\rootcrs.pl -prepatch
Using configuration parameter file: D:\app\grid\12.1.0.2\crs\install\crsconfig_params
Can't kill a non-numeric process ID at d:\app\grid\12.1.0.2\crs\install/crsutils.pm line 12324.

Thrust me, use a 0 and the “rootcrs.pl -prepatch” will complete successfully.
There are a some more things I run into which I will explain in upcoming blog posts.
So long, thanks for reading.

Environment Variables, SQL*Net and Windows

From time to time I stumble upon questions regarding environment settings for Oracle installations on Windows, either Server or Client. The question are always similar. What variables are set? Where are they set? What about registry entries? How do I get the values in use? This post will hopefully clarify some things about that.

Where do I set environment variables?

There are several places to define environment variables. This is the list of possibilities in order of precedence.

  1. set directly in the current shell (CMD, set command)
  2. user variable set in System properties
  3. system variable set in System properties
  4. Registry value in HKLM\software\oracle\<Home name>

Which SQL*Net configuration files are used?

The search order for SQL*Net files like sqlnet.ora and tnsnames.ora is like this:

  1. Current directory
  2. Path defined by TNS_ADMIN variable (see above for specifying variables)
  3. %ORACLE_HOME%\network\admin

Which values are actually in use?

Since the “set” command displays only variables and values that are defined as environment variables and ignores any values defined inside the Registry, we need another way to get these values. We can use SQL*Plus for this:

C:\Users\marco.mischke>set NLS_LANG=american_america.we8pc850

C:\Users\marco.mischke>sqlplus /nolog

SQL*Plus: Release 11.2.0.4.0 Production on Tue Dec 1 10:40:56 2015

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

SQL> @[%TNS_ADMIN%]
SP2-0310: unable to open file "[from registry].sql"
SQL> @[%NLS_LANG%]
SP2-0310: unable to open file "[american_america.we8pc850]"
SQL> @[%NLS_DATE_FORMAT%]
SP2-0310: unable to open file "[%NLS_DATE_FORMAT%]"

You see, SQL*Plus expands the variables to the final value, even if the value comes from the Registry. Wben there is no value anywhere defined, it simply displays the variable name.