To lose or not to lose the GPNP Profile

Currently I’m preparing a new presentation about Oracle Grid Infrastructure Backup & Recovery. It will cover what should be backed up beyond the databases and, especially, how to recover from several error scenarios. One of the scenarios I was thinking of was losing the GPNP profile. The profile stores information about where to find the ASM parameter file, what disks to discover, which networks to use and so on. So it is quite important, and it is required very early when starting the cluster stack. The following scenario was tested with version 12.1.0.2 of Grid Infrastructure.

First the basics: the GPNP profile is stored in an XML file located in $GRID_HOME/gpnp/<nodename>/profiles/peer/profile.xml.

[oracle@oel6u4 ~]$ ls -l /u01/app/grid/12.1.0.2/gpnp/oel6u4/profiles/peer/profile.xml
-rw-r--r-- 1 oracle oinstall 1986 Mar 31 10:06 /u01/app/grid/12.1.0.2/gpnp/oel6u4/profiles/peer/profile.xml

[oracle@oel6u4 ~]$ cat /u01/app/grid/12.1.0.2/gpnp/oel6u4/profiles/peer/profile.xml
<?xml version="1.0" encoding="UTF-8"?><gpnp:GPnP-Profile Version="1.0" xmlns="http://www.grid-pnp.org/2005/11/gpnp-profile" xmlns:gpnp="http://www.grid-pnp.org/2005/11/gpnp-profile" xmlns:orcl="http://www.oracle.com/gpnp/2005/11/gpnp-profile" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.grid-pnp.org/2005/11/gpnp-profile gpnp-profile.xsd" ProfileSequence="22" ClusterUId="a3650eb3772d4ff9bf115d2157a0effc" ClusterName="mycluster" PALocation=""><gpnp:Network-Profile><gpnp:HostNetwork id="gen" HostName="*"><gpnp:Network id="net1" IP="192.168.1.0" Adapter="eth2" Use="asm,cluster_interconnect"/><gpnp:Network id="net2" IP="192.168.56.0" Adapter="eth3" Use="public"/><gpnp:Network id="net3" Adapter="eth4" Use="public" IP="192.168.1.0"/></gpnp:HostNetwork></gpnp:Network-Profile><orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/><orcl:ASM-Profile id="asm" DiscoveryString="/dev/oracleasm/disks/*" SPFile="+OCR/mycluster/ASMPARAMETERFILE/registry.253.907927597" Mode="remote"/><ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#"><ds:SignedInfo><ds:CanonicalizationMethod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/><ds:SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1"/><ds:Reference URI=""><ds:Transforms><ds:Transform Algorithm="http://www.w3.org/2000/09/xmldsig#enveloped-signature"/><ds:Transform Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"> <InclusiveNamespaces xmlns="http://www.w3.org/2001/10/xml-exc-c14n#" PrefixList="gpnp orcl xsi"/></ds:Transform></ds:Transforms><ds:DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/><ds:DigestValue>0S5hJDSQrW+BP+IMSS1ZUYXUlGg=</ds:DigestValue></ds:Reference></ds:SignedInfo><ds:SignatureValue>IvfOT07OtXipGDCOIfZBXq47MDnO421XgViOe4UkKx/7i+XLHxh+aV1lgMZx8yF8ukiZGLWBCYDrycwTy6XKn/Xi7XFWhCq21K6IzpxgaVaZkXN+qjU/WsGLbydtfz3RdNy8NspOR1vs/WLx2bGd0ABitiNvRddukVSgrWjxBV4=</ds:SignatureValue></ds:Signature></gpnp:GPnP-Profile>
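
By the way, individual values can be queried from the profile with “gpnptool” instead of parsing the raw XML. Two examples, assuming your gpnptool version offers the -asm_spf and -asm_dis switches for the ASM spfile location and the disk discovery string:

[oracle@oel6u4 ~]$ gpnptool getpval -asm_spf
[oracle@oel6u4 ~]$ gpnptool getpval -asm_dis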

I simply moved everything from this directory elsewhere:

[oracle@oel6u4 ~]$ mv /u01/app/grid/12.1.0.2/gpnp/oel6u4/profiles/peer/* /tmp/gpnpprofile/

Then I rebooted the node. What happened next surprised me: everything came up fine again.

[oracle@oel6u4 trace]$ crsctl stat res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       oel6u4                   Started,STABLE
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       oel6u4                   STABLE
ora.crf
      1        ONLINE  ONLINE       oel6u4                   STABLE
ora.crsd
      1        ONLINE  ONLINE       oel6u4                   STABLE
ora.cssd
      1        ONLINE  ONLINE       oel6u4                   STABLE
ora.cssdmonitor
      1        ONLINE  ONLINE       oel6u4                   STABLE
ora.ctssd
      1        ONLINE  ONLINE       oel6u4                   OBSERVER,STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.drivers.acfs
      1        ONLINE  ONLINE       oel6u4                   STABLE
ora.evmd
      1        ONLINE  ONLINE       oel6u4                   STABLE
ora.gipcd
      1        ONLINE  ONLINE       oel6u4                   STABLE
ora.gpnpd
      1        ONLINE  ONLINE       oel6u4                   STABLE
ora.mdnsd
      1        ONLINE  ONLINE       oel6u4                   STABLE
ora.storage
      1        ONLINE  ONLINE       oel6u4                   STABLE
--------------------------------------------------------------------------------

So I had a look into the log files to find out what had happened. The “ohasd.trc” has nothing very useful, but “gpnpd.trc” does.

2016-03-31 11:58:14.058953 : default:2930685536: gpnpd START pid=2677 Oracle Grid Plug-and-Play Daemon
2016-03-31 11:58:14.059582 : default:2930685536: clsgpnpd_main instance started
2016-03-31 11:58:14.060566 :    GPNP:2930685536: clsgpnp_Init: [at clsgpnp0.c:654] '/u01/app/grid/12.1.0.2' in effect as GPnP home base.
2016-03-31 11:58:14.060580 :    GPNP:2930685536: clsgpnp_Init: [at clsgpnp0.c:708] GPnP pid=2677, cli=gpnpd GPNP comp tracelevel=1, depcomp tracelevel=0, tlsrc:ORA_DAEMON_LOGGING_LEVELS, apitl:0, complog:1, tstenv:0, devenv:0, envopt:0, flags=3
2016-03-31 11:58:14.090050 :    GPNP:2930685536: clsgpnpkwf_initwfloc: [at clsgpnpkwf.c:402] Using FS Wallet Location : /u01/app/grid/12.1.0.2/gpnp/oel6u4/wallets/peer/

2016-03-31 11:58:14.160264 :    GPNP:2930685536: clsgpnpkwf_initwfloc: [at clsgpnpkwf.c:414] Wallet readable. Path: /u01/app/grid/12.1.0.2/gpnp/oel6u4/wallets/peer/

2016-03-31 11:58:14.180092 :    GPNP:2930685536: clsgpnp_InitLocalPrfCacheProvs: [at clsgpnp0.c:4951] Result: (1) CLSGPNP_ERR. (:GPNP00258:)Error initializing gpnp local profile cache provider 1 of 2 (LCP-FS).
2016-03-31 11:58:14.286970 :    GPNP:2930685536: clsgpnpd_lOpenEP: [at clsgpnpd.c:2004] Listening on "ipc://GPNPD_oel6u4"
2016-03-31 11:58:14.293982 :  CLSDMT:2925598464: PID for the Process [2677], connkey 10
2016-03-31 11:58:15.002288 :    GPNP:2930685536: clsgpnpd_validateProfile: [at clsgpnpdcmn.c:1013] GPnPD taken cluster guid 'a3650eb3772d4ff9bf115d2157a0effc'
2016-03-31 11:58:15.002333 :    GPNP:2930685536: clsgpnpd_validateProfile: [at clsgpnpdcmn.c:1040] GPnPD taken cluster name 'mycluster'
2016-03-31 11:58:15.002342 :    GPNP:2930685536: clsgpnpd_openLocalProfile: [at clsgpnpd.c:2380] Got local profile from OLR cache provider (LCP-OLR).
2016-03-31 11:58:15.002354 :    GPNP:2930685536: clsgpnpd_openLocalProfile: [at clsgpnpd.c:2428] Result: (3) CLSGPNP_INIT_FAILED. (:GPNPD00109:)best profile was not saved in file local cache provider (LCP-FS) p=0x1a41bf0
2016-03-31 11:58:15.004650 :    GPNP:2930685536: clsgpnpd_lCheckIpTypes: [at clsgpnpd.c:1714] Profile Networks Definitions - 3 total
2016-03-31 11:58:15.004791 :    GPNP:2930685536: clsgpnpd_lFilterIpTypes: [at clsgpnpd.c:1615]   - eth3/192.168.56.0 public (ip=,mask=,mac=,typ=1)
2016-03-31 11:58:15.004802 :    GPNP:2930685536: clsgpnpd_lFilterIpTypes: [at clsgpnpd.c:1615]   - eth4/192.168.1.0 public (ip=,mask=,mac=,typ=1)
2016-03-31 11:58:15.004864 :    GPNP:2930685536: clsgpnpd_lFilterIpTypes: [at clsgpnpd.c:1615]   - eth2/192.168.1.0 cluster_interconnect,asm (ip=,mask=,mac=,typ=1)
2016-03-31 11:58:15.004874 :    GPNP:2930685536: clsgpnpd_lFilterIpTypes: [at clsgpnpd.c:1636]   of 3 net interfaces, 2 publics (2 ipv4, 0 ipv6), 1 privates (1 ipv4, 0 ipv6).
2016-03-31 11:58:15.013276 :    GPNP:2930685536: clsgpnpd_lCheckIpTypes: [at clsgpnpd.c:1751] GPnP Node Network Interfaces - 3 total
2016-03-31 11:58:15.013582 :    GPNP:2930685536: clsgpnpd_lFilterIpTypes: [at clsgpnpd.c:1615]   - eth3/192.168.56.0 public (ip=192.168.56.101,mask=255.255.255.0,mac=08-00-27-2e-bc-d6,typ=1)
2016-03-31 11:58:15.013593 :    GPNP:2930685536: clsgpnpd_lFilterIpTypes: [at clsgpnpd.c:1615]   - eth4/192.168.1.0 public (ip=192.168.1.1,mask=255.255.255.0,mac=08-00-27-d1-db-78,typ=1)
2016-03-31 11:58:15.013736 :    GPNP:2930685536: clsgpnpd_lFilterIpTypes: [at clsgpnpd.c:1615]   - eth2/192.168.1.0 cluster_interconnect,asm (ip=192.168.1.1,mask=255.255.255.0,mac=08-00-27-3d-33-dd,typ=1)
2016-03-31 11:58:15.013745 :    GPNP:2930685536: clsgpnpd_lFilterIpTypes: [at clsgpnpd.c:1636]   of 3 net interfaces, 2 publics (2 ipv4, 0 ipv6), 1 privates (1 ipv4, 0 ipv6).
2016-03-31 11:58:15.014032 :    GPNP:2930685536: clsgpnpd_lOpenEP: [at clsgpnpd.c:1996] Listening on "tcp://0.0.0.0:61417", call address "tcp://oel6u4:61417" ipv4
2016-03-31 11:58:15.046511 : default:2930685536: GPNPD started on node oel6u4.
2016-03-31 11:58:15.046697 :    GPNP:2930685536: clsgpnpd_main: [at clsgpnpd.c:468] --- Local best profile:
2016-03-31 11:58:15.046706 :    GPNP:2930685536: clsgpnpd_main: <?xml version="1.0" encoding="UTF-8"?><gpnp:GPnP-Profile Versio[cont]
2016-03-31 11:58:15.046713 :    GPNP:2930685536: clsgpnpd_main: n="1.0" xmlns="http://www.grid-pnp.org/2005/11/gpnp-profile" xm[cont]
2016-03-31 11:58:15.046719 :    GPNP:2930685536: clsgpnpd_main: lns:gpnp="http://www.grid-pnp.org/2005/11/gpnp-profile" xmlns:o[cont]
2016-03-31 11:58:15.046725 :    GPNP:2930685536: clsgpnpd_main: rcl="http://www.oracle.com/gpnp/2005/11/gpnp-profile" xmlns:xsi[cont]
2016-03-31 11:58:15.046731 :    GPNP:2930685536: clsgpnpd_main: ="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation[cont]
2016-03-31 11:58:15.046737 :    GPNP:2930685536: clsgpnpd_main: ="http://www.grid-pnp.org/2005/11/gpnp-profile gpnp-profile.xsd[cont]
2016-03-31 11:58:15.046742 :    GPNP:2930685536: clsgpnpd_main: " ProfileSequence="22" ClusterUId="a3650eb3772d4ff9bf115d2157a0[cont]
2016-03-31 11:58:15.046748 :    GPNP:2930685536: clsgpnpd_main: effc" ClusterName="mycluster" PALocation=""><gpnp:Network-Profi[cont]
2016-03-31 11:58:15.046754 :    GPNP:2930685536: clsgpnpd_main: le><gpnp:HostNetwork id="gen" HostName="*"><gpnp:Network id="ne[cont]
2016-03-31 11:58:15.046760 :    GPNP:2930685536: clsgpnpd_main: t1" IP="192.168.1.0" Adapter="eth2" Use="asm,cluster_interconne[cont]
2016-03-31 11:58:15.046767 :    GPNP:2930685536: clsgpnpd_main: ct"/><gpnp:Network id="net2" IP="192.168.56.0" Adapter="eth3" U[cont]
2016-03-31 11:58:15.046772 :    GPNP:2930685536: clsgpnpd_main: se="public"/><gpnp:Network id="net3" Adapter="eth4" Use="public[cont]
2016-03-31 11:58:15.046779 :    GPNP:2930685536: clsgpnpd_main: " IP="192.168.1.0"/></gpnp:HostNetwork></gpnp:Network-Profile><[cont]
2016-03-31 11:58:15.046784 :    GPNP:2930685536: clsgpnpd_main: orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration=[cont]
2016-03-31 11:58:15.046790 :    GPNP:2930685536: clsgpnpd_main: "400"/><orcl:ASM-Profile id="asm" DiscoveryString="/dev/oraclea[cont]
2016-03-31 11:58:15.046796 :    GPNP:2930685536: clsgpnpd_main: sm/disks/*" SPFile="+OCR/mycluster/ASMPARAMETERFILE/registry.25[cont]
2016-03-31 11:58:15.046801 :    GPNP:2930685536: clsgpnpd_main: 3.907927597" Mode="remote"/><ds:Signature xmlns:ds="http://www.[cont]
2016-03-31 11:58:15.046973 :    GPNP:2930685536: clsgpnpd_main: w3.org/2000/09/xmldsig#"><ds:SignedInfo><ds:CanonicalizationMet[cont]
2016-03-31 11:58:15.046978 :    GPNP:2930685536: clsgpnpd_main: hod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/><ds:Si[cont]
2016-03-31 11:58:15.046982 :    GPNP:2930685536: clsgpnpd_main: gnatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-[cont]
2016-03-31 11:58:15.046986 :    GPNP:2930685536: clsgpnpd_main: sha1"/><ds:Reference URI=""><ds:Transforms><ds:Transform Algori[cont]
2016-03-31 11:58:15.046995 :    GPNP:2930685536: clsgpnpd_main: thm="http://www.w3.org/2000/09/xmldsig#enveloped-signature"/><d[cont]
2016-03-31 11:58:15.047000 :    GPNP:2930685536: clsgpnpd_main: s:Transform Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"[cont]
2016-03-31 11:58:15.047004 :    GPNP:2930685536: clsgpnpd_main: > <InclusiveNamespaces xmlns="http://www.w3.org/2001/10/xml-exc[cont]
2016-03-31 11:58:15.047008 :    GPNP:2930685536: clsgpnpd_main: -c14n#" PrefixList="gpnp orcl xsi"/></ds:Transform></ds:Transfo[cont]
2016-03-31 11:58:15.047012 :    GPNP:2930685536: clsgpnpd_main: rms><ds:DigestMethod Algorithm="http://www.w3.org/2000/09/xmlds[cont]
2016-03-31 11:58:15.047016 :    GPNP:2930685536: clsgpnpd_main: ig#sha1"/><ds:DigestValue>0S5hJDSQrW+BP+IMSS1ZUYXUlGg=</ds:Dige[cont]
2016-03-31 11:58:15.047020 :    GPNP:2930685536: clsgpnpd_main: stValue></ds:Reference></ds:SignedInfo><ds:SignatureValue>IvfOT[cont]
2016-03-31 11:58:15.047024 :    GPNP:2930685536: clsgpnpd_main: 07OtXipGDCOIfZBXq47MDnO421XgViOe4UkKx/7i+XLHxh+aV1lgMZx8yF8ukiZ[cont]
2016-03-31 11:58:15.047029 :    GPNP:2930685536: clsgpnpd_main: GLWBCYDrycwTy6XKn/Xi7XFWhCq21K6IzpxgaVaZkXN+qjU/WsGLbydtfz3RdNy[cont]
2016-03-31 11:58:15.047033 :    GPNP:2930685536: clsgpnpd_main: 8NspOR1vs/WLx2bGd0ABitiNvRddukVSgrWjxBV4=</ds:SignatureValue></[cont]
2016-03-31 11:58:15.047037 :    GPNP:2930685536: clsgpnpd_main: ds:Signature></gpnp:GPnP-Profile>

So GPNPD found a profile in the Oracle Local Registry (OLR). Nice. And it tells us that this best profile was not saved back to the file-based local cache (LCP-FS). Writing it back to disk is something we can do on our own.

[oracle@oel6u4 trace]$ gpnptool get -o=/u01/app/grid/12.1.0.2/gpnp/oel6u4/profiles/peer/profile.xml
Resulting profile written to "/u01/app/grid/12.1.0.2/gpnp/oel6u4/profiles/peer/profile.xml".
Success.
[oracle@oel6u4 trace]$ ll /u01/app/grid/12.1.0.2/gpnp/oel6u4/profiles/peer/
total 4
-rw-r--r-- 1 oracle oinstall 1986 Mar 31 13:08 profile.xml
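
Being paranoid, one can also verify the signature of the restored profile against the peer wallet (the wallet path is the one from the gpnpd trace above); this is a sketch of the usual gpnptool syntax:

[oracle@oel6u4 trace]$ gpnptool verify -p=/u01/app/grid/12.1.0.2/gpnp/oel6u4/profiles/peer/profile.xml -w=file:/u01/app/grid/12.1.0.2/gpnp/oel6u4/wallets/peer -wu=peer

It should report that the profile signature is valid.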

As a last step, I verified the contents of the OLR just for my understanding. There are sections related to GPnP profiles inside the OLR:

[root@oel6u4 ~]# ocrdump -local  /tmp/olrdump
[root@oel6u4 ~]# more  /tmp/olrdump

[SYSTEM.GPnP]
UNDEF :
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_NONE, OTHER_PERMISSION : PROCR_NONE, USER_NAME : oracle, GROUP_NAME : oinstall}

[SYSTEM.GPnP.profiles]
BYTESTREAM (16) :
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_NONE, OTHER_PERMISSION : PROCR_NONE, USER_NAME : oracle, GROUP_NAME : oinstall}

[SYSTEM.GPnP.profiles.peer]
UNDEF :
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_READ, OTHER_PERMISSION : PROCR_QUERY_KEY, USER_NAME : oracle, GROUP_NAME : oinstall}

[SYSTEM.GPnP.profiles.peer.best]
BYTESTREAM (16) : 3c3f786d6c2076657273696f6e3d22312e302220656e636f64696e673d225554462d38223f3e3c67706e703a47506e502d50726f66696c652056657273696f6e3d22312e302220786d6c6e733d22687474703a2f2f7777772e677269642d706e702e6f72672f323030352f31312f67706e702d70726f66696c652220786d6c6e733a67706e703d22687474703a2f2f7777772e677269642d706e702e6f72672f323030352f31312f67706e702d70726f66696c652220786d6c6e733a6f72636c3d22687474703a2f2f7777772e6f7261636c652e636f6d2f67706e702f323030352f31312f67706e702d70726f66696c652220786d6c6e733a7873693d22687474703a2f2f7777772e77332e6f72672f323030312f584d4c536368656d612d696e7374616e636522207873693a736368656d614c6f636174696f6e3d22687474703a2f2f7777772e677269642d706e702e6f72672f323030352f31312f67706e702d70726f66696c652067706e702d70726f66696c652e787364222050726f66696c6553657175656e63653d2232322220436c75737465725549643d2261333635306562333737326434666639626631313564323135376130656666632220436c75737465724e616d653d226d79636c7573746572222050414c6f636174696f6e3d22223e3c67706e703a4e6574776f726b2d50726f66696c653e3c67706e703a486f73744e6574776f726b2069643d2267656e2220486f73744e616d653d222a223e3c67706e703a4e6574776f726b2069643d226e657431222049503d223139322e3136382e312e302220416461707465723d226574683222205573653d2261736d2c636c75737465725f696e746572636f6e6e656374222f3e3c67706e703a4e6574776f726b2069643d226e657432222049503d223139322e3136382e35362e302220416461707465723d226574683322205573653d227075626c6963222f3e3c67706e703a4e6574776f726b2069643d226e6574332220416461707465723d226574683422205573653d227075626c6963222049503d223139322e3136382e312e30222f3e3c2f67706e703a486f73744e6574776f726b3e3c2f67706e703a4e6574776f726b2d50726f66696c653e3c6f72636c3a4353532d50726f66696c652069643d226373732220446973636f76657279537472696e673d222b61736d22204c656173654475726174696f6e3d22343030222f3e3c6f72636c3a41534d2d50726f66696c652069643d2261736d2220446973636f76657279537472696e673d222f6465762f6f7261636c6561736d2f6469736b732f2a2220535046696c653d222b4f43522f6d79636c75737465722f41534d504152414d4554455246494c452f72656769737472792e3235332e39303739323735393722204d6f64653d2272656d6f7465222f3e3c64733a5369676e617475726520786d6c6e733a64733d22687474703a2f2f7777772e77332e6f72672f323030302f30392f786d6c6473696723223e3c64733a5369676e6564496e666f3e3c64733a43616e6f6e6963616c697a6174696f6e4d6574686f6420416c676f726974686d3d22687474703a2f2f7777772e77332e6f72672f323030312f31302f786d6c2d6578632d6331346e23222f3e3c64733a5369676e61747572654d6574686f6420416c676f726974686d3d22687474703a2f2f7777772e77332e6f72672f323030302f30392f786d6c64736967237273612d73686131222f3e3c64733a5265666572656e6365205552493d22223e3c64733a5472616e73666f726d733e3c64733a5472616e73666f726d20416c676f726974686d3d22687474703a2f2f7777772e77332e6f72672f323030302f30392f786d6c6473696723656e76656c6f7065642d7369676e6174757265222f3e3c64733a5472616e73666f726d20416c676f726974686d3d22687474703a2f2f7777772e77332e6f72672f323030312f31302f786d6c2d6578632d6331346e23223e203c496e636c75736976654e616d6573706163657320786d6c6e733d22687474703a2f2f7777772e77332e6f72672f323030312f31302f786d6c2d6578632d6331346e2322205072656669784c6973743d2267706e70206f72636c20787369222f3e3c2f64733a5472616e73666f726d3e3c2f64733a5472616e73666f726d733e3c64733a4469676573744d6574686f6420416c676f726974686d3d22687474703a2f2f7777772e77332e6f72672f323030302f30392f786d6c647369672373686131222f3e3c64733a44696765737456616c75653e305335684a44535172572b42502b494d5353315a555958556c47673d3c2f64733a44696765737456616c75653e3c2f64733a5265666572656e63653e3c2f64733a5369676e6564496e666f3e3c64733a5369676e617475726556616c75653e4976664f5430374f745
869704744434f49665a42587134374d446e4f343231586756694f6534556b4b782f37692b584c4878682b6156316c674d5a7838794638756b695a474c574243594472796377547936584b6e2f58693758465768437132314b36497a7078676156615a6b584e2b716a552f5773474c62796474667a3352644e79384e73704f523176732f574c78326247643041426974694e76526464756b56536772576a784256343d3c2f64733a5369676e617475726556616c75653e3c2f64733a5369676e61747572653e3c2f67706e703a47506e502d50726f66696c653e00
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_READ, OTHER_PERMISSION : PROCR_QUERY_KEY, USER_NAME : oracle, GROUP_NAME : oinstall}
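
By the way, the BYTESTREAM value is nothing but hex-encoded XML (note the trailing 00 terminator). A small sketch of decoding it by hand, assuming the hex digits have been pasted into /tmp/best_profile.hex first:

[root@oel6u4 ~]# xxd -r -p /tmp/best_profile.hex | tr -d '\0' > /tmp/best_profile.xml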

As you can see, there really is a best profile stored inside the OLR, which enables my cluster node to start even when the “profile.xml” itself is missing. I thought this was different in 11.2, but I have no system available to check that. If you have any information about that, please comment.


Database Upgrade with Standby on ODA

A couple of days ago I did my easiest upgrade of an Oracle Database ever. This is the system:

  • 2 Oracle Database Appliances in two locations
  • Databases running as single instance due to licensing restrictions
  • DataGuard between ODA-1 and ODA-2 for mission-critical databases
  • several databases, version 11.2.0.4 and 12.1.0.2
  • currently running ODA software 12.1.2.4, so not quite up-to-date

The customer wanted me to upgrade one of these 11.2.0.4 databases to 12.1.0.2. So the basic steps are as follows:

  1. Install new Oracle Home for version 12.1.0.2
  2. Disable DataGuard configuration
  3. Stop DataGuard Broker processes
  4. Stop Standby database
  5. Upgrade Primary database
  6. Start Standby database from new home

On an ODA this is quite simple to do. First, create a new Oracle Home on both ODAs:

root$ oakcli create dbhome -version 12.1.0.2.3
...
INFO: 2016-03-15 09:25:07: Installing a new Home : OraDb12102_home3 at /u01/app/oracle/product/12.1.0.2/dbhome_3
...
SUCCESS: 2016-03-15 09:30:55: Successfully created the Database Home : OraDb12102_home3

Now there is a new 12c home that we can use for the upgrade. Alternatively, an existing home can be chosen from the list:

root$ oakcli show dbhomes -detail

The next step is to disable the DataGuard configuration:

oracle$ . oraenv
oracle$ dgmgrl
DGMGRL> connect sys/****@primary
Connected.
DGMGRL> disable configuration
Disabled.

Now stop the DataGuard Broker processes on both primary and standby:

oracle$ . oraenv
oracle$ sqlplus / as sysdba
SQL> alter system set dg_broker_start=false scope=both;

System altered.

Stop the Standby database:

oracle$ srvctl stop database -d <dbname>

And now upgrade the Primary to the Oracle Home that was created or chosen in the first step. The command will ask for the SYS password, so be prepared for that.

root$ oakcli upgrade database -db <dbname> -to OraDb12102_home3
INFO: 2016-03-15 10:30:30: Look at the log file '/opt/oracle/oak/log/<odabase-01>/tools/12.1.2.4.0/dbupgrade_2585.log' for more details 

Please enter the 'SYS'  password : 
Please re-enter the 'SYS' password: 
2016-03-15 10:31:29: Upgrading the database <dbname>. It will take few minutes. Please wait... 

 SUCCESS: 2016-03-15 10:50:01: Successfully upgraded the database <dbname>

That’s it for the Primary. Now modify the Standby to use the new 12c Database Home. Copy the password file to the new Oracle Home, and, if it is not in a shared location, the SPfile too.

oracle$ cp <old ORACLE_HOME>/dbs/orapw<dbname> <new ORACLE_HOME>/dbs/ 
oracle$ cp <old ORACLE_HOME>/dbs/spfile<dbname>.ora <new ORACLE_HOME>/dbs/

The database resource needs to be modified in order to use the new Oracle Home. That is the last time that “srvctl” is run from the old Oracle Home:

oracle$ srvctl config database -d <dbname> 
oracle$ srvctl modify database -d <dbname> -o /u01/app/oracle/product/12.1.0.2/dbhome_3

If required, change the SPfile location too:

oracle$ srvctl modify database -d <dbname> -p /u01/app/oracle/product/12.1.0.2/dbhome_3/dbs/spfile<dbname>.ora

Now the Standby can be started again:

oracle$ srvctl start database -d <dbname>

The last step is to start the DataGuard Broker processes and re-enable the configuration.

oracle$ . oraenv
oracle$ sqlplus / as sysdba
SQL> alter system set dg_broker_start=true scope=both;

System altered.
oracle$ . oraenv
oracle$ dgmgrl
DGMGRL> connect sys/****@primary
Connected.
DGMGRL> enable configuration
Enabled.
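
To be sure everything is back in sync, check the configuration status afterwards; after a moment it should report SUCCESS:

DGMGRL> show configuration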

Very simple and very straightforward. I like ODA 🙂


Cluster Health Monitor

Preface

Issues with Oracle Grid Infrastructure are sometimes related to OS issues like heavy load or similar. That’s why Oracle provided the OSWatcher tool in the past and later included that functionality in the Grid Infrastructure as the Cluster Health Monitor (CHM). With version 12.1.0.1 the Grid Infrastructure Management Repository (GIMR) was introduced, which became mandatory with 12.1.0.2. The GIMR now holds all the OS-related data which used to be stored in the filesystem before. The main tool to manage this data is called “oclumon”.

Setup

First, one may want to check and change the retention of the OS performance data. The retention is always specified in seconds and corresponds to a repository size in MB; oclumon can do that conversion for us.
Check the current retention:

[oracle@vm101 ~]$ oclumon manage -get repsize

CHM Repository Size = 68160 seconds

Ok, not even a full day of performance data. So let’s find out what is needed for a whole day:

[oracle@vm101 ~]$ oclumon manage -repos checkretentiontime 86400
The Cluster Health Monitor repository is too small for the desired retention. Please first resize the repository to 2598 MB

That returns the required size, which can now be used to resize the repository:

[oracle@vm101 ~]$ oclumon manage -repos changerepossize 2598
The Cluster Health Monitor repository was successfully resized.The new retention is 86460 seconds.

That’s all.
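
If you want to script this, the required size can be parsed from the oclumon message. A minimal sketch, assuming the wording of that message stays stable across versions:

[oracle@vm101 ~]$ REQ=$(oclumon manage -repos checkretentiontime 86400 | grep -o '[0-9]\+ MB' | awk '{print $1}')
[oracle@vm101 ~]$ [ -n "$REQ" ] && oclumon manage -repos changerepossize "$REQ"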

Querying Data

Holding the data is one thing, but how do we get it out of the repository? Basically there are two ways: either use “oclumon dumpnodeview” or Enterprise Manager.

oclumon dumpnodeview

Oclumon is much more powerful: in verbose mode it can get all the data down to the process and device level. These are the available options:

[oracle@vm101 ~]$ oclumon dumpnodeview -h

dumpnodeview verb usage
=======================
The dumpnodeview command reports monitored records in the text format. The
collection of metrics for a node at a given point in time (a timestamp) is
called a node view.

* Usage
  dumpnodeview [-allnodes | -n <node1> ...] [-last <duration> |
                -s <timestamp> -e <timestamp>][-i <interval>][-v]
                [-system][-process][-device][filesystem][-nic]
                [-protoerr][-cpu][-topconsumer][-format <format type>]
                [-dir <directory> [-append]]

*Where
  -n <node1> ...   = Dump node views for specified nodes
  -allnodes        = Dump node views for all nodes
  -s <timestamp>   = Specify start time for range dump of node views
  -e <timestamp>   = Specify end time for range dump of node views
                     Absolute timestamp must be in "YYYY-MM-DD HH24:MI:SS"
                     format, for example "2007-11-12 23:05:00"
  -last <duration> = Dump the latest node views for a specified duration.
                     Duration must be in "HH24:MI:SS" format, for example
                     "00:45:00"
  -i               = Dump node views separated by the specified
                     interval in seconds. Must be a multiple of 5.
  -v               = Dump verbose node views containing all parts.
  -system, -cpu,.. = Dump each indicate node view parts.
  -format <format type> = format of the output.
                     <format type> can be legacy, tabular, or csv.
                     The default format is tabular.
  -dir <directory> = Dump node view part to file(s) in spceified dir.
                     With -append, will append the files. Overwrite otherwise.

The data is human readable but not very easy to analyze:

----------------------------------------
Node: vm101 Clock: '16-03-07 11.24.38 ' SerialNo:84380
----------------------------------------

SYSTEM:
#pcpus: 1 #vcpus: 4 cpuht: Y chipname: Dual-Core cpu: 7.25 cpuq: 0 physmemfree: 96760 physmemtotal: 5889124 mcache: 3344260 swapfree: 5953464 swaptotal: 6143996 hugepagetotal: 0 hugepagefree: 0 hugepagesize: 2048 ior: 68 iow: 56 ios: 11 swpin: 0 swpout: 0 pgin: 0 pgout: 4 netr: 41.895 netw: 74.346 procs: 328 procsoncpu: 1 rtprocs: 18 rtprocsoncpu: N/A #fds: 22240 #sysfdlimit: 6815744 #disks: 9 #nics: 3 nicErrors: 0

TOP CONSUMERS:
topcpu: 'gipcd.bin(23768) 5.40' topprivmem: 'java(27097) 488384' topshm: 'oracle_3224_-mg(3224) 562184' topfd: 'oraagent.bin(24093) 266' topthread: 'java(27097) 102'

CPUS:
cpu3: sys-3.71 user-4.58 nice-0.0 usage-8.29 iowait-0.0 steal-1.96
cpu2: sys-3.6 user-4.59 nice-0.0 usage-7.65 iowait-0.0 steal-1.96
cpu1: sys-3.76 user-3.32 nice-0.0 usage-7.9 iowait-0.0 steal-2.43
cpu0: sys-2.20 user-3.75 nice-0.0 usage-5.96 iowait-0.0 steal-2.20

PROCESSES:

name: 'evmlogger.bin' pid: 23762 #procfdlimit: 65536 cpuusage: 0.60 privmem: 5172 shm: 11868 #fd: 30 #threads: 2 priority: 20 nice: 0 state: S
name: 'asm_lms0_+asm1' pid: 24304 #procfdlimit: 65536 cpuusage: 0.60 privmem: 11352 shm: 18712 #fd: 13 #threads: 1 priority: -2 nice: 0 state: S
name: 'ora_vkrm_rac01_' pid: 28506 #procfdlimit: 65536 cpuusage: 0.40 privmem: 2180 shm: 10576 #fd: 12 #threads: 1 priority: 20 nice: 0 state: S
name: 'asm_lmd0_+asm1' pid: 24302 #procfdlimit: 65536 cpuusage: 0.40 privmem: 12204 shm: 19872 #fd: 13 #threads: 1 priority: 20 nice: 0 state: S
name: 'asm_dia0_+asm1' pid: 24298 #procfdlimit: 65536 cpuusage: 0.40 privmem: 12024 shm: 24196 #fd: 16 #threads: 1 priority: 20 nice: 0 state: S
name: 'ora_diag_rac01_' pid: 28502 #procfdlimit: 65536 cpuusage: 0.40 privmem: 7980 shm: 11132 #fd: 12 #threads: 1 priority: 20 nice: 0 state: S
name: 'ora_lmhb_rac01_' pid: 28530 #procfdlimit: 65536 cpuusage: 0.40 privmem: 2828 shm: 12212 #fd: 13 #threads: 1 priority: 20 nice: 0 state: S
name: 'ora_lck0_rac01_' pid: 28570 #procfdlimit: 65536 cpuusage: 0.40 privmem: 3408 shm: 35388 #fd: 13 #threads: 1 priority: 20 nice: 0 state: S
name: 'ora_dia0_rac01_' pid: 28512 #procfdlimit: 65536 cpuusage: 0.40 privmem: 13208 shm: 24336 #fd: 15 #threads: 1 priority: 20 nice: 0 state: S
name: 'mdb_o000_-mgmtd' pid: 2688 #procfdlimit: 65536 cpuusage: 0.40 privmem: 2368 shm: 11932 #fd: 9 #threads: 1 priority: 20 nice: 0 state: S
name: 'ora_lmon_rac01_' pid: 28514 #procfdlimit: 65536 cpuusage: 0.40 privmem: 17184 shm: 32324 #fd: 13 #threads: 1 priority: 20 nice: 0 state: S
name: 'mdb_smco_-mgmtd' pid: 24896 #procfdlimit: 65536 cpuusage: 0.20 privmem: 2092 shm: 11692 #fd: 7 #threads: 1 priority: 20 nice: 0 state: S
name: 'UsmMonitor' pid: 23353 #procfdlimit: 1024 cpuusage: 0.20 privmem: 0 shm: 0 #fd: 2 #threads: 1 priority: 20 nice: 0 state: S
name: 'ora_q00e_rac01_' pid: 28741 #procfdlimit: 65536 cpuusage: 0.20 privmem: 4756 shm: 36972 #fd: 11 #threads: 1 priority: 20 nice: 0 state: S
name: 'asm_lmon_+asm1' pid: 24300 #procfdlimit: 65536 cpuusage: 0.20 privmem: 14312 shm: 27064 #fd: 13 #threads: 1 priority: 20 nice: 0 state: S
name: 'ora_p00c_rac01_' pid: 28699 #procfdlimit: 65536 cpuusage: 0.20 privmem: 2028 shm: 8328 #fd: 10 #threads: 1 priority: 20 nice: 0 state: S
name: 'mdb_lgwr_-mgmtd' pid: 24838 #procfdlimit: 65536 cpuusage: 0.20 privmem: 4140 shm: 21876 #fd: 11 #threads: 1 priority: 20 nice: 0 state: S
name: 'mdb_dia0_-mgmtd' pid: 24833 #procfdlimit: 65536 cpuusage: 0.20 privmem: 3908 shm: 15792 #fd: 7 #threads: 1 priority: 20 nice: 0 state: S
name: 'ovmd' pid: 603 #procfdlimit: 1024 cpuusage: 0.20 privmem: 96 shm: 396 #fd: 5 #threads: 1 priority: 20 nice: 0 state: S
name: 'ora_p009_rac01_' pid: 28689 #procfdlimit: 65536 cpuusage: 0.20 privmem: 2032 shm: 8376 #fd: 10 #threads: 1 priority: 20 nice: 0 state: S
name: 'init.tfa' pid: 21628 #procfdlimit: 1024 cpuusage: 0.20 privmem: 724 shm: 1220 #fd: 4 #threads: 1 priority: 20 nice: 0 state: S
name: 'ora_p004_rac01_' pid: 28676 #procfdlimit: 65536 cpuusage: 0.20 privmem: 2028 shm: 8324 #fd: 10 #threads: 1 priority: 20 nice: 0 state: S
name: 'cssdagent' pid: 23811 #procfdlimit: 65536 cpuusage: 0.60 privmem: 37564 shm: 80744 #fd: 61 #threads: 16 priority: -100 nice: 0 state: S
name: 'kswapd0' pid: 44 #procfdlimit: 1024 cpuusage: 0.00 privmem: 0 shm: 0 #fd: 2 #threads: 1 priority: 20 nice: 0 state: S
name: 'oracle_3224_-mg' pid: 3224 #procfdlimit: 65536 cpuusage: 0.80 privmem: 5692 shm: 562184 #fd: 9 #threads: 1 priority: 20 nice: 0 state: S
name: 'mdb_dbw0_-mgmtd' pid: 24835 #procfdlimit: 65536 cpuusage: 0.00 privmem: 11332 shm: 512752 #fd: 13 #threads: 1 priority: 20 nice: 0 state: S
name: 'ora_mman_rac01_' pid: 28498 #procfdlimit: 65536 cpuusage: 0.00 privmem: 2180 shm: 482056 #fd: 11 #threads: 1 priority: 20 nice: 0 state: S
name: 'ora_ppa7_rac01_' pid: 28649 #procfdlimit: 65536 cpuusage: 0.20 privmem: 5584 shm: 479744 #fd: 11 #threads: 1 priority: 20 nice: 0 state: S
name: 'ora_dbw0_rac01_' pid: 28534 #procfdlimit: 65536 cpuusage: 0.00 privmem: 10552 shm: 406356 #fd: 12 #threads: 1 priority: 20 nice: 0 state: S
name: 'oracle_3216_-mg' pid: 3216 #procfdlimit: 65536 cpuusage: 0.00 privmem: 6264 shm: 351364 #fd: 9 #threads: 1 priority: 20 nice: 0 state: S
name: 'java' pid: 27097 #procfdlimit: 65536 cpuusage: 0.40 privmem: 488384 shm: 6068 #fd: 151 #threads: 102 priority: 20 nice: 0 state: S
name: 'ora_lms1_rac01_' pid: 28524 #procfdlimit: 65536 cpuusage: 0.60 privmem: 16536 shm: 229772 #fd: 11 #threads: 1 priority: -2 nice: 0 state: S
name: 'ora_cjq0_rac01_' pid: 28663 #procfdlimit: 65536 cpuusage: 0.00 privmem: 14556 shm: 226124 #fd: 11 #threads: 1 priority: 20 nice: 0 state: S
name: 'ora_lms0_rac01_' pid: 28520 #procfdlimit: 65536 cpuusage: 0.80 privmem: 16496 shm: 220516 #fd: 11 #threads: 1 priority: -2 nice: 0 state: S
name: 'ora_mmon_rac01_' pid: 28556 #procfdlimit: 65536 cpuusage: 0.40 privmem: 8364 shm: 201564 #fd: 15 #threads: 1 priority: 20 nice: 0 state: S
name: 'mdb_mman_-mgmtd' pid: 24823 #procfdlimit: 65536 cpuusage: 0.00 privmem: 2092 shm: 194524 #fd: 9 #threads: 1 priority: 20 nice: 0 state: S
name: 'mdb_mmon_-mgmtd' pid: 24856 #procfdlimit: 65536 cpuusage: 0.00 privmem: 20948 shm: 153272 #fd: 13 #threads: 1 priority: 20 nice: 0 state: S
name: 'ora_smon_rac01_' pid: 28542 #procfdlimit: 65536 cpuusage: 0.00 privmem: 4800 shm: 159652 #fd: 12 #threads: 1 priority: 20 nice: 0 state: S
name: 'ora_lck1_rac01_' pid: 28532 #procfdlimit: 65536 cpuusage: 0.00 privmem: 4128 shm: 156856 #fd: 10 #threads: 1 priority: 20 nice: 0 state: S
name: 'ora_w001_rac01_' pid: 28635 #procfdlimit: 65536 cpuusage: 0.00 privmem: 4720 shm: 128528 #fd: 10 #threads: 1 priority: 20 nice: 0 state: S
name: 'mdb_smon_-mgmtd' pid: 24844 #procfdlimit: 65536 cpuusage: 0.00 privmem: 4572 shm: 123232 #fd: 10 #threads: 1 priority: 20 nice: 0 state: S
name: 'ora_w000_rac01_' pid: 28633 #procfdlimit: 65536 cpuusage: 0.00 privmem: 3852 shm: 122216 #fd: 10 #threads: 1 priority: 20 nice: 0 state: S
name: 'oracle_4662_-mg' pid: 4662 #procfdlimit: 65536 cpuusage: 0.00 privmem: 8732 shm: 107292 #fd: 10 #threads: 1 priority: 20 nice: 0 state: S
name: 'ora_w002_rac01_' pid: 29643 #procfdlimit: 65536 cpuusage: 0.00 privmem: 3968 shm: 102284 #fd: 10 #threads: 1 priority: 20 nice: 0 state: S
name: 'ora_w004_rac01_' pid: 2548 #procfdlimit: 65536 cpuusage: 0.00 privmem: 3648 shm: 94840 #fd: 10 #threads: 1 priority: 20 nice: 0 state: S
name: 'ora_lmd0_rac01_' pid: 28516 #procfdlimit: 65536 cpuusage: 0.40 privmem: 14832 shm: 83312 #fd: 11 #threads: 1 priority: 20 nice: 0 state: S
name: 'oracle_29094_ra' pid: 29094 #procfdlimit: 65536 cpuusage: 0.20 privmem: 7764 shm: 86696 #fd: 12 #threads: 1 priority: 20 nice: 0 state: S
name: 'ora_w005_rac01_' pid: 26461 #procfdlimit: 65536 cpuusage: 0.20 privmem: 3976 shm: 88140 #fd: 10 #threads: 1 priority: 20 nice: 0 state: S
name: 'oracle_25071_-m' pid: 25071 #procfdlimit: 65536 cpuusage: 0.00 privmem: 5116 shm: 85268 #fd: 11 #threads: 1 priority: 20 nice: 0 state: S
name: 'oracle_30010_-m' pid: 30010 #procfdlimit: 65536 cpuusage: 0.00 privmem: 6348 shm: 83756 #fd: 11 #threads: 1 priority: 20 nice: 0 state: S
name: 'mdb_q004_-mgmtd' pid: 25060 #procfdlimit: 65536 cpuusage: 0.00 privmem: 5532 shm: 81732 #fd: 7 #threads: 1 priority: 20 nice: 0 state: S
name: 'ora_lmd1_rac01_' pid: 28518 #procfdlimit: 65536 cpuusage: 0.40 privmem: 14752 shm: 73420 #fd: 11 #threads: 1 priority: 20 nice: 0 state: S
name: 'ora_q002_rac01_' pid: 28711 #procfdlimit: 65536 cpuusage: 0.00 privmem: 9756 shm: 74516 #fd: 10 #threads: 1 priority: 20 nice: 0 state: S
name: 'ora_w006_rac01_' pid: 26712 #procfdlimit: 65536 cpuusage: 0.00 privmem: 3528 shm: 73032 #fd: 10 #threads: 1 priority: 20 nice: 0 state: S
name: 'mdb_mmnl_-mgmtd' pid: 24860 #procfdlimit: 65536 cpuusage: 0.20 privmem: 2308 shm: 72652 #fd: 9 #threads: 1 priority: 20 nice: 0 state: S
name: 'ora_w003_rac01_' pid: 30569 #procfdlimit: 65536 cpuusage: 0.00 privmem: 3792 shm: 71664 #fd: 10 #threads: 1 priority: 20 nice: 0 state: S
name: 'oracle_27203_ra' pid: 27203 #procfdlimit: 65536 cpuusage: 0.00 privmem: 5728 shm: 58184 #fd: 12 #threads: 1 priority: 20 nice: 0 state: S
name: 'crsd.bin' pid: 23906 #procfdlimit: 65536 cpuusage: 4.00 privmem: 68024 shm: 24676 #fd: 262 #threads: 47 priority: 20 nice: 0 state: S
name: 'ora_q004_rac01_' pid: 28715 #procfdlimit: 65536 cpuusage: 0.00 privmem: 5808 shm: 54364 #fd: 12 #threads: 1 priority: 20 nice: 0 state: S
name: 'oracle_26988_ra' pid: 26988 #procfdlimit: 65536 cpuusage: 0.00 privmem: 5872 shm: 53996 #fd: 12 #threads: 1 priority: 20 nice: 0 state: S
name: 'ora_qm00_rac01_' pid: 28709 #procfdlimit: 65536 cpuusage: 0.00 privmem: 5000 shm: 52896 #fd: 10 #threads: 1 priority: 20 nice: 0 state: S
name: 'ora_w007_rac01_' pid: 15980 #procfdlimit: 65536 cpuusage: 0.00 privmem: 3776 shm: 51372 #fd: 10 #threads: 1 priority: 20 nice: 0 state: S
name: 'oracle_28695_ra' pid: 28695 #procfdlimit: 65536 cpuusage: 0.00 privmem: 4556 shm: 49604 #fd: 14 #threads: 1 priority: 20 nice: 0 state: S
name: 'ora_mmnl_rac01_' pid: 28558 #procfdlimit: 65536 cpuusage: 0.00 privmem: 3272 shm: 49672 #fd: 12 #threads: 1 priority: 20 nice: 0 state: S
name: 'oracle_28691_ra' pid: 28691 #procfdlimit: 65536 cpuusage: 0.00 privmem: 5512 shm: 47584 #fd: 14 #threads: 1 priority: 20 nice: 0 state: S
name: 'ora_q006_rac01_' pid: 28719 #procfdlimit: 65536 cpuusage: 0.00 privmem: 5176 shm: 47744 #fd: 10 #threads: 1 priority: 20 nice: 0 state: S
name: 'mdb_dbrm_-mgmtd' pid: 24829 #procfdlimit: 65536 cpuusage: 0.00 privmem: 5236 shm: 46936 #fd: 10 #threads: 1 priority: 20 nice: 0 state: S
name: 'ora_ckpt_rac01_' pid: 28538 #procfdlimit: 65536 cpuusage: 0.20 privmem: 4512 shm: 45660 #fd: 13 #threads: 1 priority: 20 nice: 0 state: S
name: 'mdb_ckpt_-mgmtd' pid: 24840 #procfdlimit: 65536 cpuusage: 0.00 privmem: 4616 shm: 44304 #fd: 11 #threads: 1 priority: 20 nice: 0 state: S
name: 'ohasd.bin' pid: 23597 #procfdlimit: 65536 cpuusage: 2.60 privmem: 48848 shm: 20988 #fd: 242 #threads: 31 priority: 20 nice: 0 state: S
name: 'ora_rmv1_rac01_' pid: 28566 #procfdlimit: 65536 cpuusage: 1.20 privmem: 2988 shm: 43556 #fd: 8 #threads: 1 priority: 20 nice: 0 state: S
name: 'ora_rmv0_rac01_' pid: 28564 #procfdlimit: 65536 cpuusage: 0.80 privmem: 2988 shm: 43536 #fd: 8 #threads: 1 priority: 20 nice: 0 state: S
name: 'orarootagent.bi' pid: 23644 #procfdlimit: 65536 cpuusage: 1.40 privmem: 44300 shm: 20824 #fd: 173 #threads: 24 priority: 20 nice: 0 state: S
name: 'oraagent.bin' pid: 24093 #procfdlimit: 65536 cpuusage: 2.20 privmem: 40068 shm: 22908 #fd: 266 #threads: 34 priority: 20 nice: 0 state: S
name: 'oraagent.bin' pid: 23719 #procfdlimit: 65536 cpuusage: 1.40 privmem: 37496 shm: 18652 #fd: 141 #threads: 20 priority: 20 nice: 0 state: S
name: 'ons' pid: 29963 #procfdlimit: 65536 cpuusage: 0.00 privmem: 5516 shm: 1372 #fd: 17 #threads: 18 priority: 20 nice: 0 state: S
name: 'gipcd.bin' pid: 23768 #procfdlimit: 65536 cpuusage: 5.40 privmem: 30204 shm: 17532 #fd: 198 #threads: 9 priority: 20 nice: 0 state: S
name: 'evmd.bin' pid: 23734 #procfdlimit: 65536 cpuusage: 2.40 privmem: 18364 shm: 16816 #fd: 165 #threads: 9 priority: 20 nice: 0 state: S
name: 'asm_vktm_+asm1' pid: 24284 #procfdlimit: 65536 cpuusage: 3.60 privmem: 1484 shm: 10980 #fd: 11 #threads: 1 priority: -2 nice: 0 state: S
name: 'ora_vktm_rac01_' pid: 28492 #procfdlimit: 65536 cpuusage: 3.40 privmem: 2188 shm: 9476 #fd: 10 #threads: 1 priority: -2 nice: 0 state: S
name: 'mdb_vktm_-mgmtd' pid: 24817 #procfdlimit: 65536 cpuusage: 3.40 privmem: 2100 shm: 10392 #fd: 7 #threads: 1 priority: -2 nice: 0 state: S
name: 'octssd.bin' pid: 23878 #procfdlimit: 65536 cpuusage: 2.00 privmem: 18116 shm: 15580 #fd: 99 #threads: 11 priority: 20 nice: 0 state: S
name: 'orarootagent.bi' pid: 24051 #procfdlimit: 65536 cpuusage: 2.00 privmem: 14164 shm: 14940 #fd: 53 #threads: 10 priority: 20 nice: 0 state: S
name: 'rcu_sched' pid: 10 #procfdlimit: 1024 cpuusage: 0.80 privmem: 0 shm: 0 #fd: 2 #threads: 1 priority: 20 nice: 0 state: S
name: 'mdnsd.bin' pid: 23732 #procfdlimit: 65536 cpuusage: 0.60 privmem: 6260 shm: 11268 #fd: 47 #threads: 3 priority: 20 nice: 0 state: S
name: 'gpnpd.bin' pid: 23749 #procfdlimit: 65536 cpuusage: 0.60 privmem: 18872 shm: 16416 #fd: 118 #threads: 8 priority: 20 nice: 0 state: S
name: 'osysmond.bin' pid: 23900 #procfdlimit: 65536 cpuusage: 2.60 privmem: 34464 shm: 80940 #fd: 108 #threads: 12 priority: -100 nice: 0 state: S
name: 'ocssd.bin' pid: 23822 #procfdlimit: 65536 cpuusage: 3.80 privmem: 101036 shm: 90008 #fd: 253 #threads: 28 priority: -100 nice: 0 state: S
name: 'cssdmonitor' pid: 23796 #procfdlimit: 65536 cpuusage: 0.80 privmem: 36816 shm: 80716 #fd: 61 #threads: 16 priority: -100 nice: 0 state: S
DEVICES:
xvdf ior: 0.000 iow: 0.000 ios: 0 qlen: 0 wait: 0 type: SYS
dm-2 ior: 0.000 iow: 4.002 ios: 1 qlen: 0 wait: 0 type: SYS
dm-1 ior: 0.000 iow: 0.000 ios: 0 qlen: 0 wait: 0 type: SYS
dm-0 ior: 0.000 iow: 0.000 ios: 0 qlen: 0 wait: 0 type: SWAP
xvde ior: 0.000 iow: 0.000 ios: 0 qlen: 0 wait: 0 type: ASM
xvda1 ior: 0.000 iow: 0.000 ios: 0 qlen: 0 wait: 0 type: SYS
xvdc ior: 0.000 iow: 0.000 ios: 0 qlen: 0 wait: 0 type: SYS
xvdb ior: 13.809 iow: 41.826 ios: 5 qlen: 0 wait: 0 type: ASM,OCR,VOTING[ONLINE]
xvda ior: 0.000 iow: 4.002 ios: 1 qlen: 0 wait: 0 type: SYS
xvda2 ior: 0.000 iow: 4.002 ios: 1 qlen: 0 wait: 0 type: SYS
xvdd ior: 54.435 iow: 6.403 ios: 3 qlen: 0 wait: 0 type: ASM
NICS:
lo netrr: 0.201  netwr: 0.201  neteff: 0.402  nicerrors: 0 pktsin: 1  pktsout: 1  errsin: 0  errsout: 0  indiscarded: 0  outdiscarded: 0  inunicast: 1  innonunicast: 0  type: PUBLIC
eth1 netrr: 41.132  netwr: 73.606  neteff: 114.738  nicerrors: 0 pktsin: 73  pktsout: 98  errsin: 0  errsout: 0  indiscarded: 0  outdiscarded: 0  inunicast: 73  innonunicast: 0  type: PRIVATE,ASM latency: <1
eth0 netrr: 0.562  netwr: 0.537  neteff: 1.099  nicerrors: 0 pktsin: 6  pktsout: 3  errsin: 0  errsout: 0  indiscarded: 0  outdiscarded: 0  inunicast: 6  innonunicast: 0  type: PUBLIC

FILESYSTEMS:
mount: /u01 type: xfs total: 47175680 used: 29903604 available: 17272076 used%: 63 ifree%: 99 [GRID_HOME]

PROTOCOL ERRORS:
IPHdrErr: 0 IPAddrErr: 0 IPUnkProto: 0 IPReasFail: 442 IPFragFail: 0 TCPFailedConn: 4000 TCPEstRst: 249227 TCPRetraSeg: 1980 UDPUnkPort: 3066 UDPRcvErr: 22563

These sections are repeated for every timeframe.
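
For offline analysis it is much handier to dump the node views into files, using the options from the help text above, e.g. the last hour for all nodes in CSV format:

[oracle@vm101 ~]$ oclumon dumpnodeview -allnodes -last "01:00:00" -format csv -dir /tmp/chm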

Enterprise Manager

Enterprise Manager is able to display the data as charts, which is much more intuitive. But when you choose “Cluster Health Monitor” from the cluster target homepage

[Image: CC-Cluster-Health-Monitoring]

you will face a login prompt:

[Image: CC-Cluster-Health-Monitoring-login]

This is maybe not very straightforward. What Oracle wants to know is how to connect to the GIMR database. The performance data is stored inside a PDB, but Enterprise Manager simply needs the DBSNMP user of the CDB.

[oracle@vm101 ~]$ export ORACLE_HOME=/u01/app/grid/12.1.0.2
[oracle@vm101 ~]$ export ORACLE_SID=-MGMTDB
[oracle@vm101 ~]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Mon Mar 7 11:40:40 2016

Copyright (c) 1982, 2014, Oracle. All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Automatic Storage Management and Advanced Analytics options

SQL> show pdbs

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 MYCLUSTER                      READ WRITE NO

SQL> select username, account_status
  2  from dba_users;

USERNAME             ACCOUNT_STATUS
-------------------- --------------------------------
ANONYMOUS            EXPIRED & LOCKED
DBSNMP               EXPIRED & LOCKED
WMSYS                EXPIRED & LOCKED
XDB                  EXPIRED & LOCKED
APPQOSSYS            EXPIRED & LOCKED
GSMADMIN_INTERNAL    EXPIRED & LOCKED
GSMCATUSER           EXPIRED & LOCKED
SYSBACKUP            EXPIRED & LOCKED
OUTLN                EXPIRED & LOCKED
DIP                  EXPIRED & LOCKED
SYSDG                EXPIRED & LOCKED
ORACLE_OCM           EXPIRED & LOCKED
SYSKM                EXPIRED & LOCKED
XS$NULL              EXPIRED & LOCKED
GSMUSER              EXPIRED & LOCKED
AUDSYS               EXPIRED & LOCKED
SYSTEM               OPEN
SYS                  OPEN

19 rows selected.

SQL> alter user dbsnmp account unlock identified by dbsnmp;

User altered.

Now we can see the performance charts in real time:

[Image: CC-Cluster-Health-Monitoring-rt]

Or historically:

[Image: CC-Cluster-Health-Monitoring-hist]

Unfortunately there is no drill-down to the process level like in the database performance pages. And there is only data for CPU, network and memory, but not for disk. Please Oracle, add these features for us. It would help a lot. Thanks.
At least we can see details for each single CPU and network interface per node:

[Image: CC-Cluster-Health-Monitoring-13c-CPU]

[Image: CC-Cluster-Health-Monitoring-13c-network]

opatchauto Odyssey

A couple of days ago a customer asked for assistance in installing the January PSU in their RAC environment. The patch was to be applied to two systems: first the test cluster, then the production cluster. Makes sense so far. So we planned the steps that needed to be done (a rough sketch of the corresponding commands follows the list):

  • Download the patch
  • copy patch to all nodes and extract it
  • check OPatch version
  • create response file for OCM and copy it to all nodes
  • clear ASM adump directory since this may slow down pre-patch steps
  • “opatchauto” first node
  • “opatchauto” second node
  • run “datapatch” to apply SQL to databases
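
For reference, here is a rough sketch of the corresponding commands; the patch id and staging path are placeholders, not the actual January PSU:

[oracle@vm101 ~]$ unzip -d /u01/stage p<patch_id>_121020_Linux-x86-64.zip
[oracle@vm101 ~]$ $ORACLE_HOME/OPatch/opatch version
[oracle@vm101 ~]$ $ORACLE_HOME/OPatch/ocm/bin/emocmrsp -no_banner -output /tmp/ocm.rsp
[root@vm101 ~]# /u01/app/grid/12.1.0.2/OPatch/opatchauto apply /u01/stage/<patch_id> -ocmrf /tmp/ocm.rsp
[oracle@vm101 ~]$ $ORACLE_HOME/OPatch/datapatch -verbose

“opatchauto” is run as root on one node at a time; “datapatch” runs once per database from the RDBMS home (although, as it turned out, opatchauto may already have done that).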

The whole procedure went through without any issues on the test system. We could even skip the last step, running “datapatch”, since “opatchauto” had already done that for us. This is in contrast to the Readme, which does not mention that behavior.

So that was easy. But unfortunately the production system did not go as smoothly as the test system. “opatchauto” shut down the cluster stack and patched the RDBMS home successfully. But during the patch phase of the GI home, the logfile told us that there were still processes blocking some files. I checked that and found a handful; one of those processes was “ocssd”. The moment I killed all the left-over processes, I knew that this had not been the best idea. The server fenced and rebooted straight away. That left my cluster in a fuzzy state. The cluster stack came up again, but “opatchauto -resume” told me that I should proceed with some manual steps. So I applied the patches to the GI home, which had not been done before, and ran the post-patch script, which failed. Starting “opatchauto” in normal mode failed as well, since the cluster was already in “rolling” mode.

So finally I removed all the applied patches manually, put the cluster back into normal mode following MOS Note 1943498.1, and started the whole patching all over. Everything went fine this time.

Conclusion

  1. Think before you act. Killing OCSSD is not a good idea at all.
  2. In contrast to the Readme, “datapatch” is executed by “opatchauto” as part of the patching process.
  3. Checking the current cluster status can be done like this:
[oracle@vm101 ~]$ crsctl query crs activeversion -f
Oracle Clusterware active version on the cluster is [12.1.0.2.0]. The cluster upgrade state is [NORMAL]. The cluster active patch level is [3467666221].
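
The patches actually applied to a particular home can be listed as well:

[oracle@vm101 ~]$ $ORACLE_HOME/OPatch/opatch lspatches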