Losing the ASM password file

A time ago I wrote about recovering from a lost Grid Inftrastructure Diskgroup. There I described the steps to re-create OCR, voting files, ASM SPfile and the Management DB. But something is missing, the ASM password file. This becomes very important in case you are using Flex ASM.

What it looks like initially

Let’s check what’s inside the passwordfile when everything runs fine.

[oracle@vm140 ~]$ asmcmd pwget --asm
+GI/orapwASM
[oracle@vm140 ~]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Tue Sep 13 10:13:05 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> select * from v$pwfile_users;

USERNAME                       SYSDB SYSOP SYSAS SYSBA SYSDG SYSKM     CON_ID
------------------------------ ----- ----- ----- ----- ----- ----- ----------
SYS                            TRUE  TRUE  TRUE  FALSE FALSE FALSE          0
CRSUSER__ASM_001               TRUE  FALSE TRUE  FALSE FALSE FALSE          0
ASMSNMP                        TRUE  FALSE FALSE FALSE FALSE FALSE          0

As you can see, the password file is inside ASM and there is not only the SYS user, but also an user named CRSUSER__ASM_001. This one is used to connect to remote ASM instances.

Lose the ASM password file

Losing the ASM password file is quite simple:

[oracle@vm140 ~]$ asmcmd rm +GI/orapwASM

Now, let’s check what happens. First, stop the clusterware on all nodes:

[root@vm140 ~]# crsctl stop crs
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'vm140'
...
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'vm140' has completed
CRS-4133: Oracle High Availability Services has been stopped.

Starting the CRS stack on first node

Once all nodes are down, I start the CRS stack on one node.

[root@vm140 ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.

This actually brings up the whole cluster stack, beside the fact that there is an error in the cluster alert.log:

[root@vm140 ~]# tail -f /u01/app/oracle/diag/crs/vm140/crs/trace/alert.log
2016-09-13 10:28:51.634 [CSSDAGENT(2839)]CRS-8500: Oracle Clusterware CSSDAGENT process is starting with operating system process ID 2839
2016-09-13 10:28:51.983 [OCSSD(2850)]CRS-8500: Oracle Clusterware OCSSD process is starting with operating system process ID 2850
2016-09-13 10:28:53.082 [OCSSD(2850)]CRS-1713: CSSD daemon is started in hub mode
2016-09-13 10:28:58.726 [OCSSD(2850)]CRS-1707: Lease acquisition for node vm140 number 1 completed
2016-09-13 10:28:59.829 [OCSSD(2850)]CRS-1605: CSSD voting file is online: AFD:GI; details in /u01/app/oracle/diag/crs/vm140/crs/trace/ocssd.trc.
2016-09-13 10:28:59.874 [OCSSD(2850)]CRS-1672: The number of voting files currently available 1 has fallen to the minimum number of voting files required 1.
2016-09-13 10:29:08.985 [OCSSD(2850)]CRS-1601: CSSD Reconfiguration complete. Active nodes are vm140 .
2016-09-13 10:29:11.181 [OCTSSD(2984)]CRS-8500: Oracle Clusterware OCTSSD process is starting with operating system process ID 2984
2016-09-13 10:29:12.290 [OCTSSD(2984)]CRS-2407: The new Cluster Time Synchronization Service reference node is host vm140.
2016-09-13 10:29:12.291 [OCTSSD(2984)]CRS-2401: The Cluster Time Synchronization Service started on host vm140.
2016-09-13 10:29:20.185 [ORAAGENT(2589)]CRS-5011: Check of resource "ora.asm" failed: details at "(:CLSN00006:)" in "/u01/app/oracle/diag/crs/vm140/crs/trace/ohasd_oraagent_oracle.trc"
2016-09-13 10:29:46.560 [ORAROOTAGENT(2424)]CRS-5019: All OCR locations are on ASM disk groups [GI], and none of these disk groups are mounted. Details are at "(:CLSN00140:)" in "/u01/app/oracle/diag/crs/vm140/crs/trace/ohasd_orarootagent_root.trc".
2016-09-13 10:30:06.471 [OSYSMOND(3312)]CRS-8500: Oracle Clusterware OSYSMOND process is starting with operating system process ID 3312
2016-09-13 10:30:07.960 [CRSD(3319)]CRS-8500: Oracle Clusterware CRSD process is starting with operating system process ID 3319
2016-09-13 10:30:10.250 [CRSD(3319)]CRS-1012: The OCR service started on node vm140.
2016-09-13 10:30:11.031 [CRSD(3319)]CRS-1201: CRSD started on node vm140.
2016-09-13 10:30:12.216 [ORAAGENT(3414)]CRS-8500: Oracle Clusterware ORAAGENT process is starting with operating system process ID 3414
2016-09-13 10:30:12.459 [ORAROOTAGENT(3418)]CRS-8500: Oracle Clusterware ORAROOTAGENT process is starting with operating system process ID 3418
2016-09-13 10:30:12.886 [OLOGGERD(3477)]CRS-8500: Oracle Clusterware OLOGGERD process is starting with operating system process ID 3477
2016-09-13 10:30:13.190 [ORAAGENT(3414)]CRS-5011: Check of resource "_mgmtdb" failed: details at "(:CLSN00007:)" in "/u01/app/oracle/diag/crs/vm140/crs/trace/crsd_oraagent_oracle.trc"
2016-09-13 10:30:13.218 [ORAAGENT(3414)]CRS-5011: Check of resource "ora.proxy_advm" failed: details at "(:CLSN00006:)" in "/u01/app/oracle/diag/crs/vm140/crs/trace/crsd_oraagent_oracle.trc"

In the mentioned trace files are some errors:

2016-09-13 10:29:46.046220 :    GPNP:1319061248: clsgpnp_dbmsGetItem_profile: [at clsgpnp_dbms.c:345] Result: (0) CLSGPNP_OK. (:GPNP00401:)got ASM-Profile.Mode='remote'
2016-09-13 10:29:46.051333 : default:1319061248: Inited LSF context: 0x7f36182b5220
2016-09-13 10:29:46.057196 : CLSCRED:1319061248: clsCredCommonInit: Inited singleton credctx.
2016-09-13 10:29:46.057226 : CLSCRED:1319061248: (:CLSCRED0101:)clsCredDomInitRootDom: Using user given storage context for repository access.
2016-09-13 10:29:46.192178 : USRTHRD:1319061248: {0:0:2} 6425 Error 4 querying length of attr ASM_DISCOVERY_ADDRESS

2016-09-13 10:29:46.209400 : USRTHRD:1319061248: {0:0:2} 6425 Error 4 querying length of attr ASM_STATIC_DISCOVERY_ADDRESS

2016-09-13 10:29:46.424315 : CLSCRED:1319061248: (:CLSCRED1079:)clsCredOcrKeyExists: Obj dom : SYSTEM.credentials.domains.root.ASM.Self.b50a6df0745b7fb4bfc0880a73d8f455.root not found
2016-09-13 10:29:46.424494 : USRTHRD:1319061248: {0:0:2} 6210 Error 4 opening dom root in 0x7f36181de990

2016-09-13 10:29:46.424494*:kgfn.c@6356: kgfnGetNodeType: flags=0x10
2016-09-13 10:29:46.424494*:kgfn.c@6369: kgfnGetNodeType: ntyp=1
2016-09-13 10:29:46.424494*:kgfn.c@4644: kgfnConnect2: kgfnGetBeqData failed
2016-09-13 10:29:46.483454 : default:1319061248: clsCredDomClose: Credctx deleted 0x7f36182dae20
2016-09-13 10:29:46.483454*:kgfn.c@4868: kgfnConnect2: failed to connect
2016-09-13 10:29:46.483454*:kgfn.c@4887: kgfnConnect: conn=(nil)
2016-09-13 10:29:46.483454*:kgfp.c@669: kgfpInitComplete2 hdl=0x7f36180be4f8 conn=0x7f36180be510 ok=0
2016-09-13 10:29:46.483454*:kgfo.c@947: kgfo_kge2slos error stack at kgfoAl06: ORA-15077: could not locate ASM instance serving a required diskgroup

2016-09-13 10:29:46.483454*:kgfo.c@1058: kgfoSaveError: ctx=0x7f36180e7300 hdl=(nil) gph=0x7f3618076c98 ose=0x7f364e9eae20 at kgfo.c:1006
2016-09-13 10:29:46.483454*:kgfo.c@1115: kgfoSaveError: ignoring existing error:
ORA-15077: could not locate ASM instance serving a required diskgroup

But in the end, the connection to the ASM instance works because it is using local BEQ connection:

2016-09-13 10:30:05.806944*:kgfo.c@698: kgfoAllocHandle cached conn=0x7f36180a6a50 magic=0xd31f gp=0x7f3618290378 env_only=0
2016-09-13 10:30:05.806944*:kgfp.c@651: kgfpInitComplete2 hdl=0x7f36180a6a38 magic=0xd31f rmt=0 flags=0x5
2016-09-13 10:30:05.806944*:kgfn.c@4432: kgfnConnect: inst=(null) srvc=+ASM clnt=3 cflags=0x10
2016-09-13 10:30:05.806944*:kgfn.c@6338: kgfnRemoteASM: remote=0
2016-09-13 10:30:05.806944*:kgfn.c@6379: kgfnGetClusType: flags=0x10
2016-09-13 10:30:05.841109 :    GPNP:1319061248: clsgpnp_dbmsGetItem_profile: [at clsgpnp_dbms.c:345] Result: (0) CLSGPNP_OK. (:GPNP00401:)got ASM-Profile.Mode='remote'
2016-09-13 10:30:05.841109*:kgfn.c@6392: kgfnGetClusType: ctyp=3
2016-09-13 10:30:05.841109*:kgfn.c@4504: kgfnConnect: cluster type 3
2016-09-13 10:30:05.841109*:kgfn.c@6356: kgfnGetNodeType: flags=0x10
2016-09-13 10:30:05.841109*:kgfn.c@6369: kgfnGetNodeType: ntyp=1
2016-09-13 10:30:05.841109*:kgfn.c@5266: kgfnGetBeqData: ios=0 inst=NULL flex=1 line 4539
2016-09-13 10:30:05.841109*:kgfn.c@2044: kgfnTgtInit: sid=(null) flags=0x6000
2016-09-13 10:30:05.841109*:kgfn.c@1205: kgfnFindLocalNode: sid=(null) skgp=(nil) flags=0x6000
2016-09-13 10:30:05.841109*:kgfn.c@1018: kgfn_find_node_sid sid=(null) mbrcnt=1 flex=1
2016-09-13 10:30:05.841109*:kgfn.c@1037: kgfn_find_node_side: nodenum_local=1, mbrs=1 max=256
2016-09-13 10:30:05.841109*:kgfn.c@1095: kgfn_find_node_sid: checking node=1 (+ASM1)
  processed=1 memnum=0 buflen=84
2016-09-13 10:30:05.841109*:kgfn.c@1115: kgfn_find_node_sid LOCAL sid=+ASM1 mbr=0
2016-09-13 10:30:05.841109*:kgfn.c@1148: kgfn_find_node_sid sid=(null) ret=1 lclnode=0x1
2016-09-13 10:30:05.841109*:kgfn.c@2207: kgfnTgtDestroy: sid=+ASM1 host=(null) port=0
cstr=(null) asminst=(null) flags=0x100
2016-09-13 10:30:05.841109*:kgfn.c@5327: kgfnGetBeqData: found a local instance
2016-09-13 10:30:05.841109*:kgfn.c@4680: kgfnConnect: srvr valid
2016-09-13 10:30:05.841109*:kgfn.c@4686: kgfnConnect: bequeath connection
2016-09-13 10:30:05.841109*:kgfn.c@5972: kgfnConnect2Int: sysasm=0 envflags=0x10 srvrflags=0x3 unam=NULL password is NULL pstr=_ocr
2016-09-13 10:30:05.841109*:kgfn.c@6134: kgfnConnect2Int: cstr=(DESCRIPTION=(ADDRESS=(PROTOCOL=beq)(PROGRAM=/u01/app/12.1.0.2/grid/bin/oracle)(ARGV0=oracle+ASM1_ocr)(ENVS='ORACLE_HOME=/u01/app/12.1.0.2/grid,ORACLE_SID=+ASM1')(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))')(PRIVS=(USER=oracle)(GROUP=oinstall)))(enable=setuser))
2016-09-13 10:30:05.841109*:kgfn.c@4887: kgfnConnect: conn=0x7f36180a6a50
2016-09-13 10:30:05.841109*:kgfp.c@669: kgfpInitComplete2 hdl=0x7f36180a6a38 conn=0x7f36180a6a50 ok=1
2016-09-13 10:30:06.085577 :kgfn.c@3680: kgfnStmtSingle res=0 [MOUNTED]

Starting the CRS stack on second node

Ok, now start the CRS stack on the second node. This is stuck starting “ora.storage” resource. The cluster alert.log looks similar to the one on the first node:

[root@vm141 ~]# tail -f /u01/app/oracle/diag/crs/vm141/crs/trace/alert.log
2016-09-13 10:35:55.511 [CSSDAGENT(2867)]CRS-8500: Oracle Clusterware CSSDAGENT process is starting with operating system process ID 2867
2016-09-13 10:35:55.843 [OCSSD(2882)]CRS-8500: Oracle Clusterware OCSSD process is starting with operating system process ID 2882
2016-09-13 10:35:56.952 [OCSSD(2882)]CRS-1713: CSSD daemon is started in hub mode
2016-09-13 10:36:02.779 [OCSSD(2882)]CRS-1707: Lease acquisition for node vm141 number 4 completed
2016-09-13 10:36:03.913 [OCSSD(2882)]CRS-1605: CSSD voting file is online: AFD:GI; details in /u01/app/oracle/diag/crs/vm141/crs/trace/ocssd.trc.
2016-09-13 10:36:03.956 [OCSSD(2882)]CRS-1672: The number of voting files currently available 1 has fallen to the minimum number of voting files required 1.
2016-09-13 10:36:05.624 [OCSSD(2882)]CRS-1601: CSSD Reconfiguration complete. Active nodes are vm140 vm141 .
2016-09-13 10:36:07.958 [OCTSSD(3014)]CRS-8500: Oracle Clusterware OCTSSD process is starting with operating system process ID 3014
2016-09-13 10:36:09.066 [OCTSSD(3014)]CRS-2401: The Cluster Time Synchronization Service started on host vm141.
2016-09-13 10:36:09.066 [OCTSSD(3014)]CRS-2407: The new Cluster Time Synchronization Service reference node is host vm140.
2016-09-13 10:36:31.042 [ORAROOTAGENT(2569)]CRS-5019: All OCR locations are on ASM disk groups [GI], and none of these disk groups are mounted. Details are at "(:CLSN00140:)" in "/u01/app/oracle/diag/crs/vm141/crs/trace/ohasd_orarootagent_root.trc".

But the mentioned tracefile does look different this time:

2016-09-13 10:36:30.996133*:kgfo.c@1058: kgfoSaveError: ctx=0x7f83ac121700 hdl=(nil) gph=0x7f83ac0ae9e8 ose=0x7f83c4c63df0 at kgfo.c:1006
2016-09-13 10:36:30.996133*:kgfo.c@1115: kgfoSaveError: ignoring existing error:
ORA-01017: invalid username/password; logon denied
ORA-17503: ksfdopn:2 Failed to open file +GI/orapwasm
ORA-15173: entry 'orapwasm' does not exist in directory '/'
ORA-06512: at line 4
ORA-15077: could not locate ASM instance serving a required diskgroup

2016-09-13 10:36:30.996133*:kgfo.c@817: kgfoFreeHandle ctx=0x7f83ac121700 hdl=0x7f83ac0f8918 conn=0x7f83ac0f8950 disconnect=0
2016-09-13 10:36:30.996133*:kgfo.c@846:   disconnect hdl 0x7f83ac0f8918 (recycling)
2016-09-13 10:36:30.996133*:kgfo.c@2757: Handle Alloc failed - kgfoCheckMount Reconnecting
2016-09-13 10:36:30.996133*:kgfo.c@2846: kgfoCheckMount dg=GI ok=0
2016-09-13 10:36:30.996467 : USRTHRD:3301365504: {0:9:3} -- trace dump on error exit --

2016-09-13 10:36:30.996497 : USRTHRD:3301365504: {0:9:3} Error [kgfoAl06] in [kgfokge] at kgfo.c:2850

2016-09-13 10:36:30.996520 : USRTHRD:3301365504: {0:9:3} ORA-01017: invalid username/password; logon denied
ORA-17503: ksfdopn:2 Failed to open file +GI/orapwasm
ORA-15173: entry 'orapwasm' does not exist in directory

That’s obvious. I deleted the password file and hence it cannot be located and ASM startuo fails.

Creating a new ASM password file

So let’s go back to the first node where everything is running fine and create a new password file:

[oracle@vm140 ~]$ orapwd file=+GI/orapwASM asm=y

Enter password for SYS:
[oracle@vm140 ~]$ asmcmd ls -l +GI/orapwASM
Type      Redund  Striped  Time             Sys  Name
PASSWORD  UNPROT  COARSE   SEP 13 10:00:00  N    orapwASM => +GI/ASM/P

[oracle@vm140 ~]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Tue Sep 13 10:43:39 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> select * from v$pwfile_users;

USERNAME                       SYSDB SYSOP SYSAS SYSBA SYSDG SYSKM     CON_ID
------------------------------ ----- ----- ----- ----- ----- ----- ----------
SYS                            TRUE  TRUE  FALSE FALSE FALSE FALSE          0

But how do I get this CRS user back in? Like this, and be careful, there is a double-underscore between CRSUSER and ASM_001:

[oracle@vm140 ~]$ asmcmd lspwusr
Username sysdba sysoper sysasm
     SYS   TRUE    TRUE  FALSE

[oracle@vm140 ~]$ asmcmd orapwusr --add CRSUSER__ASM_001
Enter password: ********

[oracle@vm140 ~]$ asmcmd orapwusr --grant sysasm CRSUSER__ASM_001
[oracle@vm140 ~]$ asmcmd orapwusr --grant sysdba CRSUSER__ASM_001
[oracle@vm140 ~]$ asmcmd lspwusr
        Username sysdba sysoper sysasm
             SYS   TRUE    TRUE  FALSE
CRSUSER__ASM_001   TRUE   FALSE   TRUE

Ok, here we go. The CRS user is back again. You may use SQL*Plus or asmcmd to grant privileges or query password file contents. I used both methods as you can see.

Starting the CRS stack on second node, again

Now that I have my ASM password file back again, I restart the CRS stack on the second node:

[root@vm141 ~]# crsctl stop crs -f

[root@vm141 ~]# crsctl start crs 

And check the alert.log:

[root@vm141 ~]# tail -f /u01/app/oracle/diag/crs/vm141/crs/trace/alert.log
2016-09-13 10:53:41.694 [OCTSSD(10211)]CRS-2407: The new Cluster Time Synchronization Service reference node is host vm140.
2016-09-13 10:53:41.700 [OCTSSD(10211)]CRS-2401: The Cluster Time Synchronization Service started on host vm141.
2016-09-13 10:53:57.194 [ORAROOTAGENT(9887)]CRS-5019: All OCR locations are on ASM disk groups [GI], and none of these disk groups are mounted. Details are at "(:CLSN00140:)" in "/u01/app/oracle/diag/crs/vm141/crs/trace/ohasd_orarootagent_root.trc".

The error is still there, but the tracefle now tells different things:

2016-09-13 10:53:54.781688 :   CLSNS:1479268096: clsns_SetTraceLevel:trace level set to 1.
2016-09-13 10:53:54.818703 :    GPNP:1479268096: clsgpnp_dbmsGetItem_profile: [at clsgpnp_dbms.c:345] Result: (0) CLSGPNP_OK. (:GPNP00401:)got ASM-Profile.Mode='remote'
2016-09-13 10:53:54.825001 : default:1479268096: Inited LSF context: 0x7f9f30285b60
2016-09-13 10:53:54.831784 : CLSCRED:1479268096: clsCredCommonInit: Inited singleton credctx.
2016-09-13 10:53:54.831857 : CLSCRED:1479268096: (:CLSCRED0101:)clsCredDomInitRootDom: Using user given storage context for repository access.
2016-09-13 10:53:54.932529 : USRTHRD:1479268096: {0:9:3} 6425 Error 4 querying length of attr ASM_DISCOVERY_ADDRESS

2016-09-13 10:53:54.942595 : USRTHRD:1479268096: {0:9:3} 6425 Error 4 querying length of attr ASM_STATIC_DISCOVERY_ADDRESS

2016-09-13 10:53:55.046462 : CLSCRED:1479268096: (:CLSCRED1079:)clsCredOcrKeyExists: Obj dom : SYSTEM.credentials.domains.root.ASM.Self.b50a6df0745b7fb4bfc0880a73d8f455.root not found
2016-09-13 10:53:55.046657 : USRTHRD:1479268096: {0:9:3} 6210 Error 4 opening dom root in 0x7f9f302d65e0

2016-09-13 10:53:55.046657*:kgfn.c@6356: kgfnGetNodeType: flags=0x10
2016-09-13 10:53:55.046657*:kgfn.c@6369: kgfnGetNodeType: ntyp=1
2016-09-13 10:53:55.046657*:kgfn.c@4644: kgfnConnect2: kgfnGetBeqData failed
2016-09-13 10:53:55.046657*:kgfn.c@4680: kgfnConnect: srvr valid
2016-09-13 10:53:55.046657*:kgfn.c@5972: kgfnConnect2Int: sysasm=0 envflags=0x10 srvrflags=0x1 unam=crsuser__asm_001 password is NOT NULL pstr=_ocr
2016-09-13 10:53:55.046657*:kgfn.c@6121: kgfnConnect2Int: hosts=1
2016-09-13 10:53:55.046657*:kgfn.c@6134: kgfnConnect2Int: cstr=(DESCRIPTION=(TRANSPORT_CONNECT_TIMEOUT=60)(EXPIRE_TIME=1)(LOAD_BALANCE=ON)(ADDRESS_LIST=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.1)(PORT=1526)))(CONNECT_DATA=(SERVICE_NAME=+ASM)))
2016-09-13 10:53:55.046657*:kgfn.c@6200: kgfnConnect2Int: OCISessionBegin failed
2016-09-13 10:53:55.046657*:kgfn.c@1602: kgfnRecordErrPriv: status=-1  at kgfn.c:6284
2016-09-13 10:53:55.046657*:kgfn.c@1648: kgfnRecordErrPriv: 1017 error=ORA-01017: invalid username/password; logon denied

2016-09-13 10:53:55.046657*:kgfn.c@1684: kgfnRecordErrPriv: rec=1
2016-09-13 10:53:57.155070 : default:1479268096: clsCredDomClose: Credctx deleted 0x7f9f302c33b0
2016-09-13 10:53:57.155070*:kgfn.c@4868: kgfnConnect2: failed to connect
2016-09-13 10:53:57.155070*:kgfn.c@4887: kgfnConnect: conn=(nil)
2016-09-13 10:53:57.155070*:kgfp.c@669: kgfpInitComplete2 hdl=0x7f9f300beec8 conn=0x7f9f300beee0 ok=0
2016-09-13 10:53:57.155070*:kgfo.c@947: kgfo_kge2slos error stack at kgfoAl06: ORA-01017: invalid username/password; logon denied
ORA-15077: could not locate ASM instance serving a required diskgroup

Obviously the password I gave the CRSUSER is not correct. When I checked My Oracle Support for those messages, I found How to Restore ASM Password File if Lost ( ORA-01017 ORA-15077 ) (Doc ID 1644005.1). But this note only describes the process of backing up and restoring the ASM password file. That is something I should have done in the frst place. And something that you and I should do at the very beginning of a cluster installation before we are going production.
So I investigated further and found an ODA related note ODA: CRS Could Not Start on Second ODA Node Due to Invalid ASM Credentials for The “crsuser__asm_001” Clusterware User (Doc ID 2139591.1). This one describes how to recover the lost password. It is still there somehow, as a hash in the clusterware wallets.

Recovering the CRSUSER password

Go back to the running node and do all the steps from there. First, query the path where the ASM password is stored:

[oracle@vm140 ~]$ crsctl query credmaint -path ASM/Self -credtype userpass
Path                                           Credtype   ID   Attrs

/ASM/Self/b50a6df0745b7fb4bfc0880a73d8f455     userpass   0    create_time=2016
                                                               -06-10 15:04:13,
                                                               modify_time=2016
                                                               -06-10 15:04:13,
                                                               expiration_time=
                                                               NEVER,bootstrap=
                                                               FALSE

I can use this path to check for the right user and query it’s password:

[oracle@vm140 ~]$ crsctl get credmaint -path /ASM/Self/b50a6df0745b7fb4bfc0880a73d8f455 -credtype userpass -id 0 -attr user -local 
crsuser__asm_001
[oracle@vm140 ~]$ crsctl get credmaint -path /ASM/Self/b50a6df0745b7fb4bfc0880a73d8f455 -credtype userpass -id 0 -attr passwd -local
B50T01O3wZydcz8nIeydae3qRZhUU

Now that I know the password hash, I can use that to set the propper password for my CRSUSER__ASM_001:

[oracle@vm140 ~]$ asmcmd orapwusr --modify CRSUSER__ASM_001
Enter password: *****************************

Start CRS stack on second node, again and again

Finally, stop and start the CRS stack again:

[root@vm141 ~]# crsctl stop crs -f

[root@vm141 ~]# crsctl start crs

Happily it is successful this time.

Remarks

Be sure to have a current backup of your ASM passwordfile (beside all the other tCRS related files) to ensure recoverability.
Note, this effect may also happen when starting the first node. In that case, start ASM manually and then perform the steps to recover the password file.

ORA-1792 in Oracle DB 12c

Today a customer came up with a problem that occured after going from Oracle Database Version 11.2 to 12.1. A query against a view returned the ORA-1792 error indicating that there are more than 1000 columns. But that was simply not true. So when investigating this, we found that the classic syntax works fine but the ANSI join syntax raises this error.
Just follow this simple test case that creates three tables with 501 columns each which are then joined. And please don’t blame me for being vulnerable to SQL injection attacks…

SQL> declare
  2    i number;
  3    j number;
  4  begin
  5    for i in 1..3 loop
  6      execute immediate 'create table t' || trim(to_char(i)) || '(id number)';
  7      for j in 1..500 loop
  8        execute immediate 'alter table t' || trim(to_char(i)) || ' add (s' || trim(to_char(j)) || ' number)';
  9  end loop;
 10    end loop;
 11  end;
 12  /

PL/SQL procedure successfully completed.

SQL> select count(*) from user_tab_columns where table_name like 'T_';

  COUNT(*)
----------
      1503

WARTA @ orge>
WARTA @ orge> select t3.*
  2  from t1, t2, t3
  3  where t1.id = t2.id
  4    and t2.id = t3.id;

no rows selected

SQL> select t3.*
  2  from t1
  3  join t2 on (t1.id = t2.id)
  4  join t3 on (t2.id = t3.id);
join t3 on (t2.id = t3.id)
     *
ERROR at line 4:
ORA-01792: maximum number of columns in a table or view is 1000

A quick research in My Oracle Support revealed this document: Select Statement Throws ORA-01792 Error in 12c (Doc ID 1951689.1).

So the simple workaround is to disable fix control for bug 17376322.

SQL> alter session set "_fix_control"='17376322:OFF';

Session altered.

SQL> select t3.*
  2  from t1
  3  join t2 on (t1.id = t2.id)
  4  join t3 on (t2.id = t3.id);

no rows selected

And voila, the query now works without throwing any errors anymore. Problem solved.