How DB_LOST_WRITE_PROTECT works with Data Guard

Preface

Today I will blog again about a Data Guard topic. There are a couple of best practices out there that one should follow. One of these best practices is enabling block checking and lost write protection. About the latter there is not much information out there, so I want to outline the concept and the importance of this feature. This post was inspired by a talk I had during the DOAG Conference 2016: I gave a presentation about best practices in Data Guard, and someone in the audience asked how that lost write protection actually works.
Basically it is there to detect lost writes, as the parameter name clearly states. That means a write request to the disk was acknowledged as complete and the database is happy with that, but the write did not actually happen for whatever reason. So when the block is read the next time, it is still in its old state; any changed, deleted or added values are not included. The block itself is consistent, it is not corrupted. The DBA will not notice it since there is no error. An error will occur only when you restore the tablespace containing the block and then try to apply the redo stream: the recovery will detect a newer SCN in the redo stream which does not match the block’s SCN. That is the point where it gets tricky.

The test environment

My simple test cases run on a VirtualBox VM with OEL 6.7, Oracle Restart 12.1.0.2 and Oracle Database 12.1.0.2. Primary and Standby run on the same host.
DB_NAME: db12c
DB_UNIQUE_NAME: db12ca
DB_UNIQUE_NAME: db12cb
You will see the names in my SQL prompt to make things clear.

This is the current state of the system:

SYS@db12ca> show parameter lost

NAME                                 TYPE                              VALUE
------------------------------------ --------------------------------- ------------------------------
db_lost_write_protect                string                            NONE

SYS@db12ca> select database_role from v$database;

DATABASE_ROLE
------------------------------------------------
PHYSICAL STANDBY
SYS@db12cb> show parameter lost


NAME                                 TYPE                              VALUE
------------------------------------ --------------------------------- ------------------------------
db_lost_write_protect                string                            NONE


SYS@db12cb> select database_role from v$database;

DATABASE_ROLE
------------------------------------------------
PRIMARY

So “db12cb” is my primary and “db12ca” my standby instance. By the way, that’s why I gave them the suffixes “a” and “b”: they may change roles over and over again.

For testing I create a separate tablespace with manual segment space management. This allows me to specify FREELISTS=1; otherwise the changes to my data might end up in different blocks, which is not what I want for my testing. Besides that, I create a user for testing and give it the necessary grants.

SYS@db12cb> create tablespace marco datafile size 100m segment space management manual;

Tablespace created.

SYS@db12cb> create user marco identified by marco default tablespace marco quota unlimited on marco;

User created.

SYS@db12cb> grant create session to marco;

Grant succeeded.

SYS@db12cb> grant create table to marco;

Grant succeeded.

Scenario #1: No Lost Write Detection

The new user can now create a table and insert some data, so let’s do that.

SYS@db12cb> conn marco/marco
Connected.
MARCO@db12cb> create table testtable (id number, txt varchar2(100)) storage (freelists 1);

Table created.

MARCO@db12cb> insert into testtable values (1, 'Test Lost Write Detection - 1');

1 row created.

MARCO@db12cb> commit;

Commit complete.

Now we can identify the block and check if the data is really in there.

SYS@db12cb> select file_name from dba_data_files where tablespace_name='MARCO';

FILE_NAME
------------------------------------------------------------------------------------------------------------------------
/u01/app/oracle/oradata/DB12CB/datafile/o1_mf_marco_d3llm6dd_.dbf

SYS@db12cb> select block_id, blocks from dba_extents where segment_name='TESTTABLE' and owner='MARCO';

  BLOCK_ID     BLOCKS
---------- ----------
       128          8

SYS@db12cb> alter system checkpoint;

System altered.
[oracle@oel6u4 ~]$ dd if=/u01/app/oracle/oradata/DB12CB/datafile/o1_mf_marco_d3llm6dd_.dbf of=myblock.v1 skip=129 count=1 bs=8192
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000162476 s, 50.4 MB/s
[oracle@oel6u4 ~]$ grep Detection myblock.v1
Binary file myblock.v1 matches

Ok, the data is in that block. In the same way I can now check if the DML was successfully applied on the standby.

SYS@db12ca> alter system flush buffer_cache;

System altered.

SYS@db12ca> select name from v$datafile where name like '%marco%';

NAME
--------------------------------------------------------------------------------
/u01/app/oracle/oradata/DB12CA/datafile/o1_mf_marco_d3llm8nt_.dbf
[oracle@oel6u4 ~]$ dd if=/u01/app/oracle/oradata/DB12CA/datafile/o1_mf_marco_d3llm8nt_.dbf of=sbblock.v1 skip=129 count=1 bs=8192
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000662024 s, 12.4 MB/s
[oracle@oel6u4 ~]$ grep Detection sbblock.v1
Binary file sbblock.v1 matches

So everything is fine until now as it should be.
I will now insert another row into the test table, force that change to be written to disk and then clear the buffer cache.

MARCO@db12cb> insert into testtable values (2, 'Oh my god!');

1 row created.

MARCO@db12cb> commit;

Commit complete.

MARCO@db12cb>

MARCO@db12cb> conn / as sysdba
Connected.
SYS@db12cb> alter system checkpoint;

System altered.

SYS@db12cb> alter system flush buffer_cache;

System altered.

Again, check if it was written to disk.

[oracle@oel6u4 ~]$ dd if=/u01/app/oracle/oradata/DB12CB/datafile/o1_mf_marco_d3llm6dd_.dbf of=myblock.v2 skip=129 count=1 bs=8192
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000318304 s, 25.7 MB/s
[oracle@oel6u4 ~]$ grep Detection myblock.v2
Binary file myblock.v2 matches
[oracle@oel6u4 ~]$ grep god myblock.v2
Binary file myblock.v2 matches

Both values that I inserted are on disk now. Just to make sure everything is ok, I check the block on the standby.

[oracle@oel6u4 ~]$ dd if=/u01/app/oracle/oradata/DB12CA/datafile/o1_mf_marco_d3llm8nt_.dbf of=sbblock.v2 skip=129 count=1 bs=8192
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000162124 s, 50.5 MB/s
[oracle@oel6u4 ~]$ grep Detection sbblock.v2
Binary file sbblock.v2 matches
[oracle@oel6u4 ~]$ grep god sbblock.v2
Binary file sbblock.v2 matches

So far, so good. Now comes the funny part. I will simulate a lost write by just putting my first extracted block back in the datafile.

[oracle@oel6u4 ~]$ dd if=myblock.v1 of=/u01/app/oracle/oradata/DB12CB/datafile/o1_mf_marco_d3llm6dd_.dbf seek=129 count=1 bs=8192 conv=notrunc
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000154517 s, 53.0 MB/s

Now let us query the test table and see what’s happening.

MARCO@db12cb> select * from testtable;

        ID
----------
TXT
--------------------------------------------------------------------------------
         1
Test Lost Write Detection - 1

No error, no warning, just the result. But the result set obviously lacks the row from the second insert. And as the block is completely intact and not corrupted, there is no need to raise any error.
So now it is time to do another INSERT.

MARCO@db12cb> insert into testtable values (3, 'Inconsistency!');

1 row created.

That is the point where it comes to light. The redo apply of the standby database detects a redo record which does not match the data block that it has. It has no choice but to stop recovery and raise an error in the alert.log.

2016-11-26 09:52:02.752000 +01:00
ERROR: ORA-00600: internal error code, arguments: [3020] recovery detected a data block with invalid SCN raised at location:kcbr_media_ap_1
Checker run found 1 new persistent data failures
Errors in file /u01/app/oracle/diag/rdbms/db12ca/db12ca/trace/db12ca_pr02_2466.trc  (incident=2705):
ORA-00600: internal error code, arguments: [3020], [2], [129], [8388737], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 2, block# 129, file offset is 1056768 bytes)
ORA-10564: tablespace MARCO
ORA-01110: data file 2: '/u01/app/oracle/oradata/DB12CA/datafile/o1_mf_marco_d3llm8nt_.dbf'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 93368
2016-11-26 09:52:03.882000 +01:00
Incident details in: /u01/app/oracle/diag/rdbms/db12ca/db12ca/incident/incdir_2705/db12ca_pr02_2466_i2705.trc

Besides that, the primary is still running fine: it accepts changes, commits and just does what a database is supposed to do. This is very unpleasant, since the only way to recover from such a situation is to fail over to the standby and lose all changes that happened after the change to the damaged block. And that can be a lot.

Scenario #2: Lost Write Detection enabled

I enable it by simply setting the parameter to typical on both instances.

SYS@db12ca> alter system set db_lost_write_protect=typical;

System altered.
SYS@db12cb> alter system set db_lost_write_protect=typical;

System altered.

This parameter forces the database to record the SCNs of the blocks it reads from disk in the redo stream. The standby database can use this information to compare the recorded SCN from the redo stream with the actual SCN of the block at the standby site. If there is a difference, it can report a lost write.
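
If you want to get a feeling for the overhead this causes, you can watch the extra redo that is generated for these block read records. Just a minimal sketch; I assume the statistic is called 'redo size for lost write detection' as in my 12.1 environment, check V$STATNAME if your release names it differently.

select
  name,
  value
from
  v$sysstat
where
  name in ('redo size', 'redo size for lost write detection');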

Now I walk through the same steps as above. But this time, after simulating the lost write, I simply query the table.

MARCO@db12cb> select * from testtable;

        ID
----------
TXT
--------------------------------------------------------------------------------
         1
Test Lost Write Detection - 1

The SELECT succeeds, but the alert.log of the primary reports the following error.

2016-11-26 10:40:47.143000 +01:00
DMON: A primary database lost write was reported by standby database db12ca. Please look at the alert and DRC logs of the standby database db12ca to see more information about the lost write.

The standby’s alert.log now reports an ORA-752 instead of an ORA-600.

No redo at or after SCN 3448159 can be used for recovery.
PR02: Primary database lost write detected by standby database db12ca
BLOCK THAT LOST WRITE 129, FILE 2, TABLESPACE# 7
The block read during the normal successful database operation had SCN 3346737 (0x0000.00331131) seq 1 (0x01)
ERROR: ORA-00752 detected lost write on primary
Slave exiting with ORA-752 exception
Errors in file /u01/app/oracle/diag/rdbms/db12ca/db12ca/trace/db12ca_pr02_2924.trc:
ORA-00752: recovery detected a lost write of a data block
ORA-10567: Redo is inconsistent with data block (file# 2, block# 129, file offset is 1056768 bytes)
ORA-10564: tablespace MARCO
ORA-01110: data file 2: '/u01/app/oracle/oradata/DB12CA/datafile/o1_mf_marco_d3lnpn8n_.dbf'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 93369
Recovery Slave PR02 previously exited with exception 752
MRP0: Background Media Recovery terminated with error 448
Errors in file /u01/app/oracle/diag/rdbms/db12ca/db12ca/trace/db12ca_pr00_2919.trc:
ORA-00448: normal completion of background process

Recovering from a lost write

As in scenario #1, the only way to work around this error is to fail over to the standby database.

[oracle@oel6u4 ~]$ dgmgrl
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys@db12ca
Password:
Connected as SYSDBA.
DGMGRL> failover to db12ca immediate
Performing failover NOW, please wait...
Failover succeeded, new primary is "db12ca"

Now I can query my test table at the new primary.

SYS@db12ca> select * from marco.testtable;

        ID TXT
---------- ------------------------------
         1 Test Lost Write Detection - 1
         2 Oh my god!

I now need to re-create the old primary; reinstating it with Flashback Database will not work. The steps are basically these (a rough sketch of the commands follows the list):

  • remove database from configuration
  • recreate the database using duplicate
  • add database back to the configuration
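
A rough sketch of these steps using the Broker and RMAN, leaving out the preparation of the auxiliary instance (startup nomount, spfile and password file handling and so on); the names are the ones from my playground, so adapt them to your environment:

DGMGRL> remove database db12cb;

RMAN> connect target sys@db12ca
RMAN> connect auxiliary sys@db12cb
RMAN> duplicate target database for standby from active database;

DGMGRL> add database db12cb as connect identifier is db12cb maintained as physical;
DGMGRL> enable database db12cb;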

A lot of effort for such a “small” failure….

Conclusion

Enabling lost write detection is crucial in a Data Guard setup. Lost writes are detected at read time, which allows you to take recovery actions much earlier than without it. Nevertheless, lost writes should not occur at all. If one does occur, something really bad is going on in your environment and you need to investigate the root cause of the lost write.
That’s it, basically. I hope it makes things a little clearer.

Further reading

Resolving ORA-752 or ORA-600 [3020] During Standby Recovery (Doc ID 1265884.1)
Oracle Docs – Database Reference: DB_LOST_WRITE_PROTECT

DataPatch stuck on RAC – PSU October 2016

Yesterday one of my customers wanted to patch two 2-node clusters with the current PSU October 2016 (161018). Both are running 12.1.0.2 Grid Infrastructure and 12.1.0.2 Database. The servers run SPARC Solaris 10. When applying the patch on the first cluster using “opatchauto” everything went fine until the “trying to apply SQL Patch” part on the 2nd node. So I went to the log directory and found the following:

$ cd $ORACLE_BASE/cfgtoollogs/sqlpatch/sqlpatch_27075_2016_11_30_17_12_08
$ tail sqlpatch_catcon_0.log

SQL> GRANT SELECT ON sys.gv_$instance TO dv_secanalyst
  2  /

At that line it was stuck. Searching My Oracle Support brought up nothing helpful. So I had a look at the database sessions:

SQL> select sid, username, event, state, seconds_in_wait 
2    from v$session where username='SYS';

       SID USERNAME   EVENT                           STATE              SECONDS_IN_WAIT
---------- ---------- ------------------------------- ------------------ ---------------
        13 SYS        SQL*Net message from client     WAITING                        226
        30 SYS        SQL*Net message from client     WAITING                        473
        32 SYS        SQL*Net message to client       WAITED SHORT TIME                0
       411 SYS        SQL*Net message from client     WAITING                        473
       783 SYS        library cache lock              WAITING                        211
       786 SYS        SQL*Net message from client     WAITING                          4
      1155 SYS        SQL*Net message from client     WAITING                        467

One of the sessions is waiting on a library cache lock, so the patch session is blocked on something dictionary related. Since the waiting statement was related to RAC, I stopped the other instance, which made datapatch continue immediately. So the workaround looked like this:

$ srvctl stop instance -db <dbname> -node <node1>
$ srvctl start instance -db <dbname> -node <node1>

This happened on both clusters, so be aware of it in case you are applying this PSU to RAC databases.
If you did not stop the first instance in time, the GRANT statement will run into a timeout (ORA-4021) and the SQL patch will be marked as “ERROR” in DBA_REGISTRY_SQLPATCH. In that case, just re-run “datapatch” and monitor the logfile.
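
To check what state the SQL patch ended up in, a query like this against DBA_REGISTRY_SQLPATCH does the job (column names as of 12.1):

select
  patch_id,
  action,
  status,
  action_time
from
  dba_registry_sqlpatch
order by
  action_time;
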
Happy patching.

Update 07-DEC-2016

I was not able to reproduce this issue on a Linux x86-64 system. So there is a chance that the issue is OS related.

Update 12-DEC-2016

Finally I reproduced this issue on my Linux x86-64 test system, so I opened an SR for it.

Update 13-DEC-2016

Thanks to a quick and efficient Oracle Support engineer (yes, such people exist!) we found the root cause of the issue. There is a bug in the Enterprise Manager Agent (maybe DB Express too) that makes it hold a shared lock on some GV$ views for the whole lifetime of a session. That is why datapatch got stuck. If you just stop the Agent, datapatch will continue immediately. There is no need to stop the whole instance; we just need to get rid of the Agent’s sessions.
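
So instead of bouncing the instance, one could identify and kill just those sessions. A query along these lines should reveal the blocker of the “library cache lock” (the FINAL_BLOCKING_SESSION column exists from 11.2 onwards):

select
  sid,
  serial#,
  username,
  program
from
  v$session
where
  sid in (select final_blocking_session
          from v$session
          where event = 'library cache lock');
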
Thanks a lot to Prakash from Oracle Support for his engagement in investigating this issue.

Benchmarking Data Guard Throughput

Motivation

When planning a new Oracle Data Guard setup, we do a lot of benchmarking beforehand: I/O benchmarks using Orion, application benchmarks like SwingBench or SLOB. For network bandwidth and throughput there is iPerf. But iPerf tells us only half the truth, since it just measures the network. In a Data Guard setup with synchronous redo transport there is more: the receiving site needs to write the information to disk and, after successful completion, acknowledge back to the sender.

Oracle provides a tool called “OraTCPTest” for that purpose which is pretty similar to iPerf but with additional capability to reflect the behaviour of Data Guard. The tool and basic instructions can be found in My Oracle Support document “Measuring Network Capacity using oratcptest (Doc ID 2064368.1)“.

Basically we start a server process at the receiving site, which can optionally write the received information to disk, and a client process at the sending site. Let’s walk through a basic example. I did this on Windows boxes, it works even there 🙂

1. Start the server process

We just need to specify the port which should be used for listening.

D:\install>java -jar oratcptest.jar -server -port=4711
OraTcpTest server started.

There is nothing more than that.

2. Start the client process

Here we need some more parameters, the target, the port and some timing stuff.

D:\install>java -jar oratcptest.jar myotherserver -port=4711 -duration=10s -interval=2s
[Requesting a test]
        Message payload        = 1 Mbyte
        Payload content type   = RANDOM
        Delay between messages = NO
        Number of connections  = 1
        Socket send buffer     = (system default)
        Transport mode         = SYNC
        Disk write             = NO
        Statistics interval    = 2 seconds
        Test duration          = 10 seconds
        Test frequency         = NO
        Network Timeout        = NO
        (1 Mbyte = 1024x1024 bytes)

(15:05:55) The server is ready.
                        Throughput                 Latency
(15:05:57)        101.832 Mbytes/s                9.820 ms
(15:05:59)        102.159 Mbytes/s                9.789 ms
(15:06:01)        102.169 Mbytes/s                9.788 ms
(15:06:03)        101.954 Mbytes/s                9.808 ms
(15:06:05)        101.998 Mbytes/s                9.804 ms
(15:06:05) Test finished.
               Socket send buffer = 64 kbytes
                  Avg. throughput = 101.968 Mbytes/s
                     Avg. latency = 9.807 ms

And that’s it. You can see all the parameters that were used and their values. The output then gives us the performance details: the average throughput and latency.

Finding the truth

Assuming we are migrating an existing system, we need to know whether the network can handle the amount of redo that needs to be shipped. AWR or Statspack reports tell us these numbers, but they are averages, and peaks might be masked in those reports. Using the information in V$ARCHIVED_LOG is much more accurate.

Let’s look at a timeframe during daytime workload and find the maximum redo rate.

select
  thread#,
  sequence#,
  blocks*block_size/1024/1024 MB,
  (next_time-first_time)*86400 sec,
  (blocks*block_size/1024/1024)/((next_time-first_time)*86400) "MB/s"
from
  v$archived_log
where
  ( (next_time-first_time)*86400<>0)
and
  first_time between
    to_date('2016/11/22 08:00:00','YYYY/MM/DD HH24:MI:SS') and
    to_date('2016/11/22 16:00:00','YYYY/MM/DD HH24:MI:SS')
and
  dest_id=2
order by 5;

   THREAD#  SEQUENCE#         MB        SEC       MB/s
---------- ---------- ---------- ---------- ----------
         1     349010 226.600586        453 .500222044
         1     348976 226.943848        415 .546852645
[...]
         1     349048 240.336914         56 4.29173061
         1     348953  240.79248         53 4.54325435
         1     349049 246.279785         47 5.23999543
         1     348987 228.593262         37 6.17819626

So at peak workloads one redo file of roughly 230MB is written in 37 seconds which results in 6-7 MB/s. In the first simple example we achieved over 100MB/s, so this is easy, isn’t it? No, it isn’t. We will see in a moment. First we need to find out how large a typical redo write is. We can get these values from AWR reports or just query V$SYSTAT.

select
  (select value from v$sysstat where statistic# = (select statistic# from v$statname where name = 'redo size')) /
  (select value from v$sysstat where statistic# = (select statistic# from v$statname where name = 'redo writes')) "avg redo write size"
from
  dual;

avg redo write size
-------------------
         9399.55267

The typical redo write size is about 9400 bytes in this case. Now we can start over and do the benchmark again with the proper parameters.

Real Life Benchmark

Again we start the server process, but this time we specify a file that it can write to. Since the tool cannot write directly to ASM, I created an ACFS volume to get roughly the performance of my ASM disk that will host the Standby Redo Logs in the future environment.

D:\install>java -jar oratcptest.jar -server -port=4711 -file=d:\app\oracle\oradata\acfs\oratcptest.tmp
OraTcpTest server started.

Now we can start several tests against this server process. Let’s start with asynchronous transmission without writing the information at the receiving site.

D:\install>java -jar oratcptest.jar myotherserver -port=4711 -duration=10s -interval=2s -mode=async -length=9400
[Requesting a test]
        Message payload        = 9400 bytes
        Payload content type   = RANDOM
        Delay between messages = NO
        Number of connections  = 1
        Socket send buffer     = (system default)
        Transport mode         = ASYNC
        Disk write             = NO
        Statistics interval    = 2 seconds
        Test duration          = 10 seconds
        Test frequency         = NO
        Network Timeout        = NO
        (1 Mbyte = 1024x1024 bytes)

(08:42:53) The server is ready.
                        Throughput
(08:42:55)        112.814 Mbytes/s
(08:42:57)        112.731 Mbytes/s
(08:42:59)        112.641 Mbytes/s
(08:43:01)        112.622 Mbytes/s
(08:43:03)        112.665 Mbytes/s
(08:43:03) Test finished.
               Socket send buffer = 16 Mbytes
                  Avg. throughput = 112.642 Mbytes/s					 

Ok, 10 MB/s more than in the first test; we are transferring just 9 kB per message instead of 1 MB, which was the default.
Next test, again asynchronous transfer, but this time the server should write the message content to disk before replying.

D:\install>java -jar oratcptest.jar myotherserver -port=4711 -duration=10s -interval=2s -mode=async -write -length=9400
[Requesting a test]
        Message payload        = 9400 bytes
        Payload content type   = RANDOM
        Delay between messages = NO
        Number of connections  = 1
        Socket send buffer     = (system default)
        Transport mode         = ASYNC
        Disk write             = YES
        Statistics interval    = 2 seconds
        Test duration          = 10 seconds
        Test frequency         = NO
        Network Timeout        = NO
        (1 Mbyte = 1024x1024 bytes)

(08:42:34) The server is ready.
                        Throughput
(08:42:36)         25.230 Mbytes/s
(08:42:38)         26.655 Mbytes/s
(08:42:40)         27.600 Mbytes/s
(08:42:42)         27.578 Mbytes/s
(08:42:44)         27.603 Mbytes/s
(08:42:44) Test finished.
               Socket send buffer = 1 Mbyte
                  Avg. throughput = 26.922 Mbytes/s

The throughput went down massively. Only 27MB/s are left, that is 1/4 of the previous throughput that we achieved without writing the message payload. A massive impact.
So what will happen when we do the transfer in a synchronous manner as Data Guard will do with Maximum Availability? Let’s see.

D:\install>java -jar oratcptest.jar myotherserver -port=4711 -duration=10s -interval=2s -mode=sync -write -length=9400
[Requesting a test]
        Message payload        = 9400 bytes
        Payload content type   = RANDOM
        Delay between messages = NO
        Number of connections  = 1
        Socket send buffer     = (system default)
        Transport mode         = SYNC
        Disk write             = YES
        Statistics interval    = 2 seconds
        Test duration          = 10 seconds
        Test frequency         = NO
        Network Timeout        = NO
        (1 Mbyte = 1024x1024 bytes)

(08:44:28) The server is ready.
                        Throughput                 Latency
(08:44:30)         15.082 Mbytes/s                0.595 ms
(08:44:32)         15.959 Mbytes/s                0.562 ms
(08:44:34)         16.402 Mbytes/s                0.547 ms
(08:44:36)         16.603 Mbytes/s                0.541 ms
(08:44:38)         16.579 Mbytes/s                0.541 ms
(08:44:38) Test finished.
               Socket send buffer = 64 kbytes
                  Avg. throughput = 16.117 Mbytes/s
                     Avg. latency = 0.557 ms

Again a degradation, only 16 MB/s are left. And remember, we will need 7 MB/s during peak load. Besides that, we now also see the latency, which will later impact commit performance, since the sending (primary) server has to wait until the redo is written and acknowledged by the receiving (standby) server: at roughly 0.56 ms per round trip, every synchronous redo write pays that extra price.
Just for the sake of completeness, the test with synchronous transfer but without writing.

D:\install>java -jar oratcptest.jar myotherserver -port=4711 -duration=10s -interval=2s -mode=sync -length=9400
[Requesting a test]
        Message payload        = 9400 bytes
        Payload content type   = RANDOM
        Delay between messages = NO
        Number of connections  = 1
        Socket send buffer     = (system default)
        Transport mode         = SYNC
        Disk write             = NO
        Statistics interval    = 2 seconds
        Test duration          = 10 seconds
        Test frequency         = NO
        Network Timeout        = NO
        (1 Mbyte = 1024x1024 bytes)

(08:44:11) The server is ready.
                        Throughput                 Latency
(08:44:13)         23.547 Mbytes/s                0.381 ms
(08:44:15)         31.800 Mbytes/s                0.282 ms
(08:44:17)         33.249 Mbytes/s                0.270 ms
(08:44:19)         32.639 Mbytes/s                0.275 ms
(08:44:21)         31.627 Mbytes/s                0.284 ms
(08:44:21) Test finished.
               Socket send buffer = 64 kbytes
                  Avg. throughput = 30.556 Mbytes/s
                     Avg. latency = 0.294 ms

It is somewhat faster than asynchronous transfer with writing.

Conclusion

Simple network bandwidth tests are not sufficient to predict future Data Guard redo transfer performance. OraTCPTest provides a nice way to measure the throughput and get realistic values that can be used to plan and size the network. Again, here are the numbers of the different tests.

ASYNC nowrite   ASYNC write   SYNC nowrite   SYNC write
112 MB/s        27 MB/s       32 MB/s        16 MB/s

We could now try to tune that more by playing around with send and receive buffer size. But for the time being, the defaults are sufficient for my case. So I skipped that.
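
Just to show what such a test would look like: if I read the MOS note correctly, the client option for this is -sockbuf (given in bytes), so a run with a 2 MB socket buffer would be started roughly like this (untested in my environment, treat it as a sketch):

D:\install>java -jar oratcptest.jar myotherserver -port=4711 -duration=10s -interval=2s -mode=sync -write -length=9400 -sockbuf=2097152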

AFD label operation throws ORA-15031

Some time ago I wrote about setting up ASM Filter Driver (AFD). There I labeled the disks and then set up the cluster. But now the cluster is live, and due to space pressure I needed to add another disk to the system. I thought this must be straightforward: just configure multipathing, label the new device and add it to the diskgroup. But it was not as easy as I thought. These were my devices:

# ll /dev/dm*
brw-rw---- 1 root disk 253, 0 Oct 21 13:24 /dev/dm-0
brw-rw---- 1 root disk 253, 1 Oct 21 13:24 /dev/dm-1
brw-rw---- 1 root disk 253, 2 Oct 21 13:24 /dev/dm-2
brw-rw---- 1 root disk 253, 3 Oct 21 13:24 /dev/dm-3
brw-rw---- 1 root disk 253, 4 Oct 21 13:24 /dev/dm-4
brw-rw---- 1 root disk 253, 5 Jun  3 15:11 /dev/dm-5
brw-rw---- 1 root disk 253, 6 Oct 21 13:28 /dev/dm-6

The “dm-6” was my new device, so I tried to label it:

# asmcmd afd_label DATA03 /dev/dm-6
ORA-15227: could not perform label set/clear operation
ORA-15031: disk specification '/dev/dm-6' matches no disks (DBD ERROR: OCIStmtExecute)
ASMCMD-9513: ASM disk label set operation failed.

What? I tried to read from and write to that device using dd, which went fine.
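
Just as a sketch (I did not keep the exact commands), the read check was along these lines; the write check was a similar dd against the still empty device:

# dd if=/dev/dm-6 of=/dev/null bs=1M count=10

Next I checked the discovery string: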

# asmcmd afd_dsget 
AFD discovery string: /dev/dm* 

Looked fine too. Next I checked the “afd.conf” file:

# cat /etc/afd.conf
afd_diskstring='/dev/dm*'
afd_filtering=enable

No issues there.
Finally I checked the $ORACLE_HOME/bin directory for files that start with “afd*”. What I found was an executable called “afdtool” which looked promising:

# afdtool 
Usage:
afdtool -add [-f] <devpath1, [devpath2,..]>  <labelname>
afdtool -delete [-f]  <devicepath | labelname>
afdtool -getdevlist [label] [-nohdr] [-nopath]
afdtool -filter <enable | disable>  <devicepath>
afdtool -rescan [discstr1, discstr2, ...]
afdtool -stop
afdtool -log  [-d <path>][-l <log_level>][-c <log_context>][-s <buf_size>]
              [-m <max_file_sz>][-n <max_file_num>] [-q] [-t] [-h]
afdtool -di <enable | disable | query>

So I gave it a try and it worked!

# afdtool -add /dev/dm-6 DATA03
Device /dev/dm-6 labeled with DATA03
 # afdtool -getdevlist
--------------------------------------------------------------------------------
Label                     Path
================================================================================
OCR                       /dev/dm-0
ARCH01                    /dev/dm-1
DATA01                    /dev/dm-2
GIMR                      /dev/dm-3
DATA02                    /dev/dm-4
DATA03                    /dev/dm-6

“asmcmd afd_lsdsk” returns the same, of course:

# asmcmd afd_lsdsk
--------------------------------------------------------------------------------
Label                     Filtering   Path
================================================================================
OCR                         ENABLED   /dev/dm-0
ARCH01                      ENABLED   /dev/dm-1
DATA01                      ENABLED   /dev/dm-2
GIMR                        ENABLED   /dev/dm-3
DATA02                      ENABLED   /dev/dm-4
DATA03                      ENABLED   /dev/dm-6

So the message is: when the cluster stack is up and running, you have to use “afdtool”; when the stack is down, “asmcmd afd_*” is the right choice. Try to find that in the docs…

What is interesting is that the AFD devices are still owned by “root”:

 # ls -l /dev/oracleafd/disks/
total 24
-rw-r--r-- 1 oracle dba  10 Jun  3 15:11 ARCH01
-rw-r--r-- 1 oracle dba  10 Jun  3 15:11 DATA01
-rw-r--r-- 1 oracle dba  10 Jun  7 21:40 DATA02
-rw-r--r-- 1 root   root 10 Oct 25 11:02 DATA03
-rw-r--r-- 1 oracle dba  10 Jun  3 15:11 GIMR
-rw-r--r-- 1 oracle dba  10 Jun  3 15:11 OCR

But you can already use that disk:

SQL> alter diskgroup data add disk  'AFD:DATA03';

Diskgroup altered.

SQL> select name, header_status, path from v$asm_disk;

NAME                           HEADER_STATU PATH
------------------------------ ------------ --------------------
OCR                            MEMBER       AFD:OCR
ARCH01                         MEMBER       AFD:ARCH01
DATA01                         MEMBER       AFD:DATA01
GIMR                           MEMBER       AFD:GIMR
DATA02                         MEMBER       AFD:DATA02
DATA03                         MEMBER       AFD:DATA03

I don’t know how that works, but it does. The ownership of the device will be set after the next reboot, but take your time 🙂

OTN Appreciation Day: Flashback

Following Tim Hall’s suggestion, I will write about my favorite feature: Flashback.

The term “Flashback” covers different technologies. The most useful ones in my opinion are Flashback Query and Flashback Drop; they become very handy in case of human errors. Those errors are much more likely to occur than errors caused by hardware failures: we are all humans and we all make mistakes. Flashback technologies provide easy methods to recover from those errors, in contrast to RMAN features, for instance.
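
Just to illustrate, two tiny examples (table name and timestamp are placeholders; Flashback Query needs sufficient undo retention, Flashback Drop needs the recycle bin enabled):

-- look at the data as it was 10 minutes ago
select * from scott.emp as of timestamp (systimestamp - interval '10' minute);

-- bring back a table that was dropped by accident
flashback table scott.emp to before drop;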

In the past I already wrote about that, you can find the article here: Flashback – Recover without Recovery

Losing the ASM password file

A while ago I wrote about recovering from a lost Grid Infrastructure diskgroup. There I described the steps to re-create the OCR, the voting files, the ASM SPFILE and the Management DB. But something was missing: the ASM password file. This becomes very important in case you are using Flex ASM.

What it looks like initially

Let’s check what’s inside the password file when everything runs fine.

[oracle@vm140 ~]$ asmcmd pwget --asm
+GI/orapwASM
[oracle@vm140 ~]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Tue Sep 13 10:13:05 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> select * from v$pwfile_users;

USERNAME                       SYSDB SYSOP SYSAS SYSBA SYSDG SYSKM     CON_ID
------------------------------ ----- ----- ----- ----- ----- ----- ----------
SYS                            TRUE  TRUE  TRUE  FALSE FALSE FALSE          0
CRSUSER__ASM_001               TRUE  FALSE TRUE  FALSE FALSE FALSE          0
ASMSNMP                        TRUE  FALSE FALSE FALSE FALSE FALSE          0

As you can see, the password file is inside ASM and it contains not only the SYS user, but also a user named CRSUSER__ASM_001. This one is used to connect to remote ASM instances.

Lose the ASM password file

Losing the ASM password file is quite simple:

[oracle@vm140 ~]$ asmcmd rm +GI/orapwASM

Now, let’s check what happens. First, stop the clusterware on all nodes:

[root@vm140 ~]# crsctl stop crs
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'vm140'
...
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'vm140' has completed
CRS-4133: Oracle High Availability Services has been stopped.

Starting the CRS stack on first node

Once all nodes are down, I start the CRS stack on one node.

[root@vm140 ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.

This actually brings up the whole cluster stack, apart from the fact that there is an error in the cluster alert.log:

[root@vm140 ~]# tail -f /u01/app/oracle/diag/crs/vm140/crs/trace/alert.log
2016-09-13 10:28:51.634 [CSSDAGENT(2839)]CRS-8500: Oracle Clusterware CSSDAGENT process is starting with operating system process ID 2839
2016-09-13 10:28:51.983 [OCSSD(2850)]CRS-8500: Oracle Clusterware OCSSD process is starting with operating system process ID 2850
2016-09-13 10:28:53.082 [OCSSD(2850)]CRS-1713: CSSD daemon is started in hub mode
2016-09-13 10:28:58.726 [OCSSD(2850)]CRS-1707: Lease acquisition for node vm140 number 1 completed
2016-09-13 10:28:59.829 [OCSSD(2850)]CRS-1605: CSSD voting file is online: AFD:GI; details in /u01/app/oracle/diag/crs/vm140/crs/trace/ocssd.trc.
2016-09-13 10:28:59.874 [OCSSD(2850)]CRS-1672: The number of voting files currently available 1 has fallen to the minimum number of voting files required 1.
2016-09-13 10:29:08.985 [OCSSD(2850)]CRS-1601: CSSD Reconfiguration complete. Active nodes are vm140 .
2016-09-13 10:29:11.181 [OCTSSD(2984)]CRS-8500: Oracle Clusterware OCTSSD process is starting with operating system process ID 2984
2016-09-13 10:29:12.290 [OCTSSD(2984)]CRS-2407: The new Cluster Time Synchronization Service reference node is host vm140.
2016-09-13 10:29:12.291 [OCTSSD(2984)]CRS-2401: The Cluster Time Synchronization Service started on host vm140.
2016-09-13 10:29:20.185 [ORAAGENT(2589)]CRS-5011: Check of resource "ora.asm" failed: details at "(:CLSN00006:)" in "/u01/app/oracle/diag/crs/vm140/crs/trace/ohasd_oraagent_oracle.trc"
2016-09-13 10:29:46.560 [ORAROOTAGENT(2424)]CRS-5019: All OCR locations are on ASM disk groups [GI], and none of these disk groups are mounted. Details are at "(:CLSN00140:)" in "/u01/app/oracle/diag/crs/vm140/crs/trace/ohasd_orarootagent_root.trc".
2016-09-13 10:30:06.471 [OSYSMOND(3312)]CRS-8500: Oracle Clusterware OSYSMOND process is starting with operating system process ID 3312
2016-09-13 10:30:07.960 [CRSD(3319)]CRS-8500: Oracle Clusterware CRSD process is starting with operating system process ID 3319
2016-09-13 10:30:10.250 [CRSD(3319)]CRS-1012: The OCR service started on node vm140.
2016-09-13 10:30:11.031 [CRSD(3319)]CRS-1201: CRSD started on node vm140.
2016-09-13 10:30:12.216 [ORAAGENT(3414)]CRS-8500: Oracle Clusterware ORAAGENT process is starting with operating system process ID 3414
2016-09-13 10:30:12.459 [ORAROOTAGENT(3418)]CRS-8500: Oracle Clusterware ORAROOTAGENT process is starting with operating system process ID 3418
2016-09-13 10:30:12.886 [OLOGGERD(3477)]CRS-8500: Oracle Clusterware OLOGGERD process is starting with operating system process ID 3477
2016-09-13 10:30:13.190 [ORAAGENT(3414)]CRS-5011: Check of resource "_mgmtdb" failed: details at "(:CLSN00007:)" in "/u01/app/oracle/diag/crs/vm140/crs/trace/crsd_oraagent_oracle.trc"
2016-09-13 10:30:13.218 [ORAAGENT(3414)]CRS-5011: Check of resource "ora.proxy_advm" failed: details at "(:CLSN00006:)" in "/u01/app/oracle/diag/crs/vm140/crs/trace/crsd_oraagent_oracle.trc"

The mentioned trace files contain some errors:

2016-09-13 10:29:46.046220 :    GPNP:1319061248: clsgpnp_dbmsGetItem_profile: [at clsgpnp_dbms.c:345] Result: (0) CLSGPNP_OK. (:GPNP00401:)got ASM-Profile.Mode='remote'
2016-09-13 10:29:46.051333 : default:1319061248: Inited LSF context: 0x7f36182b5220
2016-09-13 10:29:46.057196 : CLSCRED:1319061248: clsCredCommonInit: Inited singleton credctx.
2016-09-13 10:29:46.057226 : CLSCRED:1319061248: (:CLSCRED0101:)clsCredDomInitRootDom: Using user given storage context for repository access.
2016-09-13 10:29:46.192178 : USRTHRD:1319061248: {0:0:2} 6425 Error 4 querying length of attr ASM_DISCOVERY_ADDRESS

2016-09-13 10:29:46.209400 : USRTHRD:1319061248: {0:0:2} 6425 Error 4 querying length of attr ASM_STATIC_DISCOVERY_ADDRESS

2016-09-13 10:29:46.424315 : CLSCRED:1319061248: (:CLSCRED1079:)clsCredOcrKeyExists: Obj dom : SYSTEM.credentials.domains.root.ASM.Self.b50a6df0745b7fb4bfc0880a73d8f455.root not found
2016-09-13 10:29:46.424494 : USRTHRD:1319061248: {0:0:2} 6210 Error 4 opening dom root in 0x7f36181de990

2016-09-13 10:29:46.424494*:kgfn.c@6356: kgfnGetNodeType: flags=0x10
2016-09-13 10:29:46.424494*:kgfn.c@6369: kgfnGetNodeType: ntyp=1
2016-09-13 10:29:46.424494*:kgfn.c@4644: kgfnConnect2: kgfnGetBeqData failed
2016-09-13 10:29:46.483454 : default:1319061248: clsCredDomClose: Credctx deleted 0x7f36182dae20
2016-09-13 10:29:46.483454*:kgfn.c@4868: kgfnConnect2: failed to connect
2016-09-13 10:29:46.483454*:kgfn.c@4887: kgfnConnect: conn=(nil)
2016-09-13 10:29:46.483454*:kgfp.c@669: kgfpInitComplete2 hdl=0x7f36180be4f8 conn=0x7f36180be510 ok=0
2016-09-13 10:29:46.483454*:kgfo.c@947: kgfo_kge2slos error stack at kgfoAl06: ORA-15077: could not locate ASM instance serving a required diskgroup

2016-09-13 10:29:46.483454*:kgfo.c@1058: kgfoSaveError: ctx=0x7f36180e7300 hdl=(nil) gph=0x7f3618076c98 ose=0x7f364e9eae20 at kgfo.c:1006
2016-09-13 10:29:46.483454*:kgfo.c@1115: kgfoSaveError: ignoring existing error:
ORA-15077: could not locate ASM instance serving a required diskgroup

But in the end, the connection to the ASM instance works because it uses a local BEQ connection:

2016-09-13 10:30:05.806944*:kgfo.c@698: kgfoAllocHandle cached conn=0x7f36180a6a50 magic=0xd31f gp=0x7f3618290378 env_only=0
2016-09-13 10:30:05.806944*:kgfp.c@651: kgfpInitComplete2 hdl=0x7f36180a6a38 magic=0xd31f rmt=0 flags=0x5
2016-09-13 10:30:05.806944*:kgfn.c@4432: kgfnConnect: inst=(null) srvc=+ASM clnt=3 cflags=0x10
2016-09-13 10:30:05.806944*:kgfn.c@6338: kgfnRemoteASM: remote=0
2016-09-13 10:30:05.806944*:kgfn.c@6379: kgfnGetClusType: flags=0x10
2016-09-13 10:30:05.841109 :    GPNP:1319061248: clsgpnp_dbmsGetItem_profile: [at clsgpnp_dbms.c:345] Result: (0) CLSGPNP_OK. (:GPNP00401:)got ASM-Profile.Mode='remote'
2016-09-13 10:30:05.841109*:kgfn.c@6392: kgfnGetClusType: ctyp=3
2016-09-13 10:30:05.841109*:kgfn.c@4504: kgfnConnect: cluster type 3
2016-09-13 10:30:05.841109*:kgfn.c@6356: kgfnGetNodeType: flags=0x10
2016-09-13 10:30:05.841109*:kgfn.c@6369: kgfnGetNodeType: ntyp=1
2016-09-13 10:30:05.841109*:kgfn.c@5266: kgfnGetBeqData: ios=0 inst=NULL flex=1 line 4539
2016-09-13 10:30:05.841109*:kgfn.c@2044: kgfnTgtInit: sid=(null) flags=0x6000
2016-09-13 10:30:05.841109*:kgfn.c@1205: kgfnFindLocalNode: sid=(null) skgp=(nil) flags=0x6000
2016-09-13 10:30:05.841109*:kgfn.c@1018: kgfn_find_node_sid sid=(null) mbrcnt=1 flex=1
2016-09-13 10:30:05.841109*:kgfn.c@1037: kgfn_find_node_side: nodenum_local=1, mbrs=1 max=256
2016-09-13 10:30:05.841109*:kgfn.c@1095: kgfn_find_node_sid: checking node=1 (+ASM1)
  processed=1 memnum=0 buflen=84
2016-09-13 10:30:05.841109*:kgfn.c@1115: kgfn_find_node_sid LOCAL sid=+ASM1 mbr=0
2016-09-13 10:30:05.841109*:kgfn.c@1148: kgfn_find_node_sid sid=(null) ret=1 lclnode=0x1
2016-09-13 10:30:05.841109*:kgfn.c@2207: kgfnTgtDestroy: sid=+ASM1 host=(null) port=0
cstr=(null) asminst=(null) flags=0x100
2016-09-13 10:30:05.841109*:kgfn.c@5327: kgfnGetBeqData: found a local instance
2016-09-13 10:30:05.841109*:kgfn.c@4680: kgfnConnect: srvr valid
2016-09-13 10:30:05.841109*:kgfn.c@4686: kgfnConnect: bequeath connection
2016-09-13 10:30:05.841109*:kgfn.c@5972: kgfnConnect2Int: sysasm=0 envflags=0x10 srvrflags=0x3 unam=NULL password is NULL pstr=_ocr
2016-09-13 10:30:05.841109*:kgfn.c@6134: kgfnConnect2Int: cstr=(DESCRIPTION=(ADDRESS=(PROTOCOL=beq)(PROGRAM=/u01/app/12.1.0.2/grid/bin/oracle)(ARGV0=oracle+ASM1_ocr)(ENVS='ORACLE_HOME=/u01/app/12.1.0.2/grid,ORACLE_SID=+ASM1')(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))')(PRIVS=(USER=oracle)(GROUP=oinstall)))(enable=setuser))
2016-09-13 10:30:05.841109*:kgfn.c@4887: kgfnConnect: conn=0x7f36180a6a50
2016-09-13 10:30:05.841109*:kgfp.c@669: kgfpInitComplete2 hdl=0x7f36180a6a38 conn=0x7f36180a6a50 ok=1
2016-09-13 10:30:06.085577 :kgfn.c@3680: kgfnStmtSingle res=0 [MOUNTED]

Starting the CRS stack on second node

Ok, now I start the CRS stack on the second node. It gets stuck starting the "ora.storage" resource. The cluster alert.log looks similar to the one on the first node:

[root@vm141 ~]# tail -f /u01/app/oracle/diag/crs/vm141/crs/trace/alert.log
2016-09-13 10:35:55.511 [CSSDAGENT(2867)]CRS-8500: Oracle Clusterware CSSDAGENT process is starting with operating system process ID 2867
2016-09-13 10:35:55.843 [OCSSD(2882)]CRS-8500: Oracle Clusterware OCSSD process is starting with operating system process ID 2882
2016-09-13 10:35:56.952 [OCSSD(2882)]CRS-1713: CSSD daemon is started in hub mode
2016-09-13 10:36:02.779 [OCSSD(2882)]CRS-1707: Lease acquisition for node vm141 number 4 completed
2016-09-13 10:36:03.913 [OCSSD(2882)]CRS-1605: CSSD voting file is online: AFD:GI; details in /u01/app/oracle/diag/crs/vm141/crs/trace/ocssd.trc.
2016-09-13 10:36:03.956 [OCSSD(2882)]CRS-1672: The number of voting files currently available 1 has fallen to the minimum number of voting files required 1.
2016-09-13 10:36:05.624 [OCSSD(2882)]CRS-1601: CSSD Reconfiguration complete. Active nodes are vm140 vm141 .
2016-09-13 10:36:07.958 [OCTSSD(3014)]CRS-8500: Oracle Clusterware OCTSSD process is starting with operating system process ID 3014
2016-09-13 10:36:09.066 [OCTSSD(3014)]CRS-2401: The Cluster Time Synchronization Service started on host vm141.
2016-09-13 10:36:09.066 [OCTSSD(3014)]CRS-2407: The new Cluster Time Synchronization Service reference node is host vm140.
2016-09-13 10:36:31.042 [ORAROOTAGENT(2569)]CRS-5019: All OCR locations are on ASM disk groups [GI], and none of these disk groups are mounted. Details are at "(:CLSN00140:)" in "/u01/app/oracle/diag/crs/vm141/crs/trace/ohasd_orarootagent_root.trc".

But the mentioned tracefile does look different this time:

2016-09-13 10:36:30.996133*:kgfo.c@1058: kgfoSaveError: ctx=0x7f83ac121700 hdl=(nil) gph=0x7f83ac0ae9e8 ose=0x7f83c4c63df0 at kgfo.c:1006
2016-09-13 10:36:30.996133*:kgfo.c@1115: kgfoSaveError: ignoring existing error:
ORA-01017: invalid username/password; logon denied
ORA-17503: ksfdopn:2 Failed to open file +GI/orapwasm
ORA-15173: entry 'orapwasm' does not exist in directory '/'
ORA-06512: at line 4
ORA-15077: could not locate ASM instance serving a required diskgroup

2016-09-13 10:36:30.996133*:kgfo.c@817: kgfoFreeHandle ctx=0x7f83ac121700 hdl=0x7f83ac0f8918 conn=0x7f83ac0f8950 disconnect=0
2016-09-13 10:36:30.996133*:kgfo.c@846:   disconnect hdl 0x7f83ac0f8918 (recycling)
2016-09-13 10:36:30.996133*:kgfo.c@2757: Handle Alloc failed - kgfoCheckMount Reconnecting
2016-09-13 10:36:30.996133*:kgfo.c@2846: kgfoCheckMount dg=GI ok=0
2016-09-13 10:36:30.996467 : USRTHRD:3301365504: {0:9:3} -- trace dump on error exit --

2016-09-13 10:36:30.996497 : USRTHRD:3301365504: {0:9:3} Error [kgfoAl06] in [kgfokge] at kgfo.c:2850

2016-09-13 10:36:30.996520 : USRTHRD:3301365504: {0:9:3} ORA-01017: invalid username/password; logon denied
ORA-17503: ksfdopn:2 Failed to open file +GI/orapwasm
ORA-15173: entry 'orapwasm' does not exist in directory

That’s obvious: I deleted the password file, hence it cannot be located and the ASM startup fails.

Creating a new ASM password file

So let’s go back to the first node where everything is running fine and create a new password file:

[oracle@vm140 ~]$ orapwd file=+GI/orapwASM asm=y

Enter password for SYS:
[oracle@vm140 ~]$ asmcmd ls -l +GI/orapwASM
Type      Redund  Striped  Time             Sys  Name
PASSWORD  UNPROT  COARSE   SEP 13 10:00:00  N    orapwASM => +GI/ASM/P

[oracle@vm140 ~]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Tue Sep 13 10:43:39 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> select * from v$pwfile_users;

USERNAME                       SYSDB SYSOP SYSAS SYSBA SYSDG SYSKM     CON_ID
------------------------------ ----- ----- ----- ----- ----- ----- ----------
SYS                            TRUE  TRUE  FALSE FALSE FALSE FALSE          0

But how do I get this CRS user back in? Like this, and be careful, there is a double-underscore between CRSUSER and ASM_001:

[oracle@vm140 ~]$ asmcmd lspwusr
Username sysdba sysoper sysasm
     SYS   TRUE    TRUE  FALSE

[oracle@vm140 ~]$ asmcmd orapwusr --add CRSUSER__ASM_001
Enter password: ********

[oracle@vm140 ~]$ asmcmd orapwusr --grant sysasm CRSUSER__ASM_001
[oracle@vm140 ~]$ asmcmd orapwusr --grant sysdba CRSUSER__ASM_001
[oracle@vm140 ~]$ asmcmd lspwusr
        Username sysdba sysoper sysasm
             SYS   TRUE    TRUE  FALSE
CRSUSER__ASM_001   TRUE   FALSE   TRUE

Ok, here we go. The CRS user is back again. You may use SQL*Plus or asmcmd to grant privileges or to query the password file contents; I used both methods as you can see.
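
For completeness, the pure SQL*Plus variant of those grants, connected to the ASM instance as SYSASM, would look roughly like this (a sketch only, I did it via asmcmd as shown above; replace <password> with the real one):

SQL> alter user CRSUSER__ASM_001 identified by <password>;
SQL> grant sysasm to CRSUSER__ASM_001;
SQL> grant sysdba to CRSUSER__ASM_001;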

Starting the CRS stack on second node, again

Now that I have my ASM password file back again, I restart the CRS stack on the second node:

[root@vm141 ~]# crsctl stop crs -f

[root@vm141 ~]# crsctl start crs 

And check the alert.log:

[root@vm141 ~]# tail -f /u01/app/oracle/diag/crs/vm141/crs/trace/alert.log
2016-09-13 10:53:41.694 [OCTSSD(10211)]CRS-2407: The new Cluster Time Synchronization Service reference node is host vm140.
2016-09-13 10:53:41.700 [OCTSSD(10211)]CRS-2401: The Cluster Time Synchronization Service started on host vm141.
2016-09-13 10:53:57.194 [ORAROOTAGENT(9887)]CRS-5019: All OCR locations are on ASM disk groups [GI], and none of these disk groups are mounted. Details are at "(:CLSN00140:)" in "/u01/app/oracle/diag/crs/vm141/crs/trace/ohasd_orarootagent_root.trc".

The error is still there, but the tracefile now tells a different story:

2016-09-13 10:53:54.781688 :   CLSNS:1479268096: clsns_SetTraceLevel:trace level set to 1.
2016-09-13 10:53:54.818703 :    GPNP:1479268096: clsgpnp_dbmsGetItem_profile: [at clsgpnp_dbms.c:345] Result: (0) CLSGPNP_OK. (:GPNP00401:)got ASM-Profile.Mode='remote'
2016-09-13 10:53:54.825001 : default:1479268096: Inited LSF context: 0x7f9f30285b60
2016-09-13 10:53:54.831784 : CLSCRED:1479268096: clsCredCommonInit: Inited singleton credctx.
2016-09-13 10:53:54.831857 : CLSCRED:1479268096: (:CLSCRED0101:)clsCredDomInitRootDom: Using user given storage context for repository access.
2016-09-13 10:53:54.932529 : USRTHRD:1479268096: {0:9:3} 6425 Error 4 querying length of attr ASM_DISCOVERY_ADDRESS

2016-09-13 10:53:54.942595 : USRTHRD:1479268096: {0:9:3} 6425 Error 4 querying length of attr ASM_STATIC_DISCOVERY_ADDRESS

2016-09-13 10:53:55.046462 : CLSCRED:1479268096: (:CLSCRED1079:)clsCredOcrKeyExists: Obj dom : SYSTEM.credentials.domains.root.ASM.Self.b50a6df0745b7fb4bfc0880a73d8f455.root not found
2016-09-13 10:53:55.046657 : USRTHRD:1479268096: {0:9:3} 6210 Error 4 opening dom root in 0x7f9f302d65e0

2016-09-13 10:53:55.046657*:kgfn.c@6356: kgfnGetNodeType: flags=0x10
2016-09-13 10:53:55.046657*:kgfn.c@6369: kgfnGetNodeType: ntyp=1
2016-09-13 10:53:55.046657*:kgfn.c@4644: kgfnConnect2: kgfnGetBeqData failed
2016-09-13 10:53:55.046657*:kgfn.c@4680: kgfnConnect: srvr valid
2016-09-13 10:53:55.046657*:kgfn.c@5972: kgfnConnect2Int: sysasm=0 envflags=0x10 srvrflags=0x1 unam=crsuser__asm_001 password is NOT NULL pstr=_ocr
2016-09-13 10:53:55.046657*:kgfn.c@6121: kgfnConnect2Int: hosts=1
2016-09-13 10:53:55.046657*:kgfn.c@6134: kgfnConnect2Int: cstr=(DESCRIPTION=(TRANSPORT_CONNECT_TIMEOUT=60)(EXPIRE_TIME=1)(LOAD_BALANCE=ON)(ADDRESS_LIST=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.1)(PORT=1526)))(CONNECT_DATA=(SERVICE_NAME=+ASM)))
2016-09-13 10:53:55.046657*:kgfn.c@6200: kgfnConnect2Int: OCISessionBegin failed
2016-09-13 10:53:55.046657*:kgfn.c@1602: kgfnRecordErrPriv: status=-1  at kgfn.c:6284
2016-09-13 10:53:55.046657*:kgfn.c@1648: kgfnRecordErrPriv: 1017 error=ORA-01017: invalid username/password; logon denied

2016-09-13 10:53:55.046657*:kgfn.c@1684: kgfnRecordErrPriv: rec=1
2016-09-13 10:53:57.155070 : default:1479268096: clsCredDomClose: Credctx deleted 0x7f9f302c33b0
2016-09-13 10:53:57.155070*:kgfn.c@4868: kgfnConnect2: failed to connect
2016-09-13 10:53:57.155070*:kgfn.c@4887: kgfnConnect: conn=(nil)
2016-09-13 10:53:57.155070*:kgfp.c@669: kgfpInitComplete2 hdl=0x7f9f300beec8 conn=0x7f9f300beee0 ok=0
2016-09-13 10:53:57.155070*:kgfo.c@947: kgfo_kge2slos error stack at kgfoAl06: ORA-01017: invalid username/password; logon denied
ORA-15077: could not locate ASM instance serving a required diskgroup

Obviously the password I gave the CRSUSER is not correct. When I checked My Oracle Support for those messages, I found How to Restore ASM Password File if Lost ( ORA-01017 ORA-15077 ) (Doc ID 1644005.1). But this note only describes the process of backing up and restoring the ASM password file. That is something I should have done in the first place, and something that you and I should do at the very beginning of a cluster installation, before going into production.
So I investigated further and found an ODA-related note: ODA: CRS Could Not Start on Second ODA Node Due to Invalid ASM Credentials for The “crsuser__asm_001” Clusterware User (Doc ID 2139591.1). This one describes how to recover the lost password. It is still there somehow, stored in the clusterware credential wallets.

Recovering the CRSUSER password

Go back to the running node and do all the steps from there. First, query the path where the ASM password is stored:

[oracle@vm140 ~]$ crsctl query credmaint -path ASM/Self -credtype userpass
Path                                           Credtype   ID   Attrs

/ASM/Self/b50a6df0745b7fb4bfc0880a73d8f455     userpass   0    create_time=2016
                                                               -06-10 15:04:13,
                                                               modify_time=2016
                                                               -06-10 15:04:13,
                                                               expiration_time=
                                                               NEVER,bootstrap=
                                                               FALSE

I can use this path to check for the right user and query its password:

[oracle@vm140 ~]$ crsctl get credmaint -path /ASM/Self/b50a6df0745b7fb4bfc0880a73d8f455 -credtype userpass -id 0 -attr user -local 
crsuser__asm_001
[oracle@vm140 ~]$ crsctl get credmaint -path /ASM/Self/b50a6df0745b7fb4bfc0880a73d8f455 -credtype userpass -id 0 -attr passwd -local
B50T01O3wZydcz8nIeydae3qRZhUU

Now that I know the stored password, I can use it to set the proper password for my CRSUSER__ASM_001:

[oracle@vm140 ~]$ asmcmd orapwusr --modify CRSUSER__ASM_001
Enter password: *****************************

Start CRS stack on second node, again and again

Finally, stop and start the CRS stack again:

[root@vm141 ~]# crsctl stop crs -f

[root@vm141 ~]# crsctl start crs

Happily it is successful this time.

Remarks

Be sure to have a current backup of your ASM password file (besides all the other CRS-related files) to ensure recoverability.
Note, this effect may also happen when starting the first node. In that case, start ASM manually and then perform the steps to recover the password file.
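
A minimal sketch of such a backup; the target path is just an example, and I assume the password file is still at its default location inside ASM:

[oracle@vm140 ~]$ asmcmd pwcopy +GI/orapwASM /u01/app/oracle/admin/orapwASM.backup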

ORA-1792 in Oracle DB 12c

Today a customer came up with a problem that occurred after going from Oracle Database 11.2 to 12.1. A query against a view returned the ORA-1792 error, indicating that there are more than 1000 columns. But that was simply not true. When investigating this, we found that the classic join syntax works fine while the ANSI join syntax raises this error.
Just follow this simple test case that creates three tables with 501 columns each, which are then joined. And please don’t blame me for being vulnerable to SQL injection attacks…

SQL> declare
  2    i number;
  3    j number;
  4  begin
  5    for i in 1..3 loop
  6      execute immediate 'create table t' || trim(to_char(i)) || '(id number)';
  7      for j in 1..500 loop
  8        execute immediate 'alter table t' || trim(to_char(i)) || ' add (s' || trim(to_char(j)) || ' number)';
  9  end loop;
 10    end loop;
 11  end;
 12  /

PL/SQL procedure successfully completed.

SQL> select count(*) from user_tab_columns where table_name like 'T_';

  COUNT(*)
----------
      1503

WARTA @ orge> select t3.*
  2  from t1, t2, t3
  3  where t1.id = t2.id
  4    and t2.id = t3.id;

no rows selected

SQL> select t3.*
  2  from t1
  3  join t2 on (t1.id = t2.id)
  4  join t3 on (t2.id = t3.id);
join t3 on (t2.id = t3.id)
     *
ERROR at line 4:
ORA-01792: maximum number of columns in a table or view is 1000

A quick search in My Oracle Support revealed this document: Select Statement Throws ORA-01792 Error in 12c (Doc ID 1951689.1).

So the simple workaround is to disable fix control for bug 17376322.

SQL> alter session set "_fix_control"='17376322:OFF';

Session altered.

SQL> select t3.*
  2  from t1
  3  join t2 on (t1.id = t2.id)
  4  join t3 on (t2.id = t3.id);

no rows selected

And voilà, the query now works without throwing any errors. Problem solved.