Last week I was asked to create a Data Guard environment. Quite simple task, you may think. And actually it was, but with some funny side effects. The primary database is running on an Oracle Database Appliance X6-2M using ASM. The Standby database was planned to run on another ODA, a X5-2HA. The X5 is using pure ACFS. Both are running 184.108.40.206.170418 Bundlepatch. Be aware of that, the HA ODA’s are using PSUs whilst the smaller ones are using Bundlepatches. You should not mix up these, so I created another DB Home on the HA with the propper Bundlepatch. With the January ODA Update for the HA versions, Oracle moved to Bundlepatches too, but we are not yet there. So that’s it for the sake of completeness.
So what I did obviously in the first place was duplicating the primary database to the HA ODA. Once that was finished, I wanted to clean up the controlfile and get rid of all those backup and archivelog records and keep just the ones that are really available.
oracle@odax51 ~]$ rman target / Recovery Manager: Release 220.127.116.11.0 - Production on Fri Mar 16 09:11:42 2018 Copyright (c) 1982, 2014, Oracle and/or its affiliates. All rights reserved. connected to target database: COMA (DBID=1562414168, not open) RMAN> catalog db_recovery_file_dest; Starting implicit crosscheck backup at 2018-03-16 09:11:44 using target database control file instead of recovery catalog allocated channel: ORA_DISK_1 allocated channel: ORA_DISK_2 allocated channel: ORA_DISK_3 allocated channel: ORA_DISK_4 allocated channel: ORA_DISK_5 allocated channel: ORA_DISK_6 allocated channel: ORA_DISK_7 allocated channel: ORA_DISK_8
At this point the RMAN was stuck. A quick look in the alert.log revealed a whole bunch of messages like these:
2018-03-16 09:08:32.000000 +01:00 WARNING: ASMB force dismounting group 3 (RECO) due to missing disks SUCCESS: diskgroup RECO was dismounted NOTE: ASMB mounting group 3 (RECO) NOTE: ASM background process initiating disk discovery for grp 3 (reqid:0) WARNING: group 3 (RECO) has missing disks ORA-15040: diskgroup is incomplete WARNING: group 3 is being dismounted.
The ASM alert.log had corresponding messages:
2018-03-16 09:11:48.567000 +01:00 NOTE: client COMA1:COMA:odax5-c dismounting group 3 (RECO) NOTE: client COMA1:COMA:odax5-c dismounting group 3 (RECO) NOTE: client COMA1:COMA:odax5-c dismounting group 3 (RECO) NOTE: client COMA1:COMA:odax5-c dismounting group 3 (RECO) NOTE: client COMA1:COMA:odax5-c dismounting group 3 (RECO) NOTE: client COMA1:COMA:odax5-c dismounting group 3 (RECO) NOTE: client COMA1:COMA:odax5-c dismounting group 3 (RECO) NOTE: client COMA1:COMA:odax5-c dismounting group 3 (RECO)
Oh sh… you might think, and that was exactly what I thought at that time. So I checked the ASM diskgroups, disks etc. but did not find anything that could be a problem.
So after a while of thinking, the idea came up that it might be related to the backup stuff in the controlfile. So I checked that and tried to unregister a backupiece manually. I used the undocumented DBMS_BACKUP_RESTORE package for that, so do this at your own risk.
SQL> select RECID, STAMP, SET_STAMP, SET_COUNT, HANDLE, PIECE# from v$backup_piece 2 where handle like '+%' and rownum=1; RECID STAMP SET_STAMP SET_COUNT PIECE# HANDLE --------- ---------- ---------- ---------- ------ ---------------------------------------------------------------------------- 129941 969656433 969656431 130820 7 +RECO/COMAX6/BACKUPSET/2018_03_01/nnndn1_tag20180301t210006_0.2815.969656433 SQL> exec dbms_backup_restore.changebackuppiece( - 2 recid => 129941, - 3 stamp => 969656433, - 4 set_stamp => 969656431, - 5 set_count => 130820, - 6 pieceno => 7, - 7 handle => '+RECO/COMAX6/BACKUPSET/2018_03_01/nnndn1_tag20180301t210006_0.2815.969656433', - 8 status => 'D' - 9 );
During the PL/SQL call I saw exact one message like the ones above in the alert.log. That explains te behaviour. During the “catalog” call from RMAN, an implicit crosscheck takes place. Since this tries to access the files in the RECO diskgroup and there is really nothing in that diskgroup except an ACFS volume, this error is being thrown.
That means, I need to get rid of all these records. A simple PL/SQL block helped me doing that.
SQL> set serveroutput on SQL> begin 2 for rec in (select RECID, STAMP, SET_STAMP, SET_COUNT, HANDLE, PIECE# 3 from v$backup_piece 4 where HANDLE like '+%' 5 ) loop 6 dbms_output.put_line('deleting ''' ||rec.handle); 7 dbms_backup_restore.changebackuppiece( 8 recid => rec.recid, 9 stamp => rec.stamp, 10 set_stamp => rec.set_stamp, 11 set_count => rec.set_count, 12 pieceno => rec.piece#, 13 handle => rec.handle, 14 status => 'D' 15 ); 16 end loop; 17 end; 18 /
It took a while and caused again a lot of messages in both, the database and the ASM alert.log, but finally I was able to run RMAN commands successfully again.
Maybe this helps you solve such issues, but be aware of the fact that using DBMS_BACKUP_RESTORE is not supported.