RMAN recovery window over redundancy

One of my favorite topics around the Oracle Database is backup and recovery because it is so essential for successfully operating a database landscape. Today's topic is a basic setting in RMAN that controls the lifetime of backups: the retention policy. Let's get into it.

When planning a backup strategy, there are two major values that should be taken into account. One is the Recovery Time Objective (RTO), which is the amount of time it takes to bring a system back to life after an incident. The second is the Recovery Point Objective (RPO), which represents the point in time to which a system shall be restored to get back into operation. The latter mainly influences our retention policy. Imagine the following situation: a major application update takes place which runs a lot of DDL and DML against the database because there are so many cool new features in the new release of that application. But for some reason the update fails halfway through. Of course it cannot be rolled back easily because of the DDL that took place. So the DBA is asked to restore the database to the time just before the update began.

That is where the fun part starts. Basically it should be an easy thing using RMAN: set the point in time, restore the controlfile, restore the database, roll it forward to the point in time and then open the database. So let's go ahead.

[oracle@vm160 ~]$ rman target /

Recovery Manager: Release 19.0.0.0.0 - Production on Tue Nov 7 08:33:41 2023
Version 19.18.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: ORCL19 (DBID=422142402)

RMAN> startup nomount force

Oracle instance started

Total System Global Area    2147482320 bytes

Fixed Size                     9165520 bytes
Variable Size               1342177280 bytes
Database Buffers             788529152 bytes
Redo Buffers                   7610368 bytes

RMAN> set dbid=422142402

executing command: SET DBID

RMAN> run {
2> set until time "to_date('06.11.2023 04:00:00', 'dd.mm.yyyy hh24:mi:ss')";
3> restore controlfile from autobackup;
4> alter database mount;
5> restore database;
6> recover database;
7> }

executing command: SET until clause

Starting restore at 2023-11-07 08:35:08
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=237 device type=DISK

recovery area destination: /u01/app/oracle/fra
database name (or database unique name) used for search: ORCL19
channel ORA_DISK_1: no AUTOBACKUPS found in the recovery area
channel ORA_DISK_1: looking for AUTOBACKUP on day: 20231106
channel ORA_DISK_1: looking for AUTOBACKUP on day: 20231105
channel ORA_DISK_1: looking for AUTOBACKUP on day: 20231104
channel ORA_DISK_1: looking for AUTOBACKUP on day: 20231103
channel ORA_DISK_1: looking for AUTOBACKUP on day: 20231102
channel ORA_DISK_1: looking for AUTOBACKUP on day: 20231101
channel ORA_DISK_1: looking for AUTOBACKUP on day: 20231031
channel ORA_DISK_1: no AUTOBACKUP in 7 days found
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 11/07/2023 08:35:11
RMAN-06172: no AUTOBACKUP found or specified handle is not a valid copy or piece

What? Why don’t I have any backup of my controlfile? Let’s check that.

RMAN> list backup of controlfile summary;


List of Backups
===============
Key     TY LV S Device Type Completion Time     #Pieces #Copies Compressed Tag
------- -- -- - ----------- ------------------- ------- ------- ---------- ---
68      B  F  A DISK        2023-11-07 08:44:01 1       1       NO         TAG20231107T084400

OK, there is a controlfile backup, but just one. So let's have a look at my RMAN settings.

RMAN> show all;

using target database control file instead of recovery catalog
RMAN configuration parameters for database with db_unique_name ORCL19 are:
CONFIGURE RETENTION POLICY TO REDUNDANCY 1; # default
CONFIGURE BACKUP OPTIMIZATION OFF; # default
CONFIGURE DEFAULT DEVICE TYPE TO DISK; # default
CONFIGURE CONTROLFILE AUTOBACKUP ON; # default
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '%F'; # default
CONFIGURE DEVICE TYPE DISK PARALLELISM 1 BACKUP TYPE TO BACKUPSET; # default
CONFIGURE DATAFILE BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE ARCHIVELOG BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE MAXSETSIZE TO UNLIMITED; # default
CONFIGURE ENCRYPTION FOR DATABASE OFF; # default
CONFIGURE ENCRYPTION ALGORITHM 'AES128'; # default
CONFIGURE COMPRESSION ALGORITHM 'BASIC' AS OF RELEASE 'DEFAULT' OPTIMIZE FOR LOAD TRUE ; # default
CONFIGURE RMAN OUTPUT TO KEEP FOR 7 DAYS; # default
CONFIGURE ARCHIVELOG DELETION POLICY TO NONE; # default
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/u01/app/oracle/product/db/19/dbs/snapcf_orcl19.f'; # default

The culprit is the retention policy of REDUNDANCY 1. A redundancy of 1 (in words: one) means just what it says: RMAN keeps exactly one backup of each file it backs up. And I often see backup strategies relying heavily on redundancy. Say I want to keep 2 weeks of backups for my database. A full backup takes place every Sunday, so I set the retention policy to “redundancy 2”. That way, RMAN keeps 2 full backups of my database which, according to the backup schedule, should be good for 2 weeks of backups. But no, it is not, for a couple of reasons. First, as shown, the redundancy applies to all types of backups, including controlfile backups. To restore the database to the point 2 weeks ago, we also need a controlfile backup from that time, and since controlfile autobackups are written far more often than the weekly full backup, a redundancy of 2 keeps only the two most recent ones. Second, if we change the backup schedule to back up the database twice a week, we implicitly shorten the retention to about one week. That is not what I want it to be.
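
You do not have to take my word for it; RMAN can tell you directly which backups it would consider obsolete under a given policy. A minimal sketch (commands only, the output depends on your backup history) that overrides the configured policy just for the report:

RMAN> report obsolete redundancy 2;
RMAN> report obsolete recovery window of 14 days;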

So how do I get around this? The answer is quite simple: change the retention policy to a recovery window rather than a redundancy.

RMAN> CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 14 DAYS;

new RMAN configuration parameters:
CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 14 DAYS;
new RMAN configuration parameters are successfully stored
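
From now on RMAN retains whatever is needed to restore the database to any point within the last 14 days. Backups beyond that are reported as obsolete and can be removed during regular housekeeping; a minimal sketch of the usual cleanup:

RMAN> crosscheck backup;
RMAN> delete noprompt obsolete;

Keep in mind that a 14-day recovery window usually needs noticeably more space in the recovery area than a redundancy of 1 did.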

With that setting the previous example would have run without any errors. There are many backups of the controlfile:

RMAN> list backup of controlfile summary;


List of Backups
===============
Key     TY LV S Device Type Completion Time     #Pieces #Copies Compressed Tag
------- -- -- - ----------- ------------------- ------- ------- ---------- ---
17      B  F  A DISK        2023-02-24 13:54:24 1       1       NO         TAG20230224T135422
24      B  F  A DISK        2023-10-27 09:13:43 1       1       NO         TAG20231027T091343
26      B  F  A DISK        2023-10-27 09:16:32 1       1       NO         TAG20231027T091632
28      B  F  A DISK        2023-10-28 02:00:06 1       1       NO         TAG20231028T020006
35      B  F  A DISK        2023-10-29 01:00:47 1       1       NO         TAG20231029T010046
37      B  F  A DISK        2023-10-29 02:00:05 1       1       NO         TAG20231029T020005
39      B  F  A DISK        2023-10-30 02:00:08 1       1       NO         TAG20231030T020008
41      B  F  A DISK        2023-10-31 02:00:05 1       1       NO         TAG20231031T020005
43      B  F  A DISK        2023-11-01 02:00:07 1       1       NO         TAG20231101T020007
45      B  F  A DISK        2023-11-02 02:00:06 1       1       NO         TAG20231102T020006
46      B  F  A DISK        2023-11-02 13:28:47 1       1       NO         TAG20231102T132847
47      B  F  A DISK        2023-11-02 13:44:19 1       1       NO         TAG20231102T134419
48      B  F  A DISK        2023-11-02 13:47:53 1       1       NO         TAG20231102T134753
49      B  F  A DISK        2023-11-02 13:57:21 1       1       NO         TAG20231102T135721
52      B  F  A DISK        2023-11-03 02:00:08 1       1       NO         TAG20231103T020008
54      B  F  A DISK        2023-11-04 02:00:07 1       1       NO         TAG20231104T020006
61      B  F  A DISK        2023-11-05 01:00:37 1       1       NO         TAG20231105T010037
63      B  F  A DISK        2023-11-05 02:00:06 1       1       NO         TAG20231105T020006
65      B  F  A DISK        2023-11-06 02:00:05 1       1       NO         TAG20231106T020005
66      B  F  A DISK        2023-11-07 08:24:25 1       1       NO         TAG20231107T082424
67      B  F  A DISK        2023-11-07 02:00:06 1       1       NO         TAG20231107T020006
68      B  F  A DISK        2023-11-07 08:44:01 1       1       NO         TAG20231107T084400

With all these backups I can easily restore the database to the given point in time.

[oracle@vm160 ~]$ rman target /

Recovery Manager: Release 19.0.0.0.0 - Production on Tue Nov 7 08:35:59 2023
Version 19.18.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: ORCL19 (not mounted)

RMAN> set dbid=422142402

executing command: SET DBID

RMAN> run {
2> set until time "to_date('06.11.2023 04:00:00', 'dd.mm.yyyy hh24:mi:ss')";
3> restore controlfile from autobackup;
4> alter database mount;
5> restore database;
6> recover database;
7> }

executing command: SET until clause

Starting restore at 2023-11-07 08:36:51
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=20 device type=DISK

recovery area destination: /u01/app/oracle/fra
database name (or database unique name) used for search: ORCL19
channel ORA_DISK_1: AUTOBACKUP /u01/app/oracle/fra/ORCL19/autobackup/2023_11_06/o1_mf_s_1152151205_lnjgnooh_.bkp found in the recovery area
channel ORA_DISK_1: looking for AUTOBACKUP on day: 20231106
channel ORA_DISK_1: restoring control file from AUTOBACKUP /u01/app/oracle/fra/ORCL19/autobackup/2023_11_06/o1_mf_s_1152151205_lnjgnooh_.bkp
channel ORA_DISK_1: control file restore from AUTOBACKUP complete
output file name=/u01/app/oracle/oradata/ORCL19/controlfile/o1_mf_kq6lhlbv_.ctl
output file name=/u01/app/oracle/fra/ORCL19/controlfile/o1_mf_kq6lhlmd_.ctl
Finished restore at 2023-11-07 08:36:53

released channel: ORA_DISK_1
Statement processed

Starting restore at 2023-11-07 08:36:58
Starting implicit crosscheck backup at 2023-11-07 08:36:58
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=20 device type=DISK
Crosschecked 48 objects
Finished implicit crosscheck backup at 2023-11-07 08:36:59

Starting implicit crosscheck copy at 2023-11-07 08:36:59
using channel ORA_DISK_1
Finished implicit crosscheck copy at 2023-11-07 08:36:59

searching for all files in the recovery area
cataloging files...
cataloging done

List of Cataloged Files
=======================
File Name: /u01/app/oracle/fra/ORCL19/autobackup/2023_11_06/o1_mf_s_1152151205_lnjgnooh_.bkp
File Name: /u01/app/oracle/fra/ORCL19/autobackup/2023_11_07/o1_mf_s_1152260664_lnmsk92o_.bkp
File Name: /u01/app/oracle/fra/ORCL19/autobackup/2023_11_07/o1_mf_s_1152237606_lnm30pow_.bkp

using channel ORA_DISK_1

skipping datafile 5; already restored to file /u01/app/oracle/oradata/ORCL19/datafile/o1_mf_system_kq6m8z4c_.dbf
skipping datafile 6; already restored to file /u01/app/oracle/oradata/ORCL19/datafile/o1_mf_sysaux_kq6m8z7g_.dbf
skipping datafile 8; already restored to file /u01/app/oracle/oradata/ORCL19/datafile/o1_mf_undotbs1_kq6m8zgv_.dbf
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00001 to /u01/app/oracle/oradata/ORCL19/datafile/o1_mf_system_kq6l8vf4_.dbf
channel ORA_DISK_1: restoring datafile 00003 to /u01/app/oracle/oradata/ORCL19/datafile/o1_mf_sysaux_kq6lbyd0_.dbf
channel ORA_DISK_1: restoring datafile 00004 to /u01/app/oracle/oradata/ORCL19/datafile/o1_mf_undotbs1_kq6ld1qb_.dbf
channel ORA_DISK_1: restoring datafile 00007 to /u01/app/oracle/oradata/ORCL19/datafile/o1_mf_users_kq6ld4xm_.dbf
channel ORA_DISK_1: reading from backup piece /u01/app/oracle/backup/orcl19/ORCL19_20231105_data_1s2am3om_1_1
channel ORA_DISK_1: piece handle=/u01/app/oracle/backup/orcl19/ORCL19_20231105_data_1s2am3om_1_1 tag=TAG20231105T010006
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:15
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00019 to /u01/app/oracle/oradata/ORCL19/ED7F830814134AADE053A024100A5D08/datafile/system_umbenannt.dbf
channel ORA_DISK_1: restoring datafile 00020 to /u01/app/oracle/oradata/ORCL19/ED7F830814134AADE053A024100A5D08/datafile/o1_mf_sysaux_kzgxy7n2_.dbf
channel ORA_DISK_1: restoring datafile 00021 to /u01/app/oracle/oradata/ORCL19/ED7F830814134AADE053A024100A5D08/datafile/o1_mf_undotbs1_kzgxy7n4_.dbf
channel ORA_DISK_1: restoring datafile 00022 to /u01/app/oracle/oradata/ORCL19/ED7F830814134AADE053A024100A5D08/datafile/o1_mf_users_kzgxy7n5_.dbf
channel ORA_DISK_1: reading from backup piece /u01/app/oracle/backup/orcl19/ORCL19_20231105_data_1t2am3p6_1_1
channel ORA_DISK_1: piece handle=/u01/app/oracle/backup/orcl19/ORCL19_20231105_data_1t2am3p6_1_1 tag=TAG20231105T010006
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:07
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00015 to /u01/app/oracle/oradata/ORCL19/EF608A6519330511E053A024100A91E2/datafile/o1_mf_system_ks5pktn8_.dbf
channel ORA_DISK_1: restoring datafile 00016 to /u01/app/oracle/oradata/ORCL19/EF608A6519330511E053A024100A91E2/datafile/o1_mf_sysaux_ks5pktn9_.dbf
channel ORA_DISK_1: restoring datafile 00017 to /u01/app/oracle/oradata/ORCL19/EF608A6519330511E053A024100A91E2/datafile/o1_mf_undotbs1_ks5pktnb_.dbf
channel ORA_DISK_1: restoring datafile 00023 to /u01/app/oracle/oradata/ORCL19/EF608A6519330511E053A024100A91E2/datafile/o1_mf_users_ln761vy3_.dbf
channel ORA_DISK_1: reading from backup piece /u01/app/oracle/backup/orcl19/ORCL19_20231105_data_1u2am3pd_1_1
channel ORA_DISK_1: piece handle=/u01/app/oracle/backup/orcl19/ORCL19_20231105_data_1u2am3pd_1_1 tag=TAG20231105T010006
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:07
Finished restore at 2023-11-07 08:37:29

Starting recover at 2023-11-07 08:37:29
using channel ORA_DISK_1

starting media recovery

archived log for thread 1 with sequence 7 is already on disk as file /u01/app/oracle/fra/ORCL19/onlinelog/o1_mf_1_kq6lkxbr_.log
archived log for thread 1 with sequence 8 is already on disk as file /u01/app/oracle/fra/ORCL19/onlinelog/o1_mf_2_kq6lk9px_.log
channel ORA_DISK_1: starting archived log restore to default destination
channel ORA_DISK_1: restoring archived log
archived log thread=1 sequence=4
channel ORA_DISK_1: reading from backup piece /u01/app/oracle/backup/orcl19/ORCL19_20231105_arch_202am3pk_1_1
channel ORA_DISK_1: piece handle=/u01/app/oracle/backup/orcl19/ORCL19_20231105_arch_202am3pk_1_1 tag=TAG20231105T010035
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
archived log file name=/u01/app/oracle/fra/ORCL19/archivelog/2023_11_07/o1_mf_1_4_lnmt9trc_.arc thread=1 sequence=4
channel default: deleting archived log(s)
archived log file name=/u01/app/oracle/fra/ORCL19/archivelog/2023_11_07/o1_mf_1_4_lnmt9trc_.arc RECID=268 STAMP=1152261450
channel ORA_DISK_1: starting archived log restore to default destination
channel ORA_DISK_1: restoring archived log
archived log thread=1 sequence=5
channel ORA_DISK_1: reading from backup piece /u01/app/oracle/backup/orcl19/ORCL19_20231105_arch_222am794_1_1
channel ORA_DISK_1: piece handle=/u01/app/oracle/backup/orcl19/ORCL19_20231105_arch_222am794_1_1 tag=TAG20231105T020004
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
archived log file name=/u01/app/oracle/fra/ORCL19/archivelog/2023_11_07/o1_mf_1_5_lnmt9w16_.arc thread=1 sequence=5
channel default: deleting archived log(s)
archived log file name=/u01/app/oracle/fra/ORCL19/archivelog/2023_11_07/o1_mf_1_5_lnmt9w16_.arc RECID=269 STAMP=1152261452
channel ORA_DISK_1: starting archived log restore to default destination
channel ORA_DISK_1: restoring archived log
archived log thread=1 sequence=6
channel ORA_DISK_1: reading from backup piece /u01/app/oracle/backup/orcl19/ORCL19_20231106_arch_242aorl4_1_1
channel ORA_DISK_1: piece handle=/u01/app/oracle/backup/orcl19/ORCL19_20231106_arch_242aorl4_1_1 tag=TAG20231106T020004
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
archived log file name=/u01/app/oracle/fra/ORCL19/archivelog/2023_11_07/o1_mf_1_6_lnmt9yvg_.arc thread=1 sequence=6
channel default: deleting archived log(s)
archived log file name=/u01/app/oracle/fra/ORCL19/archivelog/2023_11_07/o1_mf_1_6_lnmt9yvg_.arc RECID=270 STAMP=1152261455
archived log file name=/u01/app/oracle/fra/ORCL19/onlinelog/o1_mf_1_kq6lkxbr_.log thread=1 sequence=7
archived log file name=/u01/app/oracle/fra/ORCL19/onlinelog/o1_mf_2_kq6lk9px_.log thread=1 sequence=8
media recovery complete, elapsed time: 00:00:07
Finished recover at 2023-11-07 08:37:42

Setting the RMAN retention policy to a redundancy value is a really bad idea. It might work out for the backups of the datafiles, but it will definitely cause issues with the backups of the controlfile.

Typically we have a requirement for the RPO, i.e. how far back we must be able to restore. We should always use that value when setting the RMAN retention policy to a recovery window, so that RMAN keeps track of all the necessary backups and retains them as long as they might be required.

Restore a PDB into another CDB

In my previous post I already wrote about a special restore scenario with the Multitenant architecture. Today the story continues with another scenario that sounds quite simple at first. I will be using Oracle Database version 19.16 for this topic.

So let's get started. The goal is to restore a given PDB into another CDB at a specific point in time. Imagine some accidental data change happened which should be rolled back. Since only a small and rarely used table is affected, flashback or a complete point-in-time restore of the whole PDB is not an option. Instead, the PDB shall be restored into another CDB at the given point in time to analyze the issue further.

Instantiating a PDB in another CDB sounds like cloning at first glance. But cloning always uses the running PDB as its source; there is no option to specify a point in time. So let's try RMAN PDB duplication.

[oracle@odax5ha0 ~]$ rman

Recovery Manager: Release 19.0.0.0.0 - Production on Tue Mar 21 10:08:34 2023
Version 19.16.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

RMAN> connect target sys/********@odax5ha0:1521/mpmcdb2x5.support.robotron.de

connected to target database: MPMCDB2 (DBID=889235828)

RMAN> connect auxiliary sys/********@odax5ha0:1521/mpmcdb1x5.support.robotron.de

connected to auxiliary database: MPMCDB1 (DBID=1547213674)

RMAN> duplicate pluggable database pdba as pdbclone  until time "to_date('21.03.2023 09:10:00')";

Starting Duplicate PDB at 2023-03-21 10:10:28
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate PDB command at 03/21/2023 10:10:28
RMAN-05501: aborting duplication of target database
RMAN-05649: duplicate pluggable database does not allow the use of UNTIL SCN or TO RESTORE POINT clause

Oh yes, what a good start. And RMAN tells the truth; the documentation confirms this restriction.

As in the previous post, I need a temporary auxiliary CDB instance to restore my PDB. Good news: this time the steps will be a little easier. I start with a Pfile derived from the target CDB.

SQL> create pfile='/tmp/initaux.ora' from spfile;

File created.

I remove the “control_files” parameter and set “db_unique_name” to something unique. Having that, I can start up my auxiliary instance and create an SPfile for it.
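
Just to visualize it, a minimal sketch of the relevant edits in /tmp/initaux.ora (the db_unique_name value is only an example, everything else stays as exported):

# removed: *.control_files='...'
*.db_unique_name='CDBDUP'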

[oracle@odax5ha0 ~]$ export ORACLE_SID=CDBDUP

[oracle@odax5ha0 ~]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Tue Mar 21 10:14:22 2023
Version 19.16.0.0.0

Copyright (c) 1982, 2022, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup nomount pfile=/tmp/initaux.ora
ORACLE instance started.

Total System Global Area 4294963760 bytes
Fixed Size                  9142832 bytes
Variable Size             805306368 bytes
Database Buffers         3456106496 bytes
Redo Buffers               24408064 bytes

SQL> create spfile from pfile='/tmp/initaux.ora';

File created.

SQL> startup nomount force
ORACLE instance started.

Total System Global Area 4287959600 bytes
Fixed Size                  9142832 bytes
Variable Size             838860800 bytes
Database Buffers         3422552064 bytes
Redo Buffers               17403904 bytes

SQL>

Now I can go ahead and try the duplication again, this time at CDB level because this allows me to specify a point in time.

[oracle@odax5ha0 ~]$ rman

Recovery Manager: Release 19.0.0.0.0 - Production on Tue Mar 21 10:28:43 2023
Version 19.16.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

RMAN> connect target sys/********@odax5ha0:1521/mpmcdb1x5.support.robotron.de

connected to target database: MPMCDB1 (DBID=1547213674)

RMAN> connect auxiliary /

connected to auxiliary database (not started)

RMAN> duplicate database to CDBDUP pluggable database pdba until time "to_date('21.03.2023 09:10:00', 'dd.mm.yyyy hh24:mi:ss')";

Starting Duplicate Db at 2023-03-21 10:29:44
using target database control file instead of recovery catalog
allocated channel: ORA_AUX_DISK_1
channel ORA_AUX_DISK_1: SID=26 device type=DISK
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 03/21/2023 10:29:48
RMAN-05501: aborting duplication of target database
RMAN-06617: UNTIL TIME (2023-03-21 09:10:00) is ahead of last NEXT TIME in archived logs (2023-03-21 09:00:09)

Oh, look, another error. This is because there was not much DML happening and no log switch occurred between the given point in time and now. But this is easy to work around.

RMAN> alter system archive log current;

Statement processed

RMAN> duplicate database to CDBDUP pluggable database pdba until time "to_date('21.03.2023 09:10:00', 'dd.mm.yyyy hh24:mi:ss')";

Starting Duplicate Db at 2023-03-21 10:30:25
using channel ORA_AUX_DISK_1
duplicating Online logs to Oracle Managed File (OMF) location
duplicating Datafiles to Oracle Managed File (OMF) location

[...]

sql statement: alter pluggable database all open
Dropping offline and skipped tablespaces
Executing: alter database default tablespace system
Executing: drop tablespace "USERS" including contents cascade constraints
Finished Duplicate Db at 2023-03-21 10:34:50

RMAN>

The rest of the work is to unplug the PDB from the temporary auxiliary CDB, drop the auxiliary CDB and finally plug the PDB into the target CDB.

[oracle@odax5ha0 ~]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Tue Mar 21 10:41:30 2023
Version 19.16.0.0.0

Copyright (c) 1982, 2022, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.16.0.0.0

SQL> select name from v$database;

NAME
---------
CDBDUP

SQL> show pdbs

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDBA                           READ WRITE NO
SQL> alter pluggable database pdba close immediate;

Pluggable database altered.

SQL> alter pluggable database pdba unplug into '/tmp/pdba.xml';

Pluggable database altered.

SQL> drop pluggable database pdba keep datafiles;

Pluggable database dropped.

SQL> startup mount restrict exclusive force
ORACLE instance started.

Total System Global Area 4287959600 bytes
Fixed Size                  9142832 bytes
Variable Size             838860800 bytes
Database Buffers         3422552064 bytes
Redo Buffers               17403904 bytes
Database mounted.
SQL> drop database;

Database dropped.

Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.16.0.0.0

[oracle@odax5ha0 ~]$ export ORACLE_SID=MPMCDB2
[oracle@odax5ha0 ~]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Tue Mar 21 10:44:12 2023
Version 19.16.0.0.0

Copyright (c) 1982, 2022, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.16.0.0.0

SQL> select name from v$database;

NAME
---------
MPMCDB2

SQL> show pdbs

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDBB                           READ WRITE NO
SQL> create pluggable database PDBA using '/tmp/pdba.xml' copy;

Pluggable database created.

SQL> alter pluggable database pdba open;

Pluggable database altered.

SQL> alter pluggable database pdba save state;

Pluggable database altered.

SQL> show pdbs

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDBB                           READ WRITE NO
         4 PDBA                           READ WRITE NO

SQL>

And that's it. Easier than the other scenario, but not quite straightforward either. I think this might help in some situations.

Restore a dropped Pluggable Database

Reading the headline, today's topic does not sound very complicated or special at all. But be warned, it definitely is. During a migration-to-Multitenant project at a customer's site we walked through quite a couple of restore scenarios. The most obvious scenario is dropping a PDB by accident. Of course we have all the backups in place that we need to restore and recover the lost PDB.

Let’s start with what I did to the PDB:

[oracle@vm160 ~]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Thu Feb 23 13:54:28 2023
Version 19.18.0.0.0

Copyright (c) 1982, 2022, Oracle.  All rights reserved.

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.18.0.0.0

SQL> show pdbs

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDB1                           READ WRITE NO
         4 ORCL19PDB3                     READ WRITE NO
         5 ORCL19PDB2                     READ WRITE NO
SQL> alter session set container=pdb1;

Session altered.

SQL> select count(*) from hr.employees;

  COUNT(*)
----------
       108

SQL> alter session set container=cdb$root;

Session altered.

SQL> alter pluggable database pdb1 close immediate;

Pluggable database altered.

SQL> drop pluggable database pdb1 including datafiles;

Pluggable database dropped.

That was the easiest part of the story. You see, there were 3 PDBs, and the dropped PDB had a table “EMPLOYEES” with 108 records. Now the simple goal is to restore that PDB to right before the drop. To get that timestamp, we simply look it up in the alert.log:

[oracle@vm160 ~]$ tail /u01/app/oracle/diag/rdbms/orcl19/orcl19/trace/alert_orcl19.log
2023-02-23T13:55:01.355396+01:00
alter pluggable database pdb1 close immediate
2023-02-23T13:55:01.357622+01:00
PDB1(3):Pluggable database PDB1 closing
PDB1(3):JIT: pid 9707 requesting stop
PDB1(3):Closing sequence subsystem (3457133901505).
PDB1(3):Buffer Cache flush started: 3
2023-02-23T13:55:02.353510+01:00
PDB1(3):Buffer Cache flush finished: 3
Pluggable database PDB1 closed
Completed: alter pluggable database pdb1 close immediate
2023-02-23T13:55:14.816955+01:00
drop pluggable database pdb1 including datafiles
2023-02-23T13:55:17.338832+01:00
Deleted Oracle managed file /u01/app/oracle/oradata/ORCL19/ED7F830814134AADE053A024100A5D08/datafile/o1_mf_users_ksfwjspy_.dbf
Deleted Oracle managed file /u01/app/oracle/oradata/ORCL19/ED7F830814134AADE053A024100A5D08/datafile/o1_mf_temp_kq6nz8kv_.dbf
Deleted Oracle managed file /u01/app/oracle/oradata/ORCL19/ED7F830814134AADE053A024100A5D08/datafile/o1_mf_undotbs1_kq6nz8kv_.dbf
Deleted Oracle managed file /u01/app/oracle/oradata/ORCL19/ED7F830814134AADE053A024100A5D08/datafile/o1_mf_sysaux_kq6nz8kt_.dbf
Deleted Oracle managed file /u01/app/oracle/oradata/ORCL19/ED7F830814134AADE053A024100A5D08/datafile/o1_mf_system_kq6nz8kh_.dbf
2023-02-23T13:55:17.616908+01:00
Stopped service pdb1
Completed: drop pluggable database pdb1 including datafiles

Now that we have identified the proper timestamp, which is roughly 13:55, let's start with RMAN:

[oracle@vm160 ~]$ rman target /

Recovery Manager: Release 19.0.0.0.0 - Production on Thu Feb 23 13:57:26 2023
Version 19.18.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: ORCL19 (DBID=422142402)

RMAN> run {
2> set until time "to_date('23.02.2023 13:55:00', 'dd.mm.yyyy hh24:mi:ss')";
3> restore pluggable database pdb1;
4> recover pluggable database pdb1;
5> }

executing command: SET until clause

Starting restore at 2023-02-23 13:58:23
using target database control file instead of recovery catalog
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 02/23/2023 13:58:24
ORA-01403: no data found

RMAN> restore pluggable database pdb1;

Starting restore at 2023-02-23 13:58:31
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=277 device type=DISK
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 02/23/2023 13:58:31
RMAN-06813: could not translate pluggable database PDB1

OK, we can neither do a point-in-time restore nor a complete and current restore of the dropped PDB. This is because the controlfile is our single source of truth, and it simply forgot everything about the PDB when I dropped it.
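
A quick check from the CDB root confirms it; the dropped PDB simply does not show up anymore:

SQL> select con_id, name, open_mode from v$pdbs;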

What we need is an auxiliary instance to restore and recover the lost PDB. MOS Note “How to Restore – Dropped Pluggable database (PDB) in Multitenant (Doc ID 2034953.1)” uses DUPLICATE for that, but this requires having all necessary backups in one location. So I chose a different approach. My auxiliary instance will be based on the original one, so I create a Pfile from it that I will change according to my needs:

  • choose a new db_unique_name
  • remove “control_files” parameter
  • leave create-file-destinations unchanged
  • change memory parameters to minimum values
SQL> create pfile='/tmp/initaux.ora' from spfile;

File created.

The edited Pfile then looks like this:

*.audit_file_dest='/u01/app/oracle/admin/orcl19/adump'
*.audit_trail='db'
*.compatible='19.0.0'
*.container_data='ALL'
*.db_block_size=8192
*.db_create_file_dest='/u01/app/oracle/oradata'
*.db_domain='support.robotron.de'
*.db_name='orcl19'
*.db_unique_name='orcl19aux'
*.db_recovery_file_dest='/u01/app/oracle/fra'
*.db_recovery_file_dest_size=10494m
*.diagnostic_dest='/u01/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=orcl19XDB)'
*.enable_pluggable_database=true
*.nls_language='AMERICAN'
*.nls_territory='AMERICA'
*.open_cursors=300
*.pga_aggregate_target=364m
*.processes=300
*.remote_login_passwordfile='EXCLUSIVE'
*.sec_return_server_release_banner=TRUE
*.sga_target=2048m
*.undo_tablespace='UNDOTBS1'

Keeping the values for file destinations will restore the datafiles in their original paths in case of user-defined filenames. In case of OMF we will get new subdirectories according to the new db_unique_name. Either way the files will be created in the filesystem the original database is using.

With this Pfile I can now start my auxiliary instance and create an SPfile. Having an SPfile is handy because RMAN will update the “control_files” parameter accordingly when restoring the controlfile. Of course I have to set the ORACLE_SID to some value other than that of the original CDB.

[oracle@vm160 ~]$ export ORACLE_SID=aux
[oracle@vm160 ~]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Thu Feb 23 14:22:46 2023
Version 19.18.0.0.0

Copyright (c) 1982, 2022, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup nomount pfile=/tmp/initaux.ora
ORACLE instance started.

Total System Global Area 2147482320 bytes
Fixed Size                  9165520 bytes
Variable Size             486539264 bytes
Database Buffers         1644167168 bytes
Redo Buffers                7610368 bytes

SQL> create spfile from pfile='/tmp/initaux.ora';

File created.

SQL> startup nomount force
ORACLE instance started.

Total System Global Area 2147482320 bytes
Fixed Size                  9165520 bytes
Variable Size             486539264 bytes
Database Buffers         1644167168 bytes
Redo Buffers                7610368 bytes

Having that, I can now restore and recover my PDB using RMAN. Note that since we only restore this one PDB (plus the CDB root), we have to skip all the other PDBs and the PDB$SEED during recovery.

[oracle@vm160 ~]$ rman target /

Recovery Manager: Release 19.0.0.0.0 - Production on Thu Feb 23 14:27:46 2023
Version 19.18.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: ORCL19 (not mounted)

RMAN> set DBID 422142402

executing command: SET DBID

RMAN> run {
1      set until time "to_date('23.02.2023 13:55:00', 'dd.mm.yyyy hh24:mi:ss')";
2      set newname for database to new;
3      restore controlfile from '/u01/app/oracle/fra/ORCL19/autobackup/2023_02_23/o1_mf_s_1129557030_kzgr8qj7_.bkp';
4      alter database mount;
5      restore database root pluggable database pdb1;
6      switch datafile all;
7      switch tempfile all;
8      recover database skip forever tablespace
9      "ORCL19PDB3":"SYSTEM","ORCL19PDB3":"SYSAUX","ORCL19PDB3":"UNDOTBS1","ORCL19PDB2":"SYSTEM","ORCL19PDB2":"SYSAUX","ORCL19PDB2":"UNDOTBS1","PDB$SEED":"SYSTEM","PDB$SEED":"SYSAUX","PDB$SEED":"UNDOTBS1";
10 }

executing command: SET until clause

executing command: SET NEWNAME

Starting restore at 2023-02-23 14:51:50
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=256 device type=DISK

[...]

Starting recover at 2023-02-23 15:08:47
using channel ORA_DISK_1

Executing: alter database datafile 5, 6, 8 offline drop
Executing: alter database datafile 12, 13, 14 offline drop
Executing: alter database datafile 15, 16, 17 offline drop
starting media recovery

archived log for thread 1 with sequence 164 is already on disk as file /u01/app/oracle/fra/ORCL19/archivelog/2023_02_23/o1_mf_1_164_kzgr5rdp_.arc
archived log for thread 1 with sequence 165 is already on disk as file /u01/app/oracle/fra/ORCL19/onlinelog/o1_mf_3_kq6lk1ty_.log
archived log file name=/u01/app/oracle/fra/ORCL19/archivelog/2023_02_23/o1_mf_1_164_kzgr5rdp_.arc thread=1 sequence=164
archived log file name=/u01/app/oracle/fra/ORCL19/onlinelog/o1_mf_3_kq6lk1ty_.log thread=1 sequence=165
media recovery complete, elapsed time: 00:00:06
Finished recover at 2023-02-23 15:08:58

In case you have several PDBs to skip or if you do not like to type all the tablespaces by hand, you can use this handy SQL on the original database to generate the list of tablespaces to skip:

SQL> select listagg('"'||p.name||'":"'||t.tablespace_name||'"', ',')
  2  from v$pdbs p, cdb_tablespaces t
  3  where p.con_id=t.con_id
  4  and t.contents <> 'TEMPORARY'
SQL> /

LISTAGG('"'||P.NAME||'":"'||T.TABLESPACE_NAME||'"',',')
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
"ORCL19PDB3":"SYSTEM","ORCL19PDB3":"SYSAUX","ORCL19PDB3":"UNDOTBS1","ORCL19PDB2":"SYSTEM","ORCL19PDB2":"SYSAUX","ORCL19PDB2":"UNDOTBS1","PDB$SEED":"SYSTEM","PDB$SEED":"SYSAUX","PDB$SEED":"UNDOTBS1"

At this point we could open the auxiliary instance using the “open resetlogs” clause. But wait, we restored the controlfile from the original database, so we have the same filenames for the redo members. And we are on the same host. Clearing those redo files is not a good option: it would simply overwrite the online redo logs of our original database, and we would end up doing another incomplete recovery… My solution is to create a new controlfile with modified redo members. As a basis I back up the controlfile to trace, change the db_name right away and restart my auxiliary instance to nomount.

SQL> alter database backup controlfile to trace as '/tmp/create_ctrl_aux.sql' resetlogs;

Database altered.

SQL> alter system set db_name="AUX" scope=spfile;

System altered.

SQL> startup nomount force
ORACLE instance started.

Total System Global Area 2147482320 bytes
Fixed Size                  9165520 bytes
Variable Size             486539264 bytes
Database Buffers         1644167168 bytes
Redo Buffers                7610368 bytes

Then I modified the “create controlfile” statement

  • set the new database name
  • remove all the files, that I didn’t restore
  • remove redo file names (OMF) – or – change redo file names

The final statement is this:

CREATE CONTROLFILE REUSE set DATABASE "AUX" RESETLOGS  ARCHIVELOG
    MAXLOGFILES 16
    MAXLOGMEMBERS 3
    MAXDATAFILES 1024
    MAXINSTANCES 8
    MAXLOGHISTORY 292
LOGFILE
  GROUP 1 SIZE 200M BLOCKSIZE 512,
  GROUP 2  SIZE 200M BLOCKSIZE 512,
  GROUP 3 SIZE 200M BLOCKSIZE 512
-- STANDBY LOGFILE
DATAFILE
  '/u01/app/oracle/oradata/ORCL19AUX/datafile/o1_mf_system_kzgvw29p_.dbf',
  '/u01/app/oracle/oradata/ORCL19AUX/datafile/o1_mf_sysaux_kzgvw291_.dbf',
  '/u01/app/oracle/oradata/ORCL19AUX/datafile/o1_mf_undotbs1_kzgvw2b9_.dbf',
  '/u01/app/oracle/oradata/ORCL19AUX/datafile/o1_mf_users_kzgvw2f8_.dbf',
  '/u01/app/oracle/oradata/ORCL19AUX/ED7F830814134AADE053A024100A5D08/datafile/o1_mf_system_kzgw69jf_.dbf',
  '/u01/app/oracle/oradata/ORCL19AUX/ED7F830814134AADE053A024100A5D08/datafile/o1_mf_sysaux_kzgw69jz_.dbf',
  '/u01/app/oracle/oradata/ORCL19AUX/ED7F830814134AADE053A024100A5D08/datafile/o1_mf_undotbs1_kzgw69kb_.dbf',
  '/u01/app/oracle/oradata/ORCL19AUX/ED7F830814134AADE053A024100A5D08/datafile/o1_mf_users_kzgw69h9_.dbf'
CHARACTER SET AL32UTF8
;

Finally, I create a new controlfile, open the auxiliary database and unplug the PDB keeping their datafiles.

SQL> CREATE CONTROLFILE REUSE set DATABASE "AUX" RESETLOGS  ARCHIVELOG
[...]

Control file created.

SQL> alter database open resetlogs;

Database altered.

SQL> show pdbs

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       MOUNTED
         3 PDB1                           MOUNTED
         4 ORCL19PDB3                     MOUNTED
         5 ORCL19PDB2                     MOUNTED
SQL> alter pluggable database PDB1 unplug into '/tmp/pdb1.xml';

Pluggable database altered.

SQL> drop pluggable database PDB1 keep datafiles;

Pluggable database dropped.

At this point we can transfer the restored PDB back to its original container. You can decide whether you want to use the restored datafiles right away or create another copy of them. Either way it is a simple one-liner.

SQL> create pluggable database PDB1 using '/tmp/pdb1.xml' copy;

Pluggable database created.

Or

SQL> create pluggable database PDB1 using '/tmp/pdb1.xml' nocopy tempfile reuse;

Pluggable database created.

The last step is to open the PDB again and check its contents.

SQL> alter  pluggable database PDB1 open;

Pluggable database altered.

SQL> alter  pluggable database PDB1 save state;

Pluggable database altered.

SQL> alter session set container=pdb1;

Session altered.

SQL> select count(*) from hr.employees;

  COUNT(*)
----------
       108

Et voilà, mission accomplished. The very last step now is removing the auxiliary instance.

SQL> startup mount restrict exclusive force

SQL> drop database;

Whoever said Multitenant would make a DBA's life easier has never done a restore like this 🙂 But to be fair, there are a lot of scenarios where Multitenant is very handy, I must admit.

Insufficient SGA with RMAN automatic auxiliary instances

RMAN is the way to go when it comes to backup and restore. The tool was introduced with Release 8 of the Oracle Database, as far as I remember, and it has been heavily improved since then. One of the newer features is the native support for table and tablespace point-in-time recovery. This used to be a manual process: we had to restore a set of tablespaces to an auxiliary instance, recover it, and then transfer the table or tablespace in question back to the source database, using Data Pump for instance.

This process was enhanced, and now RMAN can handle the necessary auxiliary instance on its own. It creates it, uses it for restore and recovery, and drops it afterwards. No magic here. But there is one thing to keep in mind: the auxiliary instance inherits some settings from the target database, the SGA_TARGET parameter for instance. I tested this with Release 19c and several RUs; the behaviour was always the same, but it may change in the future. Let's create a test case: I have a database, I create a new tablespace TEST_TS and put a table SOURCE_COPY into that tablespace, just to have it non-empty.

[odaadmin@odax5-base-00 ~]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Wed Jun 22 08:58:03 2022
Version 19.14.0.0.0

Copyright (c) 1982, 2021, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.14.0.0.0

SQL> create tablespace test_ts datafile size 1g;

Tablespace created.

SQL> create table system.source_copy tablespace test_ts as select * from dba_source;

Table created.

Then I create a backup, of course. After creating that backup, I simply drop the tablespace TEST_TS and check the alert.log for the timestamp when the tablespace was dropped. That will be my Recovery Point Objective for the tablespace point-in-time restore.
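
A minimal sketch of such a backup (not the exact command from my environment; any full backup that also covers the archived logs will do):

RMAN> backup database plus archivelog;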

SQL> drop tablespace test_ts including contents and datafiles;

Tablespace dropped.

SQL> exit

[odaadmin@odax5-base-00 ~]$ tail -f alert.log
2022-06-22 09:01:16.838000 +02:00
drop tablespace test_ts including contents and datafiles
2022-06-22 09:01:22.601000 +02:00
Deleted Oracle managed file /u02/app/oracle/oradata/MPMX5/MPMX5/datafile/o1_mf_test_ts_kc5hd4vf_.dbf
Completed: drop tablespace test_ts including contents and datafiles

Now I try to perform the RMAN tablespace point-in-time recovery. There are several options to that command (see the documentation), but in my case I just specify the location for the datafiles of my automatic auxiliary instance.

[odaadmin@odax5-base-00 ~]$ rman target /

Recovery Manager: Release 19.0.0.0.0 - Production on Wed Jun 22 09:05:53 2022
Version 19.14.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: MPM (DBID=3407247611)

RMAN> recover tablespace test_ts
2> until time "to_date('22.06.2022 09:01:00', 'dd.mm.yyyy hh24:mi:ss')"
3> auxiliary destination '/u02/app/oracle/oradata/MPMX5/MPMX5/datafile';

Starting recover at 2022-06-22 09:05:56
using target database control file instead of recovery catalog
current log archived
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=216 instance=MPM1 device type=DISK
RMAN-05026: warning: presuming following set of tablespaces applies to specified point-in-time

List of tablespaces expected to have UNDO segments
Tablespace SYSTEM
Tablespace MPMPDB1:SYSTEM
Tablespace UNDOTBS1
Tablespace MPMPDB1:UNDOTBS1
Tablespace UNDOTBS2
Tablespace MPMPDB1:UNDO_2

Creating automatic instance, with SID='qAph'

initialization parameters used for automatic instance:
db_name=MPM
db_unique_name=qAph_pitr_MPM
compatible=19.0.0.0
db_block_size=8192
db_files=1024
diagnostic_dest=/u01/app/odaorabase/odaadmin
_system_trig_enabled=FALSE
sga_target=98074M
processes=200
db_create_file_dest=/u02/app/oracle/oradata/MPMX5/MPMX5/datafile
log_archive_dest_1='location=/u02/app/oracle/oradata/MPMX5/MPMX5/datafile'
enable_pluggable_database=true
_clone_one_pdb_recovery=true
#No auxiliary parameter file used


starting up automatic instance MPM

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 06/22/2022 09:06:35
RMAN-04014: startup failed: ORA-27106: system pages not available to allocate memory
Additional information: 6187
Additional information: 1

So the recovery fails because the server lacks the memory to start the auxiliary instance. Let's check the SGA of the target instance and the available RAM.

[odaadmin@odax5-base-00 ~]$ sqlplus / as sysdba
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.14.0.0.0

SQL> show parameter sga

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
allow_group_access_to_sga            boolean     FALSE
lock_sga                             boolean     FALSE
pre_page_sga                         boolean     TRUE
sga_max_size                         big integer 4G
sga_min_size                         big integer 0
sga_target                           big integer 96G
unified_audit_sga_queue_size         integer     1048576

SQL> !free -h
              total        used        free      shared  buff/cache   available
Mem:           251G        147G         77G        3.4G         27G         93G
Swap:           23G          0B         23G

So there is just not enough memory (or rather, not enough HugePages in my case) available to start up the auxiliary instance. This is because RMAN chooses the same SGA size for the auxiliary instance as for the target database; this is documented behaviour, by the way. Despite Oracle stating that it is usually not necessary to override the default, I have stumbled over this quite a lot. Now, what can we do to override the defaults? There are two ways:

  1. Create a custom initialization parameter file for the auxiliary instance named “params_auxinst.ora” and place it in $ORACLE_HOME/rdbms/admin (see the sketch after this list)
  2. Create a custom initialization parameter file for the auxiliary instance and explicitly point RMAN to that file
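
For completeness, option 1 is nothing more than a tiny parameter file dropped into the location RMAN looks at; a minimal sketch, assuming the same single override I use below:

# $ORACLE_HOME/rdbms/admin/params_auxinst.ora
sga_target=4g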

For me, the latter is the better approach because it is more obvious, for instance when logging the RMAN commands for documentation purposes. Let's see.

[odaadmin@odax5-base-00 ~]$ cat /tmp/initRMAN.ora
sga_target=4g

[odaadmin@odax5-base-00 ~]$ rman target /

Recovery Manager: Release 19.0.0.0.0 - Production on Wed Jun 22 09:05:53 2022
Version 19.14.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: MPM (DBID=3407247611)

RMAN> set auxiliary instance parameter file to '/tmp/initRMAN.ora';

executing command: SET auxiliary parameter file
using target database control file instead of recovery catalog

RMAN> recover tablespace test_ts
2> until time "to_date('22.06.2022 09:01:00', 'dd.mm.yyyy hh24:mi:ss')"
3> auxiliary destination '/u02/app/oracle/oradata/MPMX5/MPMX5/datafile';

Starting recover at 2022-06-22 09:35:08
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=222 instance=MPM1 device type=DISK
RMAN-05026: warning: presuming following set of tablespaces applies to specified point-in-time

List of tablespaces expected to have UNDO segments
Tablespace SYSTEM
Tablespace MPMPDB1:SYSTEM
Tablespace UNDOTBS1
Tablespace MPMPDB1:UNDOTBS1
Tablespace UNDOTBS2
Tablespace MPMPDB1:UNDO_2

Creating automatic instance, with SID='pjtg'
using contents of file /tmp/initRMAN.ora

initialization parameters used for automatic instance:
db_name=MPM
db_unique_name=pjtg_pitr_MPM
compatible=19.0.0.0
db_block_size=8192
db_files=1024
diagnostic_dest=/u01/app/odaorabase/odaadmin
_system_trig_enabled=FALSE
sga_target=96G
processes=200
db_create_file_dest=/u02/app/oracle/oradata/MPMX5/MPMX5/datafile
log_archive_dest_1='location=/u02/app/oracle/oradata/MPMX5/MPMX5/datafile'
enable_pluggable_database=true
_clone_one_pdb_recovery=true
ifile=/tmp/initRMAN.ora


starting up automatic instance MPM

Oracle instance started

Total System Global Area    4294963752 bytes

Fixed Size                     9142824 bytes
Variable Size                855638016 bytes
Database Buffers            3388997632 bytes
Redo Buffers                  41185280 bytes
Automatic instance created


List of tablespaces that have been dropped from the target database:
Tablespace TEST_TS

contents of Memory Script:
{
# set requested point in time
set until  time "to_date('22.06.2022 09:01:00', 'dd.mm.yyyy hh24:mi:ss')";
# restore the controlfile
restore clone controlfile;

# mount the controlfile
sql clone 'alter database mount clone database';

[...]

Removing automatic instance
Automatic instance removed
auxiliary instance file /u02/app/oracle/oradata/MPMX5/MPMX5/datafile/MPMX5/datafile/o1_mf_temp_kc5knl39_.tmp deleted
auxiliary instance file /u02/app/oracle/oradata/MPMX5/MPMX5/datafile/MPMX5/onlinelog/o1_mf_4_kc5kn4z4_.log deleted
auxiliary instance file /u02/app/oracle/oradata/MPMX5/MPMX5/datafile/MPMX5/onlinelog/o1_mf_3_kc5kn4x5_.log deleted
auxiliary instance file /u02/app/oracle/oradata/MPMX5/MPMX5/datafile/MPMX5/onlinelog/o1_mf_2_kc5kn4vf_.log deleted
auxiliary instance file /u02/app/oracle/oradata/MPMX5/MPMX5/datafile/MPMX5/onlinelog/o1_mf_1_kc5kn4so_.log deleted
auxiliary instance file /u02/app/oracle/oradata/MPMX5/MPMX5/datafile/MPMX5/datafile/o1_mf_sysaux_kc5km3nb_.dbf deleted
auxiliary instance file /u02/app/oracle/oradata/MPMX5/MPMX5/datafile/MPMX5/E203D75664DE289CE0533E24100A3F14/datafile/o1_mf_undo_2_kc5kmdm9_.dbf deleted
auxiliary instance file /u02/app/oracle/oradata/MPMX5/MPMX5/datafile/MPMX5/datafile/o1_mf_undotbs2_kc5km3w5_.dbf deleted
auxiliary instance file /u02/app/oracle/oradata/MPMX5/MPMX5/datafile/MPMX5/E203D75664DE289CE0533E24100A3F14/datafile/o1_mf_undotbs1_kc5kmd8w_.dbf deleted
auxiliary instance file /u02/app/oracle/oradata/MPMX5/MPMX5/datafile/MPMX5/datafile/o1_mf_undotbs1_kc5km3v2_.dbf deleted
auxiliary instance file /u02/app/oracle/oradata/MPMX5/MPMX5/datafile/MPMX5/E203D75664DE289CE0533E24100A3F14/datafile/o1_mf_system_kc5kmcdv_.dbf deleted
auxiliary instance file /u02/app/oracle/oradata/MPMX5/MPMX5/datafile/MPMX5/datafile/o1_mf_system_kc5km3sg_.dbf deleted
auxiliary instance file /u02/app/oracle/oradata/MPMX5/MPMX5/datafile/MPMX5/controlfile/o1_mf_kc5klpq0_.ctl deleted
auxiliary instance file tspitr_pjtg_29120.dmp deleted
Finished recover at 2022-06-22 09:37:29

As we can see, the recovery went smoothly this time. But we also see that RMAN still lists the SGA_TARGET from the target database and now simply adds my custom parameter file with the IFILE option. The parameters from the IFILE then take precedence over the default values. There is just one small thing in the docs: the syntax diagram correctly says “AUXILIARY INSTANCE PARAMETER FILE TO” whilst the description states “AUXILIARY INSTANCE PARAMETERFILE TO”, which is wrong. But now let's cross-check the memory usage during the recovery.

[odaadmin@odax5-base-00 ~]$ ps -ef |grep smon
root     10888     1  2 Jun10 ?        07:28:06 /u01/app/19.15.0.0/grid/bin/osysmond.bin
grid     13237     1  0 Jun10 ?        00:00:12 asm_smon_+ASM1
oracle   29865     1  0 Jun10 ?        00:00:16 ora_smon_mwe191
odaadmin 29930     1  0 Jun10 ?        00:00:15 ora_smon_sirac191
odaadmin 82423     1  0 08:56 ?        00:00:00 ora_smon_MPM1
odaadmin 86915     1  0 09:32 ?        00:00:00 ora_smon_FB2T1
odaadmin 92942     1  0 09:35 ?        00:00:00 ora_smon_pjtg
root     93881  7723  0 09:36 pts/0    00:00:00 grep --color=auto smon

[odaadmin@odax5-base-00 ~]$ export ORACLE_HOME=/u01/app/odaorabase/odaadmin
[odaadmin@odax5-base-00 ~]$ export ORACLE_SID=pjtg
[odaadmin@odax5-base-00 ~]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Wed Jun 22 09:36:24 2022
Version 19.14.0.0.0

Copyright (c) 1982, 2021, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.14.0.0.0

SQL> show parameter sga_ta

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
sga_target                           big integer 4G

That is just to prove that the auxiliary instance really did pick up the customized parameter. OK, that's all for today, folks 🙂

Rolling forward a Physical Standby

Today I am going to write about one of my favorite features for Oracle Data Guard, which was introduced with version 12.1. For this post I refer to the primary and the physical standby database simply as primary and standby for the sake of readability. Normally a standby receives the redo stream in some form from the primary and immediately applies the redo to its datafiles. That way the standby is always as up-to-date as it can be. But things happen, and the standby might fall out of sync with the primary, either due to NOLOGGING operations, which can be prevented by enabling FORCE LOGGING mode, or because the redo stream has gaps for some reason. In such a case a reinstate of the standby is required, which basically means creating a complete new copy of the primary. This can be time-consuming and resource-intensive depending on the size of the database.
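
For reference, enabling force logging is a one-liner on the primary:

SQL> alter database force logging;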

So what can be done to shorten this procedure? The answer is simple: create and apply an incremental backup. The idea behind it is this: we know the SCN at which the standby stopped applying redo. This SCN can be used as the starting point for an incremental backup, which will then contain all blocks that have an SCN higher than the one of the standby. In the next step this backup can be applied to the standby, refreshing all the data blocks that have changed in the meantime. Maybe a controlfile refresh is necessary too, in case some datafiles were added. Up to 11.2 this was a manual process.
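
Roughly, the manual way looked like this; just a sketch, with placeholder values for the SCN and the backup location:

-- on the standby: determine the SCN to start from
SQL> select current_scn from v$database;

-- on the primary: take an incremental backup starting at that SCN
RMAN> backup incremental from scn 1234567 database format '/tmp/stby_%U.bkp';

-- on the standby, after transferring the pieces: catalog and apply them
RMAN> catalog start with '/tmp/';
RMAN> recover database noredo;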

Starting with 12.1 the whole thing became much easier because the “restore/recover from service” option was introduced in RMAN. Let me walk through a simple example. In this example the standby was disabled in the Data Guard configuration; when re-enabling it, a gap in the redo stream was detected.

DGMGRL> show configuration

Configuration - mes19

  Protection Mode: MaxPerformance
  Members:
  mes19dc1 - Primary database
    mes19dc2 - Physical standby database (disabled)
      ORA-16749: The member was disabled manually.

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 42 seconds ago)

DGMGRL> enable database mes19dc2;
Enabled.
DGMGRL>  show configuration

Configuration - mes19

  Protection Mode: MaxPerformance
  Members:
  mes19dc1 - Primary database
    Error: ORA-16724: cannot resolve gap for one or more members

    mes19dc2 - Physical standby database
      Error: ORA-16810: multiple errors or warnings detected for the member

Fast-Start Failover:  Disabled

Configuration Status:
ERROR   (status updated 8 seconds ago)

Since these were RAC databases, I needed to stop the whole standby database and start a single instance to perform the recovery.

$ srvctl stop database -db mes19dc2


SQL> startup nomount

Now the fun begins. First I refreshed the controlfile using the new syntax. The only requirement is that the SYS passwords are the same on target and source, but for a Data Guard setup this should not be an issue.

$ rman target /

Recovery Manager: Release 19.0.0.0.0 - Production on Tue Feb 18 15:44:42 2020
Version 19.5.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: MES19 (not mounted)

RMAN> restore standby controlfile from service 'rboda0101x:1521/mes19dc1';

Starting restore at 18-FEB-20
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=206 instance=mes191 device type=DISK

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: using network backup set from service rboda0101x:1521/mes19dc1.rb.de.conti.de
channel ORA_DISK_1: restoring control file
channel ORA_DISK_1: restore complete, elapsed time: 00:00:02
output file name=+REDO/MES19DC2/CONTROLFILE/current.279.1026811235
Finished restore at 18-FEB-20

Now I was able to mount the standby again. But since the controlfile is a restore from the primary, all the datafile paths are now the ones that came from the primary. The solution is simply cataloging everything that belongs to the standby and switching the database to those copies.

RMAN> alter database mount;

released channel: ORA_DISK_1
Statement processed

RMAN> catalog start with '+DATA';

Starting implicit crosscheck backup at 18-FEB-20
allocated channel: ORA_DISK_1
allocated channel: ORA_DISK_2
allocated channel: ORA_DISK_3

...

RMAN> switch database to copy;

At this point I can start recovering the standby using the primary as the source for an incremental backup. The whole procedure that I described earlier is done automatically by RMAN with one single command.

RMAN> recover database from service 'rboda0101x:1521/mes19dc1';

This typically works straight away. Unfortunately I was using version 19.5, which obviously has a bug: it creates huge traces. And I mean really huge. These traces filled up the filesystem hosting my ADR. The traces contained something like this.

-- key in target file:
kcbtse_print_kod: printing kcbtekod structure 0x7fc77f1ae310
--kcbtekod_ena=0 kcbtekod_flag=0x0 kcbtekod_mkloc=0x0
--kcbtekod_key=0000000000000000000000000000000000000000000000000000000000000000
--kcbtekod_mkeyid=00000000000000000000000000000000
-- key in backup set:
kcbtse_print_kod: printing kcbtekod structure 0x7fc77fd61cf8
--kcbtekod_ena=2 kcbtekod_flag=0x6 kcbtekod_mkloc=0x0
--kcbtekod_key=ecc8c60f4de9fc8731ef019f080b17db00000000000000000000000000000000
--kcbtekod_mkeyid=9c5ac94c63844f08bf8cd818eb980ad7
-- key in target file:
kcbtse_print_kod: printing kcbtekod structure 0x7fc77f1ae310
--kcbtekod_ena=0 kcbtekod_flag=0x0 kcbtekod_mkloc=0x0
--kcbtekod_key=0000000000000000000000000000000000000000000000000000000000000000
--kcbtekod_mkeyid=00000000000000000000000000000000
-- key in backup set:
kcbtse_print_kod: printing kcbtekod structure 0x7fc77fd61cf8
--kcbtekod_ena=2 kcbtekod_flag=0x6 kcbtekod_mkloc=0x0
--kcbtekod_key=ecc8c60f4de9fc8731ef019f080b17db00000000000000000000000000000000
--kcbtekod_mkeyid=9c5ac94c63844f08bf8cd818eb980ad7

These lines were repeated over and over again. My simple, quick and dirty solution was to empty these trace files once in a while.

mes191:root->oracle@rboda0201x:/u01/app/oracle/diag/rdbms/mes19dc2/mes191/trace
$ while true; do for i in $(ls mes191*trc); do echo ""> $i; done; sleep 60; done

Having that in place, my recover command in RMAN completed successfully.

channel ORA_DISK_1: starting incremental datafile backup set restore
channel ORA_DISK_1: using network backup set from service rboda0101x:1521/mes19dc1
destination for restore of datafile 00025: +DATA/MES19DC2/996EF5BF4924EC53E05328DED70AA5CE/DATAFILE/mes_data.314.1029229915
channel ORA_DISK_2: starting incremental datafile backup set restore
channel ORA_DISK_2: using network backup set from service rboda0101x:1521/mes19dc1
destination for restore of datafile 00026: +DATA/MES19DC2/996EF5BF4924EC53E05328DED70AA5CE/DATAFILE/mes_index.313.1029229915
channel ORA_DISK_2: restore complete, elapsed time: 01:01:25
channel ORA_DISK_1: restore complete, elapsed time: 01:22:26

starting media recovery

archived log for thread 1 with sequence 610 is already on disk as file +RECO/MES19DC2/ARCHIVELOG/2020_02_18/thread_1_seq_610.260.1032709543
archived log for thread 1 with sequence 611 is already on disk as file +RECO/MES19DC2/ARCHIVELOG/2020_02_18/thread_1_seq_611.381.1032713241
archived log for thread 1 with sequence 612 is already on disk as file +RECO/MES19DC2/ARCHIVELOG/2020_02_18/thread_1_seq_612.261.1032713243
archived log for thread 2 with sequence 741 is already on disk as file +RECO/MES19DC2/ARCHIVELOG/2020_02_18/thread_2_seq_741.380.1032709545
archived log for thread 2 with sequence 742 is already on disk as file +RECO/MES19DC2/ARCHIVELOG/2020_02_18/thread_2_seq_742.383.1032713241
archived log for thread 2 with sequence 743 is already on disk as file +RECO/MES19DC2/ARCHIVELOG/2020_02_18/thread_2_seq_743.257.1032713245
archived log file name=+RECO/MES19DC2/ARCHIVELOG/2020_02_18/thread_1_seq_610.260.1032709543 thread=1 sequence=610
archived log file name=+RECO/MES19DC2/ARCHIVELOG/2020_02_18/thread_2_seq_741.380.1032709545 thread=2 sequence=741
archived log file name=+RECO/MES19DC2/ARCHIVELOG/2020_02_18/thread_1_seq_611.381.1032713241 thread=1 sequence=611
archived log file name=+RECO/MES19DC2/ARCHIVELOG/2020_02_18/thread_2_seq_742.383.1032713241 thread=2 sequence=742
archived log file name=+RECO/MES19DC2/ARCHIVELOG/2020_02_18/thread_1_seq_612.261.1032713243 thread=1 sequence=612
archived log file name=+RECO/MES19DC2/ARCHIVELOG/2020_02_18/thread_2_seq_743.257.1032713245 thread=2 sequence=743
media recovery complete, elapsed time: 00:00:01
Finished recover at 18-FEB-20
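As a side note: instead of emptying the trace files in a loop, capping the trace file size via the max_dump_file_size parameter might have been a less intrusive workaround. I did not test that in this particular situation, so take it as an idea only.

SQL> -- the value is just an example, pick whatever your filesystem can tolerate
SQL> alter system set max_dump_file_size='1G';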

There is one more thing to do at this point. All the online and standby redo logs inherited their names from the primary too, so I had to reset them to something that fits the standby. For non-OMF files you would do an “alter database rename file”; for OMF I did it like this.

SQL> r
  1* select distinct  'alter database clear logfile group ' || group# || ';' from v$logfile

'ALTERDATABASECLEARLOGFILEGROUP'||GROUP#||';'
----------------------------------------------------------------------------
alter database clear logfile group 1;
alter database clear logfile group 21;
alter database clear logfile group 22;
alter database clear logfile group 23;
alter database clear logfile group 24;
alter database clear logfile group 25;
alter database clear logfile group 26;
alter database clear logfile group 27;
alter database clear logfile group 28;
alter database clear logfile group 29;
alter database clear logfile group 2;
alter database clear logfile group 30;
alter database clear logfile group 3;
alter database clear logfile group 4;
alter database clear logfile group 5;
alter database clear logfile group 6;
alter database clear logfile group 7;
alter database clear logfile group 8;

18 rows selected.
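Instead of copy-pasting the generated statements one by one, the same can be done in one go with a small anonymous PL/SQL block. This is just a sketch and assumes redo apply is still stopped, because clearing a log group that is currently in use would fail:

begin
  -- clear every online and standby redo log group known to the controlfile
  for rec in (select distinct group# from v$logfile) loop
    execute immediate 'alter database clear logfile group ' || rec.group#;
  end loop;
end;
/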

And then, of course, execute all these commands, either one by one or via the loop sketched above. Finally I just enabled the Data Guard configuration again and checked the state.

DGMGRL> show configuration

Configuration - mes19

  Protection Mode: MaxPerformance
  Members:
  mes19dc1 - Primary database
    mes19dc2 - Physical standby database (disabled)
      ORA-16749: The member was disabled manually.

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 22 seconds ago)

DGMGRL> enable database mes19dc2;
Enabled.

DGMGRL> show configuration

Configuration - mes19

  Protection Mode: MaxPerformance
  Members:
  mes19dc1 - Primary database
    mes19dc2 - Physical standby database

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 30 seconds ago)

DGMGRL> show database mes19dc2

Database - mes19dc2

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 1 second ago)
  Apply Lag:          0 seconds (computed 1 second ago)
  Average Apply Rate: 1.26 MByte/s
  Real Time Query:    OFF
  Instance(s):
    mes191 (apply instance)
    mes192

Database Status:
SUCCESS

You may have to wait up to a minute for the current status to be reflected in the output of DGMGRL, so be patient.

Another thing which I did not show in this example is the handling of new datafiles. For that you would just issue a “restore from service” for the new files before doing the actual recovery.
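For instance, if a datafile had been added on the primary in the meantime, something like this would fetch it over the network before the recovery (the file number 27 is made up for illustration):

RMAN> restore datafile 27 from service 'rboda0101x:1521/mes19dc1';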

I really do like this new (or in the meantime not so new anymore) feature and I hope it’ll help you (and me) to speed up resynchronisation of your standby in the future.

More information about this can be found in the Docs or MOS Note 1987763.1.

Size of Flashback Logs

Today’s blog post is about the Flashback Database feature of the Oracle Database. I use this feature mostly in the context of Data Guard, but there are also other possible use cases like having a fallback during updates/upgrades etc. In Data Guard setups, Flashback Database simplifies and speeds up the reinstatement of a former primary database that was lost and replaced by a failover to the standby database. With Flashback Database there is no need to perform a complete duplication of the database to bring it back up as a standby. The database is simply rolled back to just before the SCN at which the standby database was activated by the failover. It can then just start applying the redo it gets from the new primary database.
But how does it work? When activating this feature, the database writes some additional logs that contain the information needed to roll the database back. These logs are stored in the Fast Recovery Area and there is no way to influence that. Let’s have a look at that.

SQL> select group#, status, bytes from v$log;

    GROUP# STATUS                BYTES
---------- ---------------- ----------
         1 CURRENT          1073741824
         2 INACTIVE         1073741824
         3 INACTIVE         1073741824

SQL> alter database flashback on;

Database altered.

SQL> select * from v$flashback_database_logfile;

NAME                                                                                   LOG#    THREAD#  SEQUENCE#      BYTES FIRST_CHANGE# FIRST_TIME          TYPE          CON_ID
-------------------------------------------------------------------------------- ---------- ---------- ---------- ---------- ------------- ------------------- --------- ----------
/u03/app/oracle/fast_recovery_area/MMISI2/flashback/o1_mf_h1xp7lv0_.flb                   1          1          1 1073741824       3886676 2020-01-15 10:18:44 NORMAL             0
/u03/app/oracle/fast_recovery_area/MMISI2/flashback/o1_mf_h1xp7pnk_.flb                   2          1          1 1073741824             0                     RESERVED           0

SQL> !ls -l /u03/app/oracle/fast_recovery_area/MMISI2/flashback/
total 2097168
-rw-r----- 1 oracle dba 1073750016 Jan 15 10:22 o1_mf_h1xp7lv0_.flb
-rw-r----- 1 oracle dba 1073750016 Jan 15 10:18 o1_mf_h1xp7pnk_.flb

Obviously the flashback logs are the same size as the redologs. And they consume space inside your FRA. But what if one does not have sufficient space and/or wants to shrink those flashback logs? Since their size is inherited from the size of the redologs, let’s try shrinking the redologs by creating new groups and dropping the old ones.

SQL> alter database add logfile group 10 size 256m;

Database altered.

SQL> alter database add logfile group 11 size 256m;

Database altered.

SQL> alter database add logfile group 12 size 256m;

Database altered.

SQL> alter database drop logfile group 2;

Database altered.

SQL> alter database drop logfile group 3;

Database altered.

SQL> alter system switch logfile;

System altered.

SQL> alter system checkpoint;

System altered.

SQL> alter database drop logfile group 1;

Database altered.

SQL> select group#, bytes, status from v$log;

    GROUP#      BYTES STATUS
---------- ---------- ----------------
        10  268435456 CURRENT
        11  268435456 UNUSED
        12  268435456 UNUSED

Now let’s check what the flashback logs look like:

SQL> select log#, thread#, sequence#, bytes, type from v$flashback_database_logfile;

      LOG#    THREAD#  SEQUENCE#      BYTES TYPE
---------- ---------- ---------- ---------- ---------
         1          1          1 1073741824 NORMAL
         2          1          1 1073741824 RESERVED

Still unchanged. Maybe it requires some work to be done to cycle through the logs.

SQL> create table dings tablespace users as select * from dba_source;

Table created.

SQL> insert into dings select * from dings;

283698 rows created.

SQL> insert into dings select * from dings;

567396 rows created.

SQL> insert into dings select * from dings;

1134792 rows created.

SQL> insert into dings select * from dings;

2269584 rows created.

SQL> insert into dings select * from dings;

4539168 rows created.

SQL> commit;

Commit complete.

SQL> select group#, sequence#, bytes, status from v$log;

    GROUP#  SEQUENCE#      BYTES STATUS
---------- ---------- ---------- ----------------
        10         11  268435456 ACTIVE
        11         12  268435456 ACTIVE
        12         13  268435456 CURRENT

SQL> select log#, thread#, sequence#, bytes, type from v$flashback_database_logfile;

      LOG#    THREAD#  SEQUENCE#      BYTES TYPE
---------- ---------- ---------- ---------- ---------
         1          1          1 1073741824 NORMAL
         2          1          1 1073741824 RESERVED

Ok, no change here. There isn’t even a new flashback log. Maybe it wasn’t enough change? So let’s try some row-by-row deletion. And please do not do this in production, deleting rows can hardly get more inefficient than that.

SQL> begin
   2   for rec in (select rowid from dings) loop
   3     delete from dings where rowid = rec.rowid;
   4     commit;
   5   end loop;
   6 end;
   7 /

PL/SQL procedure successfully completed.

SQL> select group#, sequence#, bytes, status from v$log;

    GROUP#  SEQUENCE#      BYTES STATUS
---------- ---------- ---------- ----------------
        10         53  268435456 CURRENT
        11         51  268435456 INACTIVE
        12         52  268435456 INACTIVE

SQL> select log#, thread#, sequence#, bytes, type from v$flashback_database_logfile;

      LOG#    THREAD#  SEQUENCE#      BYTES TYPE
---------- ---------- ---------- ---------- ---------
         1          1          1 1073741824 NORMAL
         2          1          1 1073741824 RESERVED

At least this produced quite some redo, but still no new flashback logs. Maybe the “db_flashback_retention_target” parameter has an impact on these logs?

SQL> show parameter flashb

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_flashback_retention_target        integer     1440

SQL> alter system set db_flashback_retention_target=10;

System altered.

SQL> select log#, thread#, sequence#, bytes, type from v$flashback_database_logfile;

      LOG#    THREAD#  SEQUENCE#      BYTES TYPE
---------- ---------- ---------- ---------- ---------
         1          1          1 1073741824 NORMAL
         2          1          1 1073741824 RESERVED

Simply speaking, no, it hasn’t. Maybe other operations influence the generation of flashback logs, like index maintenance?

SQL> insert into dings select * from dba_source;

283698 rows created.

SQL> commit;

Commit complete.

SQL> create index dings_ix1 on dings(owner, name, line) tablespace new;

Index created.

SQL> insert into dings select * from dings;

283698 rows created.

SQL> insert into dings select * from dings;

567396 rows created.

SQL> insert into dings select * from dings;

1134792 rows created.

SQL> insert into dings select * from dings;

2269584 rows created.

SQL> commit;

Commit complete.

SQL> select group#, sequence#, bytes, status from v$log;

    GROUP#  SEQUENCE#      BYTES STATUS
---------- ---------- ---------- ----------------
        10         65  268435456 ACTIVE
        11         66  268435456 ACTIVE
        12         67  268435456 CURRENT

SQL> select log#, thread#, sequence#, bytes, type, first_time from v$flashback_database_logfile;

      LOG#    THREAD#  SEQUENCE#      BYTES TYPE      FIRST_TIME
---------- ---------- ---------- ---------- --------- -------------------
         3          1          3 1073741824 NORMAL    2020-01-15 11:27:47
         4          1          1 1073741824 RESERVED
         1          1          1 1073741824 NORMAL    2020-01-15 10:18:44
         2          1          2 1073741824 NORMAL    2020-01-15 11:26:36

SQL> !ls -l /u03/app/oracle/fast_recovery_area/MMISI2/flashback/
total 4194336
-rw-r----- 1 oracle dba 1073750016 Jan 15 11:26 o1_mf_h1xp7lv0_.flb
-rw-r----- 1 oracle dba 1073750016 Jan 15 11:27 o1_mf_h1xp7pnk_.flb
-rw-r----- 1 oracle dba 1073750016 Jan 15 11:28 o1_mf_h1xt6z4y_.flb
-rw-r----- 1 oracle dba 1073750016 Jan 15 11:27 o1_mf_h1xt95vj_.flb

Ok, finally there are some new flashback logs. But the size of those logs is still 1GB.
I am running out of ideas, so the last resort is to disable and re-enable the Flashback Database feature.

SQL> alter database flashback off;

Database altered.

SQL> alter database flashback on;

Database altered.

SQL> select log#, thread#, sequence#, bytes, type, first_time from v$flashback_database_logfile;

      LOG#    THREAD#  SEQUENCE#      BYTES TYPE      FIRST_TIME
---------- ---------- ---------- ---------- --------- -------------------
         1          1          1  268435456 NORMAL    2020-01-15 11:30:01
         2          1          1  268435456 RESERVED

SQL> !ls -l /u03/app/oracle/fast_recovery_area/MMISI2/flashback/
total 524304
-rw-r----- 1 oracle dba 268443648 Jan 15 11:30 o1_mf_h1xtf8yz_.flb
-rw-r----- 1 oracle dba 268443648 Jan 15 11:30 o1_mf_h1xtfck6_.flb

Here we go, the flashback logs are finally shrunk. But this comes at the cost of losing all the flashback information that was there before. Keep that in mind and schedule wisely when changing the size of your flashback logs.

Platform change and upgrade in one step – XTTS

Today I want to tell you something about cross platform transportable tablespaces (XTTS) and incrementally updated datafile copies. This is a very nice feature to move to another platform and change the Oracle version in one step. There are several MOS notes, whitepapers and of course some upgrade slides by Mike Dietrich covering this topic.

The basic steps in all cases and versions are these:

  1. set tablespace(s) read only
  2. export metadata for those tablespaces
  3. transport datafiles and dump to target system
  4. create necessary database users
  5. import dump into target database which makes the datafiles part of the target database (see the sketch after this list)
  6. set tablespace(s) read write
  7. optionally, transport other objects like views, PL/SQL etc.
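Steps 2 and 5 are essentially Data Pump calls in transportable tablespace mode. A minimal sketch (directory, dumpfile, tablespace and datafile names are made up; the tablespace must be read only at this point):

$ expdp system directory=XTTS_DIR dumpfile=tts_meta.dmp logfile=tts_exp.log transport_tablespaces=APP_DATA

$ impdp system directory=XTTS_DIR dumpfile=tts_meta.dmp logfile=tts_imp.log transport_datafiles='/u02/oradata/TARGET/app_data01.dbf'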

But what if the tablespaces are too big and the copy time would exceed the allowed downtime window? Then we can use incremental backups to shorten the downtime. The steps to do this are quite similar.

  1. backup tablespace(s) for transport
  2. copy backup to target system
  3. restore and convert datafiles to target destination and platform
  4. create incremental backup for transported tablespace(s)
  5. recover transported datafiles using the incremental backup
  6. repeat steps 4 and 5 until final downtime window
  7. set tablespace(s) read only
  8. create final incremental backup for transported tablespace(s)
  9. apply incremental backup to transported datafiles
  10. continue with step 2 of the initial process above

Sounds a bit complicated, doesn’t it? Fortunately, Oracle was so kind as to put this whole process into scripts. The following MOS notes contain and describe these scripts:

  • 11G – Reduce Transportable Tablespace Downtime using Cross Platform Incremental Backup (Doc ID 1389592.1)
  • 12C – Reduce Transportable Tablespace Downtime using Cross Platform Incremental Backup (Doc ID 2005729.1)
  • V4 Reduce Transportable Tablespace Downtime using Cross Platform Incremental Backup (Doc ID 2471245.1)

During a migration project from Linux to SPARC (don’t ask) I wanted to use those scripts. But I experienced the following during “Phase 2”:

oracle@server2:~/xtts> $ORACLE_HOME/perl/bin/perl xttdriver.pl --backup
============================================================
trace file is /oracle/xtts/backup_Sep20_Thu_12_20_52_917//Sep20_Thu_12_20_52_917_.log
=============================================================

[…]

--------------------------------------------------------------------
Done with prepare phase
--------------------------------------------------------------------


--------------------------------------------------------------------
Find list of datafiles in system
--------------------------------------------------------------------

SP2-0027: Input is too long (> 2499 characters) - line ignored
SP2-0027: Input is too long (> 2499 characters) - line ignored
           from dual
           *
ERROR at line 19:
ORA-00936: missing expression

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Error:
------
Error in executing /oracle/xtts/backup_Sep20_Thu_12_27_07_889//diff.sql

Okay. The reason is that I have loads of tablespaces that need to be transferred, and this long list messed up the whole process. So I tried to understand the process and just do it myself. What I learned is that it is quite easy to do manually. I will share my findings with you by outlining the basic process in an upcoming blog post.

Recyclebin vs. Autoextend

At the end of last year I did a presentation at the Nuremberg DOAG Regional Meetup. One of my talks was about Flashback and un-dropping a table. The question came up whether a tablespace will autoextend or overwrite objects in the recyclebin in case of space pressure. I did not have an answer to that, so I investigated it. And here’s the result.

First I created a user and a tablespace and granted some privileges to the user:

[oracle@oel6u4 ~]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Fri Jan 5 12:02:44 2018

Copyright (c) 1982, 2014, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SYS@perf> create tablespace test datafile size 10m autoextend on next 1m;

Tablespace created.

SYS@perf> create user test identified by test default tablespace test quota unlimited on test;

User created.

SYS@perf> grant create session to test;

Grant succeeded.

SYS@perf> grant create table to test;

Grant succeeded.

SYS@perf> grant select any dictionary to test;

Grant succeeded.

This user can now be used to test the behaviour. First, I created a table that does not fit into the tablespace at its initial size.

SYS@perf> conn test/test
Connected.
TEST@perf> show parameter recyclebin

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
recyclebin                           string      on
TEST@perf> create table test as select * from all_source;

Table created.

The alert.log shows some file resize operations during this table creation:

2018-01-05 12:03:09.686000 +01:00
create tablespace test datafile size 10m autoextend on next 1m
Completed: create tablespace test datafile size 10m autoextend on next 1m
2018-01-05 12:05:08.112000 +01:00
Resize operation completed for file# 6, old size 10240K, new size 11264K
Resize operation completed for file# 6, old size 11264K, new size 12288K
2018-01-05 12:05:09.325000 +01:00
Resize operation completed for file# 6, old size 12288K, new size 13312K

Let’s check the current space usage from the database perspective:

TEST@perf> select sum(bytes),segment_name from user_segments group by segment_name;

SUM(BYTES) SEGMENT_NAME
---------- ----------------------------------------
  12582912 TEST

TEST@perf> select sum(bytes) from dba_free_space where tablespace_name='TEST';

SUM(BYTES)
----------
   1048576

TEST@perf> select sum(bytes) from dba_data_files where tablespace_name='TEST';

SUM(BYTES)
----------
  13631488

So I dropped the table and checked the situation again:

TEST@perf> drop table test;

Table dropped.

TEST@perf> show recyclebin
ORIGINAL NAME    RECYCLEBIN NAME                OBJECT TYPE  DROP TIME
---------------- ------------------------------ ------------ -------------------
TEST             BIN$YgZ7U68dFi7gU244qMDQ2Q==$0 TABLE        2018-01-05:12:09:26

TEST@perf> select sum(bytes),segment_name from user_segments group by segment_name;

SUM(BYTES) SEGMENT_NAME
---------- ----------------------------------------
  12582912 BIN$YgZ7U68dFi7gU244qMDQ2Q==$0

TEST@perf> select sum(bytes) from dba_free_space where tablespace_name='TEST';

SUM(BYTES)
----------
  13631488

Ok, the dropped segment is still there, but the whole tablespace is available as free space.
Now for the funny part: what will happen if I create another table?

TEST@perf> create table test as select * from all_source;

Table created.

TEST@perf> select sum(bytes),segment_name from user_segments group by segment_name;

SUM(BYTES) SEGMENT_NAME
---------- ----------------------------------------
  12582912 TEST

TEST@perf> show recyclebin
TEST@perf>

That means Oracle prefers to overwrite the dropped segment instead of increasing the tablespace size. That makes sense in my opinion: the “flashback to before drop” feature is meant to help in case of emergency when something really bad happened, not to keep old things around for as long as possible.
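By the way, as long as a dropped table is still in the recyclebin, getting it back is a one-liner (a minimal example with the table from above, assuming its recyclebin entry has not been overwritten yet):

SQL> flashback table test to before drop;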
So if you want to keep your objects, just do not drop them 🙂

RMAN delete obsolete but keep archivelogs

When doing RMAN backups we typically have one backup strategy and one retention policy. But the deletion policy always considers not only backups and copies but also archivelogs. In some cases one might want to keep archivelogs for a longer period of time, for instance if you are running a non-Data-Guard standby database using DBVisit. Let’s say we want to keep two generations of backups; then we would configure the following:

RMAN> CONFIGURE RETENTION POLICY TO REDUNDANCY 2;

new RMAN configuration parameters:
CONFIGURE RETENTION POLICY TO REDUNDANCY 2;
new RMAN configuration parameters are successfully stored

Now let’s see what backups we have:

RMAN> list backup of database summary;


List of Backups
===============
Key     TY LV S Device Type Completion Time     #Pieces #Copies Compressed Tag
------- -- -- - ----------- ------------------- ------- ------- ---------- ---
23      B  F  A DISK        22.12.2016 09:19:14 1       1       NO         TAG20161222T091750
27      B  F  A DISK        22.12.2016 10:02:33 1       1       NO         TAG20161222T100145
28      B  F  A DISK        22.12.2016 10:02:57 1       1       NO         TAG20161222T100145
29      B  F  A DISK        22.12.2016 10:03:22 1       1       NO         TAG20161222T100145
33      B  F  A DISK        22.12.2016 10:17:14 1       1       NO         TAG20161222T101632
34      B  F  A DISK        22.12.2016 10:17:32 1       1       NO         TAG20161222T101632
35      B  F  A DISK        22.12.2016 10:18:00 1       1       NO         TAG20161222T101632


RMAN> list backup of archivelog all summary;


List of Backups
===============
Key     TY LV S Device Type Completion Time     #Pieces #Copies Compressed Tag
------- -- -- - ----------- ------------------- ------- ------- ---------- ---
24      B  A  A DISK        22.12.2016 09:19:30 1       1       NO         TAG20161222T091930
26      B  A  A DISK        22.12.2016 10:01:34 1       1       NO         TAG20161222T100127
30      B  A  A DISK        22.12.2016 10:03:35 1       1       NO         TAG20161222T100335
32      B  A  A DISK        22.12.2016 10:16:24 1       1       NO         TAG20161222T101615
36      B  A  A DISK        22.12.2016 10:18:16 1       1       NO         TAG20161222T101815

So we have our database backups and the corresponding backups of the archivelogs. And which archivelogs are still on disk?

RMAN> list archivelog all;

List of Archived Log Copies for database with db_unique_name OLTP
=====================================================================

Key     Thrd Seq     S Low Time
------- ---- ------- - -------------------
46      1    15      A 21.12.2016 12:56:40
        Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_15_d5q709b5_.arc

41      1    16      A 22.12.2016 09:17:31
        Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_16_d5q709x1_.arc

43      1    17      A 22.12.2016 09:19:29
        Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_17_d5q709oq_.arc

35      1    18      A 22.12.2016 10:01:26
        Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_18_d5q5q65p_.arc

36      1    19      A 22.12.2016 10:03:33
        Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_19_d5q6gtmj_.arc

38      1    20      A 22.12.2016 10:16:10
        Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_20_d5q6lm3q_.arc

45      2    11      A 21.12.2016 12:56:30
        Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_11_d5q709gs_.arc

42      2    12      A 22.12.2016 09:17:20
        Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_12_d5q709xo_.arc

44      2    13      A 22.12.2016 09:19:14
        Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_13_d5q709ln_.arc

40      2    14      A 22.12.2016 10:01:12
        Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_14_d5q709pq_.arc

37      2    15      A 22.12.2016 10:03:18
        Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_15_d5q6gg9k_.arc

39      2    16      A 22.12.2016 10:15:58
        Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_16_d5q6l6jm_.arc

Those are all the archivelogs that would be needed to recover the older database backups. Now let’s see what is obsolete:

RMAN> report obsolete;

RMAN retention policy will be applied to the command
RMAN retention policy is set to redundancy 2
Report of obsolete backups and copies
Type                 Key    Completion Time    Filename/Handle
-------------------- ------ ------------------ --------------------
Backup Set           23     22.12.2016 09:19:16
  Backup Piece       23     22.12.2016 09:19:16 /u01/app/oracle/fra/OLTP/43C6AA13C2390666E0538D24100A09EF/backupset/2016_12_22/o1_mf_nnndf_TAG20161222T091750_d5q33orv_.bkp
Backup Set           24     22.12.2016 09:19:30
  Backup Piece       24     22.12.2016 09:19:30 /u01/app/oracle/fra/OLTP/backupset/2016_12_22/o1_mf_annnn_TAG20161222T091930_d5q34ljw_.bkp
Backup Set           25     22.12.2016 09:19:34
  Backup Piece       25     22.12.2016 09:19:34 /u01/app/oracle/fra/OLTP/autobackup/2016_12_22/o1_mf_s_931252772_d5q34ol5_.bkp
Backup Set           26     22.12.2016 10:01:36
  Backup Piece       26     22.12.2016 10:01:36 /u01/app/oracle/fra/OLTP/backupset/2016_12_22/o1_mf_annnn_TAG20161222T100127_d5q5m88s_.bkp
Archive Log          43     22.12.2016 10:25:30 /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_17_d5q709oq_.arc
Archive Log          41     22.12.2016 10:25:30 /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_16_d5q709x1_.arc
Archive Log          42     22.12.2016 10:25:30 /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_12_d5q709xo_.arc
Archive Log          44     22.12.2016 10:25:31 /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_13_d5q709ln_.arc
Archive Log          45     22.12.2016 10:25:35 /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_11_d5q709gs_.arc
Archive Log          46     22.12.2016 10:25:37 /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_15_d5q709b5_.arc

The oldest backup is obsolete, which is fine since we already have three full backups. All the archivelogs are obsolete as well, since they have already been backed up and can be restored from those backups in case of emergency. But what can we do if we want to keep the archivelogs on disk as long as possible? A “delete obsolete” would remove them along with the outdated database backup.
Let’s try to keep them using “change”:

RMAN> change archivelog all keep until time 'sysdate+3';

released channel: ORA_DISK_1
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=279 instance=oltp_1 device type=DISK
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of KEEP command at 12/22/2016 10:29:34
RMAN-06529: CHANGE ... KEEP not supported for ARCHIVELOG

So that is not an option. Maybe changing them to “unavailable” will do the job?

RMAN> change archivelog all unavailable;

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of unavailable command on ORA_DISK_1 channel at 12/22/2016 10:30:43
ORA-19813: cannot have unavailable file /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_15_d5q709b5_.arc in DB_RECOVERY_FILE_DEST

It is not possible to mark archivelog files in the FRA as unavailable. There must be another way to exempt the archivelogs from the retention policy. The policy applies to all files known to the database, so let’s make the database forget about the archivelogs:

RMAN> change archivelog all uncatalog;

uncataloged archived log
archived log file name=/u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_15_d5q709b5_.arc RECID=46 STAMP=931256737
uncataloged archived log
archived log file name=/u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_16_d5q709x1_.arc RECID=41 STAMP=931256730
uncataloged archived log
archived log file name=/u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_17_d5q709oq_.arc RECID=43 STAMP=931256730
uncataloged archived log
archived log file name=/u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_18_d5q5q65p_.arc RECID=35 STAMP=931255414
uncataloged archived log
archived log file name=/u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_19_d5q6gtmj_.arc RECID=36 STAMP=931256170
uncataloged archived log
archived log file name=/u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_20_d5q6lm3q_.arc RECID=38 STAMP=931256291
uncataloged archived log
archived log file name=/u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_11_d5q709gs_.arc RECID=45 STAMP=931256735
uncataloged archived log
archived log file name=/u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_12_d5q709xo_.arc RECID=42 STAMP=931256730
uncataloged archived log
archived log file name=/u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_13_d5q709ln_.arc RECID=44 STAMP=931256731
uncataloged archived log
archived log file name=/u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_14_d5q709pq_.arc RECID=40 STAMP=931256729
uncataloged archived log
archived log file name=/u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_15_d5q6gg9k_.arc RECID=37 STAMP=931256158
uncataloged archived log
archived log file name=/u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_16_d5q6l6jm_.arc RECID=39 STAMP=931256278
Uncataloged 12 objects

That worked. The files are still there, but the database does not know about them anymore. Now we can apply the policy in the way we want it:

RMAN> report obsolete;

RMAN retention policy will be applied to the command
RMAN retention policy is set to redundancy 2
Report of obsolete backups and copies
Type                 Key    Completion Time    Filename/Handle
-------------------- ------ ------------------ --------------------
Backup Set           23     22.12.2016 09:19:16
  Backup Piece       23     22.12.2016 09:19:16 /u01/app/oracle/fra/OLTP/43C6AA13C2390666E0538D24100A09EF/backupset/2016_12_22/o1_mf_nnndf_TAG20161222T091750_d5q33orv_.bkp
Backup Set           24     22.12.2016 09:19:30
  Backup Piece       24     22.12.2016 09:19:30 /u01/app/oracle/fra/OLTP/backupset/2016_12_22/o1_mf_annnn_TAG20161222T091930_d5q34ljw_.bkp
Backup Set           25     22.12.2016 09:19:34
  Backup Piece       25     22.12.2016 09:19:34 /u01/app/oracle/fra/OLTP/autobackup/2016_12_22/o1_mf_s_931252772_d5q34ol5_.bkp
Backup Set           26     22.12.2016 10:01:36
  Backup Piece       26     22.12.2016 10:01:36 /u01/app/oracle/fra/OLTP/backupset/2016_12_22/o1_mf_annnn_TAG20161222T100127_d5q5m88s_.bkp

The oldest backup is still obsolete, but no archivelogs anymore. That’s what we wanted to achieve. I can now remove the old backup:

RMAN> delete noprompt obsolete;

RMAN retention policy will be applied to the command
RMAN retention policy is set to redundancy 2
using channel ORA_DISK_1
Deleting the following obsolete backups and copies:
Type                 Key    Completion Time    Filename/Handle
-------------------- ------ ------------------ --------------------
Backup Set           23     22.12.2016 09:19:16
  Backup Piece       23     22.12.2016 09:19:16 /u01/app/oracle/fra/OLTP/43C6AA13C2390666E0538D24100A09EF/backupset/2016_12_22/o1_mf_nnndf_TAG20161222T091750_d5q33orv_.bkp
Backup Set           24     22.12.2016 09:19:30
  Backup Piece       24     22.12.2016 09:19:30 /u01/app/oracle/fra/OLTP/backupset/2016_12_22/o1_mf_annnn_TAG20161222T091930_d5q34ljw_.bkp
Backup Set           25     22.12.2016 09:19:34
  Backup Piece       25     22.12.2016 09:19:34 /u01/app/oracle/fra/OLTP/autobackup/2016_12_22/o1_mf_s_931252772_d5q34ol5_.bkp
Backup Set           26     22.12.2016 10:01:36
  Backup Piece       26     22.12.2016 10:01:36 /u01/app/oracle/fra/OLTP/backupset/2016_12_22/o1_mf_annnn_TAG20161222T100127_d5q5m88s_.bkp
deleted backup piece
backup piece handle=/u01/app/oracle/fra/OLTP/43C6AA13C2390666E0538D24100A09EF/backupset/2016_12_22/o1_mf_nnndf_TAG20161222T091750_d5q33orv_.bkp RECID=23 STAMP=931252741
deleted backup piece
backup piece handle=/u01/app/oracle/fra/OLTP/backupset/2016_12_22/o1_mf_annnn_TAG20161222T091930_d5q34ljw_.bkp RECID=24 STAMP=931252770
deleted backup piece
backup piece handle=/u01/app/oracle/fra/OLTP/autobackup/2016_12_22/o1_mf_s_931252772_d5q34ol5_.bkp RECID=25 STAMP=931252773
deleted backup piece
backup piece handle=/u01/app/oracle/fra/OLTP/backupset/2016_12_22/o1_mf_annnn_TAG20161222T100127_d5q5m88s_.bkp RECID=26 STAMP=931255288
Deleted 4 objects

Afterwards I can re-register the archivelogs.

RMAN> catalog recovery area noprompt;

searching for all files in the recovery area

List of Files Unknown to the Database
=====================================
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_20_d5q6lm3q_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_16_d5q6l6jm_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_15_d5q709b5_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_11_d5q709gs_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_13_d5q709ln_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_17_d5q709oq_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_14_d5q709pq_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_18_d5q5q65p_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_19_d5q6gtmj_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_15_d5q6gg9k_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_16_d5q709x1_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_12_d5q709xo_.arc
cataloging files...
cataloging done

List of Cataloged Files
=======================
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_20_d5q6lm3q_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_16_d5q6l6jm_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_15_d5q709b5_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_11_d5q709gs_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_13_d5q709ln_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_17_d5q709oq_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_14_d5q709pq_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_18_d5q5q65p_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_19_d5q6gtmj_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_15_d5q6gg9k_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_16_d5q709x1_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_12_d5q709xo_.arc

The archivelog files are back in the database catalog. Now I can handle them separately and apply any rule I want.

RMAN> delete noprompt archivelog until time 'sysdate-1/12';

released channel: ORA_DISK_1
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=279 instance=oltp_1 device type=DISK
List of Archived Log Copies for database with db_unique_name OLTP
=====================================================================

Key     Thrd Seq     S Low Time
------- ---- ------- - -------------------
49      1    15      A 21.12.2016 12:56:40
        Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_15_d5q709b5_.arc

50      2    11      A 21.12.2016 12:56:30
        Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_11_d5q709gs_.arc

deleted archived log
archived log file name=/u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_15_d5q709b5_.arc RECID=49 STAMP=931257331
deleted archived log
archived log file name=/u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_11_d5q709gs_.arc RECID=50 STAMP=931257331
Deleted 2 objects

Not very straightforward, but it does the job.

Inspired by a comment on Twitter by Franck Pachot, the best way to keep a history of archivelogs is not to use “ALL” but “FROM”. So the approach to keep a history of three days would be the following:

RMAN> change archivelog from time 'sysdate-3' uncatalog;
RMAN> delete noprompt obsolete;
RMAN> catalog recovery area noprompt;

That way all archivelogs from the last three days are uncataloged and therefore kept on disk. All other archivelogs are handled by the “delete obsolete” operation, and there is no need to remove archivelogs manually.
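If this should happen automatically, for example right after the regular backup job, the three commands can easily be scripted. A minimal sketch (the three-day retention and the local connection are just examples):

rman target / <<EOF
change archivelog from time 'sysdate-3' uncatalog;
delete noprompt obsolete;
catalog recovery area noprompt;
EOF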

How DB_LOST_WRITE_PROTECT works with Data Guard

Preface

Today I will blog again about a Data Guard topic. There are a couple of best practices out there which one should follow. One of these best practices is enabling block checking and lost write protection. About the latter there is not much information out there, which is why I want to outline the concept and the importance of this feature. Actually this post is inspired by a talk I had during DOAG Conference 2016: I gave a presentation about best practices in Data Guard, and someone from the audience asked how that lost write protection actually works.
Basically it is there to detect lost writes, as the parameter name clearly states. That means a write request to disk was acknowledged and the database is happy with that, but the write did not actually happen for whatever reason. So when the block is read the next time, it is still in its old state; any changed, deleted or added values are not included. The block itself is consistent, it is not corrupted, so the DBA will not notice anything since there is no error. An error occurs only when you restore the tablespace containing the block and then try to apply the redo stream: the recovery will detect a newer SCN in the redo stream which does not match the block’s SCN. That is the point where it gets tricky.

The test environment

My simple test cases run on a VirtualBox VM with OEL 6.7, Oracle Restart 12.1.0.2 and Oracle Database 12.1.0.2. Primary and Standby run on the same host.
DB_NAME: db12c
DB_UNIQUE_NAME: db12ca
DB_UNIQUE_NAME: db12cb
You will see the names in my SQL prompt to make things clear.

This is the current state of the system:

SYS@db12ca> show parameter lost

NAME                                 TYPE                              VALUE
------------------------------------ --------------------------------- ------------------------------
db_lost_write_protect                string                            NONE

SYS@db12ca> select database_role from v$database;

DATABASE_ROLE
------------------------------------------------
PHYSICAL STANDBY
SYS@db12cb> show parameter lost


NAME                                 TYPE                              VALUE
------------------------------------ --------------------------------- ------------------------------
db_lost_write_protect                string                            NONE


SYS@db12cb> select database_role from v$database;

DATABASE_ROLE
------------------------------------------------
PRIMARY

So “db12cb” is my primary and “db12ca” my standby instance. By the way, that’s why I gave them the suffixes “a” and “b”: they may change roles over and over again.

For testing I create a separate tablespace with manual segment space management. This allows me to specify FREELISTS=1; otherwise the changes to my data may end up in different blocks, which is not what I want for my testing. Besides that, I create a user which I will use for testing and which gets the necessary grants.

SYS@db12cb> create tablespace marco datafile size 100m segment space management manual;

Tablespace created.

SYS@db12cb> create user marco identified by marco default tablespace marco quota unlimited on marco;

User created.

SYS@db12cb> grant create session to marco;

Grant succeeded.

SYS@db12cb> grant create table to marco;

Grant succeeded.

Scenario #1: No Lost Write Detection

The new user can now create a table and insert some data, so let’s do that.

SYS@db12cb> conn marco/marco
Connected.
MARCO@db12cb> create table testtable (id number, txt varchar2(100)) storage (freelists 1);

Table created.

MARCO@db12cb> insert into testtable values (1, 'Test Lost Write Detection - 1');

1 row created.

MARCO@db12cb> commit;

Commit complete.

Now we can identify the block and check if the data is really in there.

SYS@db12cb> select file_name from dba_data_files where tablespace_name='MARCO';

FILE_NAME
------------------------------------------------------------------------------------------------------------------------
/u01/app/oracle/oradata/DB12CB/datafile/o1_mf_marco_d3llm6dd_.dbf

SYS@db12cb> select block_id, blocks from dba_extents where segment_name='TESTTABLE' and owner='MARCO';

  BLOCK_ID     BLOCKS
---------- ----------
       128          8

SYS@db12cb> alter system checkpoint;

System altered.
[oracle@oel6u4 ~]$ dd if=/u01/app/oracle/oradata/DB12CB/datafile/o1_mf_marco_d3llm6dd_.dbf of=myblock.v1 skip=129 count=1 bs=8192
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000162476 s, 50.4 MB/s
[oracle@oel6u4 ~]$ grep Detection myblock.v1
Binary file myblock.v1 matches

Ok, the data is in that block. In the same way I can now check if the DML was successfully applied on the standby.

SYS@db12ca> alter system flush buffer_cache;

System altered.

SYS@db12ca> select name from v$datafile where name like '%marco%';

NAME
--------------------------------------------------------------------------------
/u01/app/oracle/oradata/DB12CA/datafile/o1_mf_marco_d3llm8nt_.dbf
[oracle@oel6u4 ~]$ dd if=/u01/app/oracle/oradata/DB12CA/datafile/o1_mf_marco_d3llm8nt_.dbf of=sbblock.v1 skip=129 count=1 bs=8192
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000662024 s, 12.4 MB/s
[oracle@oel6u4 ~]$ grep Detection sbblock.v1
Binary file sbblock.v1 matches

So everything is fine until now as it should be.
I will now insert another row into the test table, force that change to be written to disk and then clear the buffer cache.

MARCO@db12cb> insert into testtable values (2, 'Oh my god!');

1 row created.

MARCO@db12cb> commit;

Commit complete.

MARCO@db12cb>

MARCO@db12cb> conn / as sysdba
Connected.
SYS@db12cb> alter system checkpoint;

System altered.

SYS@db12cb> alter system flush buffer_cache;

System altered.

Again, check if it was written to disk.

[oracle@oel6u4 ~]$ dd if=/u01/app/oracle/oradata/DB12CB/datafile/o1_mf_marco_d3llm6dd_.dbf of=myblock.v2 skip=129 count=1 bs=8192
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000318304 s, 25.7 MB/s
[oracle@oel6u4 ~]$ grep Detection myblock.v2
Binary file myblock.v2 matches
[oracle@oel6u4 ~]$ grep god myblock.v2
Binary file myblock.v2 matches

Both values that I inserted are on disk now. Just to make sure everything is ok, I check the block on the standby.

[oracle@oel6u4 ~]$ dd if=/u01/app/oracle/oradata/DB12CA/datafile/o1_mf_marco_d3llm8nt_.dbf of=sbblock.v2 skip=129 count=1 bs=8192
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000162124 s, 50.5 MB/s
[oracle@oel6u4 ~]$ grep Detection sbblock.v2
Binary file sbblock.v2 matches
[oracle@oel6u4 ~]$ grep god sbblock.v2
Binary file sbblock.v2 matches

So far, so good. Now comes the funny part. I will simulate a lost write by just putting my first extracted block back in the datafile.

[oracle@oel6u4 ~]$ dd if=myblock.v1 of=/u01/app/oracle/oradata/DB12CB/datafile/o1_mf_marco_d3llm6dd_.dbf seek=129 count=1 bs=8192 conv=notrunc
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000154517 s, 53.0 MB/s

Now let us query the test table and see what’s happening.

MARCO@db12cb> select * from testtable;

        ID
----------
TXT
--------------------------------------------------------------------------------
         1
Test Lost Write Detection - 1

No error, no warning, just the result. But the result set obviously lacks the row from the second insert. And since the block is completely intact and not corrupted, there is no reason to raise any error.
So now it is time to do another INSERT.

MARCO@db12cb> insert into testtable values (3, 'Inconsistency!');

1 row created.

That is the point where it comes to light. The redo apply on the standby database detects a redo record which does not match the data block it has itself. It has no other choice than to stop recovery and raise an error in the alert.log.

2016-11-26 09:52:02.752000 +01:00
ERROR: ORA-00600: internal error code, arguments: [3020] recovery detected a data block with invalid SCN raised at location:kcbr_media_ap_1
Checker run found 1 new persistent data failures
Errors in file /u01/app/oracle/diag/rdbms/db12ca/db12ca/trace/db12ca_pr02_2466.trc  (incident=2705):
ORA-00600: internal error code, arguments: [3020], [2], [129], [8388737], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 2, block# 129, file offset is 1056768 bytes)
ORA-10564: tablespace MARCO
ORA-01110: data file 2: '/u01/app/oracle/oradata/DB12CA/datafile/o1_mf_marco_d3llm8nt_.dbf'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 93368
2016-11-26 09:52:03.882000 +01:00
Incident details in: /u01/app/oracle/diag/rdbms/db12ca/db12ca/incident/incdir_2705/db12ca_pr02_2466_i2705.trc

Besides that, the primary keeps running fine, accepts changes, commits and just does what a database is supposed to do. This is very unkind, since the only way to recover from such a situation is to fail over to the standby and lose all changes that happened after the change to the damaged block. And this can be a lot.

Scenario #2: Lost Write Detection enabled

I enable it by simply setting the parameter to typical on both instances.

SYS@db12ca> alter system set db_lost_write_protect=typical;

System altered.
SYS@db12cb> alter system set db_lost_write_protect=typical;

System altered.

This parameter forces the database to record the SCNs of the blocks it reads from disk in the redo stream. The standby database can use this information to compare the recorded SCN from the redo stream with the actual SCN of the block at the standby site. If there is a difference, it can report a lost write.

Now I walk through the same steps as above. But this time, after simulating the lost write, I simply query the table.

MARCO@db12cb> select * from testtable;

        ID
----------
TXT
--------------------------------------------------------------------------------
         1
Test Lost Write Detection - 1

The SELECT succeeds, but the alert.log of the primary reports the following error.

2016-11-26 10:40:47.143000 +01:00
DMON: A primary database lost write was reported by standby database db12ca. Please look at the alert and DRC logs of the standby database db12ca to see more information about the lost write.

The standby’s alert.log now reports an ORA-752 instead of an ORA-600.

No redo at or after SCN 3448159 can be used for recovery.
PR02: Primary database lost write detected by standby database db12ca
BLOCK THAT LOST WRITE 129, FILE 2, TABLESPACE# 7
The block read during the normal successful database operation had SCN 3346737 (0x0000.00331131) seq 1 (0x01)
ERROR: ORA-00752 detected lost write on primary
Slave exiting with ORA-752 exception
Errors in file /u01/app/oracle/diag/rdbms/db12ca/db12ca/trace/db12ca_pr02_2924.trc:
ORA-00752: recovery detected a lost write of a data block
ORA-10567: Redo is inconsistent with data block (file# 2, block# 129, file offset is 1056768 bytes)
ORA-10564: tablespace MARCO
ORA-01110: data file 2: '/u01/app/oracle/oradata/DB12CA/datafile/o1_mf_marco_d3lnpn8n_.dbf'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 93369
Recovery Slave PR02 previously exited with exception 752
MRP0: Background Media Recovery terminated with error 448
Errors in file /u01/app/oracle/diag/rdbms/db12ca/db12ca/trace/db12ca_pr00_2919.trc:
ORA-00448: normal completion of background process

Recovering from a lost write

As in scenario #1, the only way to work around this error is to fail over to the standby database.

[oracle@oel6u4 ~]$ dgmgrl
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys@db12ca
Password:
Connected as SYSDBA.
DGMGRL> failover to db12ca immediate
Performing failover NOW, please wait...
Failover succeeded, new primary is "db12ca"

Now I can query my test table at the new primary.

SYS@db12ca> select * from marco.testtable;

        ID TXT
---------- ------------------------------
         1 Test Lost Write Detection - 1
         2 Oh my god!

I now need to re-create the old primary; reinstating it using Flashback Database will not work. The steps will basically be these (sketched below):

  • remove database from configuration
  • recreate the database using duplicate
  • add database back to the configuration
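Roughly, the commands would look like this. This is just an outline; the DUPLICATE needs connections to both databases and the usual parameter and spfile handling, which I left out here:

DGMGRL> remove database db12cb;

RMAN> duplicate target database for standby from active database;

DGMGRL> add database db12cb as connect identifier is db12cb maintained as physical;
DGMGRL> enable database db12cb;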

A lot of effort for such a “small” failure….

Conclusion

Enabling lost write detection is crucial in a Data Guard setup. Lost writes are detected at read time, which allows you to perform recovery steps much earlier than without it. Nevertheless, lost writes should not occur at all. If one does occur, something really bad is going on in your environment and you need to investigate the root cause of the lost write.
That’s it, basically. I hope it makes things a little clearer.

Further reading

Resolving ORA-752 or ORA-600 [3020] During Standby Recovery (Doc ID 1265884.1)
Oracle Docs – Database Reference: DB_LOST_WRITE_PROTECT