RMAN recovery window over redundancy

One of my favorite topics around the Oracle Database is backup and recovery because it is so essential for successfully operating a database landscape. The topic today is a basic setting in RMAN, that controls the lifetime of backups, the “retention policy”. Let’s get into it.

When planning a backup strategy, there are two major values, that should be taken into account. One is the Recovery Time Objective (RTO) which is the amount of time spent to bring a system back to life after an incident. The second is the Recovery Point Objective (RPO) which represents the point in time to which a system shall be restored to get back to operation. The latter one is mainly influencing our retention policy. Imagine the following situation, a major application update takes place which does a lot of DDL and DML to the database because there are so many cool new features in the new release of that application. But for some reason the update fails when it was halfway through. Of cause it cannot be rolled back easily because of the DDL that took place. So the DBA is asked to restore the database to the time just before the update began.

That is, where the fun part starts. Basically it should be an easy thing using RMAN. Set the point in time, restore the controlfile, restore the database, roll it forward until the point in time and then open the database. So let’s go ahead.

[oracle@vm160 ~]$ rman target /

Recovery Manager: Release 19.0.0.0.0 - Production on Tue Nov 7 08:33:41 2023
Version 19.18.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: ORCL19 (DBID=422142402)

RMAN> startup nomount force

Oracle instance started

Total System Global Area    2147482320 bytes

Fixed Size                     9165520 bytes
Variable Size               1342177280 bytes
Database Buffers             788529152 bytes
Redo Buffers                   7610368 bytes

RMAN> set dbid=422142402

executing command: SET DBID

RMAN> run {
2> set until time "to_date('06.11.2023 04:00:00', 'dd.mm.yyyy hh24:mi:ss')";
3> restore controlfile from autobackup;
4> alter database mount;
5> restore database;
6> recover database;
7> }

executing command: SET until clause

Starting restore at 2023-11-07 08:35:08
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=237 device type=DISK

recovery area destination: /u01/app/oracle/fra
database name (or database unique name) used for search: ORCL19
channel ORA_DISK_1: no AUTOBACKUPS found in the recovery area
channel ORA_DISK_1: looking for AUTOBACKUP on day: 20231106
channel ORA_DISK_1: looking for AUTOBACKUP on day: 20231105
channel ORA_DISK_1: looking for AUTOBACKUP on day: 20231104
channel ORA_DISK_1: looking for AUTOBACKUP on day: 20231103
channel ORA_DISK_1: looking for AUTOBACKUP on day: 20231102
channel ORA_DISK_1: looking for AUTOBACKUP on day: 20231101
channel ORA_DISK_1: looking for AUTOBACKUP on day: 20231031
channel ORA_DISK_1: no AUTOBACKUP in 7 days found
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 11/07/2023 08:35:11
RMAN-06172: no AUTOBACKUP found or specified handle is not a valid copy or piece

What? Why don’t I have any backup of my controlfile? Let’s check that.

RMAN> list backup of controlfile summary;


List of Backups
===============
Key     TY LV S Device Type Completion Time     #Pieces #Copies Compressed Tag
------- -- -- - ----------- ------------------- ------- ------- ---------- ---
68      B  F  A DISK        2023-11-07 08:44:01 1       1       NO         TAG20231107T084400

Ok, there is a controlfile backup, but just one. So what are my RMAN settings? Can I see something?

RMAN> show all;

using target database control file instead of recovery catalog
RMAN configuration parameters for database with db_unique_name ORCL19 are:
CONFIGURE RETENTION POLICY TO REDUNDANCY 1; # default
CONFIGURE BACKUP OPTIMIZATION OFF; # default
CONFIGURE DEFAULT DEVICE TYPE TO DISK; # default
CONFIGURE CONTROLFILE AUTOBACKUP ON; # default
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '%F'; # default
CONFIGURE DEVICE TYPE DISK PARALLELISM 1 BACKUP TYPE TO BACKUPSET; # default
CONFIGURE DATAFILE BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE ARCHIVELOG BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE MAXSETSIZE TO UNLIMITED; # default
CONFIGURE ENCRYPTION FOR DATABASE OFF; # default
CONFIGURE ENCRYPTION ALGORITHM 'AES128'; # default
CONFIGURE COMPRESSION ALGORITHM 'BASIC' AS OF RELEASE 'DEFAULT' OPTIMIZE FOR LOAD TRUE ; # default
CONFIGURE RMAN OUTPUT TO KEEP FOR 7 DAYS; # default
CONFIGURE ARCHIVELOG DELETION POLICY TO NONE; # default
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/u01/app/oracle/product/db/19/dbs/snapcf_orcl19.f'; # default

The thing is the redundancy=1 as the retention policy. A redundancy of 1 (in words: one) means just what it says, RMAN keeps exactly one backup of each file it backs up. And often I see backup strategies relying highly on redundancy. Say I want to keep 2 weeks of backups for my database. A full backup takes place every sunday, so I set retention policy to “redundancy 2”. That way, RMAN keeps 2 full backups of my database which, according to the backup schedule, will be good for 2 weeks of backups. No, it is not. Why not? For a couple of reasons. First, as shown, the redundancy applies to all types of backups, also controlfile bakups etc. First, to be able to restore the database to the point 2 weeks ago, we’ll need a controlfile backup from that time. And there might be many of those backups. Second, if we change the backup schedule to backup the database twice a week, we change the retention to about one week implicitely. That’s not what I want it to be.

So how do I come around this? The answer is quite simple: Change the retention policy to be a recovery window rather than redundancy.

RMAN> CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 14 DAYS;;

new RMAN configuration parameters:
CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 14 DAYS;
new RMAN configuration parameters are successfully stored

With that setting the previous example would have run without any errors. There are many backups of the controlfile:

RMAN> list backup of controlfile summary;


List of Backups
===============
Key     TY LV S Device Type Completion Time     #Pieces #Copies Compressed Tag
------- -- -- - ----------- ------------------- ------- ------- ---------- ---
17      B  F  A DISK        2023-02-24 13:54:24 1       1       NO         TAG20230224T135422
24      B  F  A DISK        2023-10-27 09:13:43 1       1       NO         TAG20231027T091343
26      B  F  A DISK        2023-10-27 09:16:32 1       1       NO         TAG20231027T091632
28      B  F  A DISK        2023-10-28 02:00:06 1       1       NO         TAG20231028T020006
35      B  F  A DISK        2023-10-29 01:00:47 1       1       NO         TAG20231029T010046
37      B  F  A DISK        2023-10-29 02:00:05 1       1       NO         TAG20231029T020005
39      B  F  A DISK        2023-10-30 02:00:08 1       1       NO         TAG20231030T020008
41      B  F  A DISK        2023-10-31 02:00:05 1       1       NO         TAG20231031T020005
43      B  F  A DISK        2023-11-01 02:00:07 1       1       NO         TAG20231101T020007
45      B  F  A DISK        2023-11-02 02:00:06 1       1       NO         TAG20231102T020006
46      B  F  A DISK        2023-11-02 13:28:47 1       1       NO         TAG20231102T132847
47      B  F  A DISK        2023-11-02 13:44:19 1       1       NO         TAG20231102T134419
48      B  F  A DISK        2023-11-02 13:47:53 1       1       NO         TAG20231102T134753
49      B  F  A DISK        2023-11-02 13:57:21 1       1       NO         TAG20231102T135721
52      B  F  A DISK        2023-11-03 02:00:08 1       1       NO         TAG20231103T020008
54      B  F  A DISK        2023-11-04 02:00:07 1       1       NO         TAG20231104T020006
61      B  F  A DISK        2023-11-05 01:00:37 1       1       NO         TAG20231105T010037
63      B  F  A DISK        2023-11-05 02:00:06 1       1       NO         TAG20231105T020006
65      B  F  A DISK        2023-11-06 02:00:05 1       1       NO         TAG20231106T020005
66      B  F  A DISK        2023-11-07 08:24:25 1       1       NO         TAG20231107T082424
67      B  F  A DISK        2023-11-07 02:00:06 1       1       NO         TAG20231107T020006
68      B  F  A DISK        2023-11-07 08:44:01 1       1       NO         TAG20231107T084400

With all these backups I can easily restore the database to the given point in time.

[oracle@vm160 ~]$ rman target /

Recovery Manager: Release 19.0.0.0.0 - Production on Tue Nov 7 08:35:59 2023
Version 19.18.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: ORCL19 (not mounted)

RMAN> set dbid=422142402

executing command: SET DBID

RMAN> run {
2> set until time "to_date('06.11.2023 04:00:00', 'dd.mm.yyyy hh24:mi:ss')";
3> restore controlfile from autobackup;
4> alter database mount;
5> restore database;
6> recover database;
7> }

executing command: SET until clause

Starting restore at 2023-11-07 08:36:51
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=20 device type=DISK

recovery area destination: /u01/app/oracle/fra
database name (or database unique name) used for search: ORCL19
channel ORA_DISK_1: AUTOBACKUP /u01/app/oracle/fra/ORCL19/autobackup/2023_11_06/o1_mf_s_1152151205_lnjgnooh_.bkp found in the recovery area
channel ORA_DISK_1: looking for AUTOBACKUP on day: 20231106
channel ORA_DISK_1: restoring control file from AUTOBACKUP /u01/app/oracle/fra/ORCL19/autobackup/2023_11_06/o1_mf_s_1152151205_lnjgnooh_.bkp
channel ORA_DISK_1: control file restore from AUTOBACKUP complete
output file name=/u01/app/oracle/oradata/ORCL19/controlfile/o1_mf_kq6lhlbv_.ctl
output file name=/u01/app/oracle/fra/ORCL19/controlfile/o1_mf_kq6lhlmd_.ctl
Finished restore at 2023-11-07 08:36:53

released channel: ORA_DISK_1
Statement processed

Starting restore at 2023-11-07 08:36:58
Starting implicit crosscheck backup at 2023-11-07 08:36:58
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=20 device type=DISK
Crosschecked 48 objects
Finished implicit crosscheck backup at 2023-11-07 08:36:59

Starting implicit crosscheck copy at 2023-11-07 08:36:59
using channel ORA_DISK_1
Finished implicit crosscheck copy at 2023-11-07 08:36:59

searching for all files in the recovery area
cataloging files...
cataloging done

List of Cataloged Files
=======================
File Name: /u01/app/oracle/fra/ORCL19/autobackup/2023_11_06/o1_mf_s_1152151205_lnjgnooh_.bkp
File Name: /u01/app/oracle/fra/ORCL19/autobackup/2023_11_07/o1_mf_s_1152260664_lnmsk92o_.bkp
File Name: /u01/app/oracle/fra/ORCL19/autobackup/2023_11_07/o1_mf_s_1152237606_lnm30pow_.bkp

using channel ORA_DISK_1

skipping datafile 5; already restored to file /u01/app/oracle/oradata/ORCL19/datafile/o1_mf_system_kq6m8z4c_.dbf
skipping datafile 6; already restored to file /u01/app/oracle/oradata/ORCL19/datafile/o1_mf_sysaux_kq6m8z7g_.dbf
skipping datafile 8; already restored to file /u01/app/oracle/oradata/ORCL19/datafile/o1_mf_undotbs1_kq6m8zgv_.dbf
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00001 to /u01/app/oracle/oradata/ORCL19/datafile/o1_mf_system_kq6l8vf4_.dbf
channel ORA_DISK_1: restoring datafile 00003 to /u01/app/oracle/oradata/ORCL19/datafile/o1_mf_sysaux_kq6lbyd0_.dbf
channel ORA_DISK_1: restoring datafile 00004 to /u01/app/oracle/oradata/ORCL19/datafile/o1_mf_undotbs1_kq6ld1qb_.dbf
channel ORA_DISK_1: restoring datafile 00007 to /u01/app/oracle/oradata/ORCL19/datafile/o1_mf_users_kq6ld4xm_.dbf
channel ORA_DISK_1: reading from backup piece /u01/app/oracle/backup/orcl19/ORCL19_20231105_data_1s2am3om_1_1
channel ORA_DISK_1: piece handle=/u01/app/oracle/backup/orcl19/ORCL19_20231105_data_1s2am3om_1_1 tag=TAG20231105T010006
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:15
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00019 to /u01/app/oracle/oradata/ORCL19/ED7F830814134AADE053A024100A5D08/datafile/system_umbenannt.dbf
channel ORA_DISK_1: restoring datafile 00020 to /u01/app/oracle/oradata/ORCL19/ED7F830814134AADE053A024100A5D08/datafile/o1_mf_sysaux_kzgxy7n2_.dbf
channel ORA_DISK_1: restoring datafile 00021 to /u01/app/oracle/oradata/ORCL19/ED7F830814134AADE053A024100A5D08/datafile/o1_mf_undotbs1_kzgxy7n4_.dbf
channel ORA_DISK_1: restoring datafile 00022 to /u01/app/oracle/oradata/ORCL19/ED7F830814134AADE053A024100A5D08/datafile/o1_mf_users_kzgxy7n5_.dbf
channel ORA_DISK_1: reading from backup piece /u01/app/oracle/backup/orcl19/ORCL19_20231105_data_1t2am3p6_1_1
channel ORA_DISK_1: piece handle=/u01/app/oracle/backup/orcl19/ORCL19_20231105_data_1t2am3p6_1_1 tag=TAG20231105T010006
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:07
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00015 to /u01/app/oracle/oradata/ORCL19/EF608A6519330511E053A024100A91E2/datafile/o1_mf_system_ks5pktn8_.dbf
channel ORA_DISK_1: restoring datafile 00016 to /u01/app/oracle/oradata/ORCL19/EF608A6519330511E053A024100A91E2/datafile/o1_mf_sysaux_ks5pktn9_.dbf
channel ORA_DISK_1: restoring datafile 00017 to /u01/app/oracle/oradata/ORCL19/EF608A6519330511E053A024100A91E2/datafile/o1_mf_undotbs1_ks5pktnb_.dbf
channel ORA_DISK_1: restoring datafile 00023 to /u01/app/oracle/oradata/ORCL19/EF608A6519330511E053A024100A91E2/datafile/o1_mf_users_ln761vy3_.dbf
channel ORA_DISK_1: reading from backup piece /u01/app/oracle/backup/orcl19/ORCL19_20231105_data_1u2am3pd_1_1
channel ORA_DISK_1: piece handle=/u01/app/oracle/backup/orcl19/ORCL19_20231105_data_1u2am3pd_1_1 tag=TAG20231105T010006
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:07
Finished restore at 2023-11-07 08:37:29

Starting recover at 2023-11-07 08:37:29
using channel ORA_DISK_1

starting media recovery

archived log for thread 1 with sequence 7 is already on disk as file /u01/app/oracle/fra/ORCL19/onlinelog/o1_mf_1_kq6lkxbr_.log
archived log for thread 1 with sequence 8 is already on disk as file /u01/app/oracle/fra/ORCL19/onlinelog/o1_mf_2_kq6lk9px_.log
channel ORA_DISK_1: starting archived log restore to default destination
channel ORA_DISK_1: restoring archived log
archived log thread=1 sequence=4
channel ORA_DISK_1: reading from backup piece /u01/app/oracle/backup/orcl19/ORCL19_20231105_arch_202am3pk_1_1
channel ORA_DISK_1: piece handle=/u01/app/oracle/backup/orcl19/ORCL19_20231105_arch_202am3pk_1_1 tag=TAG20231105T010035
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
archived log file name=/u01/app/oracle/fra/ORCL19/archivelog/2023_11_07/o1_mf_1_4_lnmt9trc_.arc thread=1 sequence=4
channel default: deleting archived log(s)
archived log file name=/u01/app/oracle/fra/ORCL19/archivelog/2023_11_07/o1_mf_1_4_lnmt9trc_.arc RECID=268 STAMP=1152261450
channel ORA_DISK_1: starting archived log restore to default destination
channel ORA_DISK_1: restoring archived log
archived log thread=1 sequence=5
channel ORA_DISK_1: reading from backup piece /u01/app/oracle/backup/orcl19/ORCL19_20231105_arch_222am794_1_1
channel ORA_DISK_1: piece handle=/u01/app/oracle/backup/orcl19/ORCL19_20231105_arch_222am794_1_1 tag=TAG20231105T020004
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
archived log file name=/u01/app/oracle/fra/ORCL19/archivelog/2023_11_07/o1_mf_1_5_lnmt9w16_.arc thread=1 sequence=5
channel default: deleting archived log(s)
archived log file name=/u01/app/oracle/fra/ORCL19/archivelog/2023_11_07/o1_mf_1_5_lnmt9w16_.arc RECID=269 STAMP=1152261452
channel ORA_DISK_1: starting archived log restore to default destination
channel ORA_DISK_1: restoring archived log
archived log thread=1 sequence=6
channel ORA_DISK_1: reading from backup piece /u01/app/oracle/backup/orcl19/ORCL19_20231106_arch_242aorl4_1_1
channel ORA_DISK_1: piece handle=/u01/app/oracle/backup/orcl19/ORCL19_20231106_arch_242aorl4_1_1 tag=TAG20231106T020004
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
archived log file name=/u01/app/oracle/fra/ORCL19/archivelog/2023_11_07/o1_mf_1_6_lnmt9yvg_.arc thread=1 sequence=6
channel default: deleting archived log(s)
archived log file name=/u01/app/oracle/fra/ORCL19/archivelog/2023_11_07/o1_mf_1_6_lnmt9yvg_.arc RECID=270 STAMP=1152261455
archived log file name=/u01/app/oracle/fra/ORCL19/onlinelog/o1_mf_1_kq6lkxbr_.log thread=1 sequence=7
archived log file name=/u01/app/oracle/fra/ORCL19/onlinelog/o1_mf_2_kq6lk9px_.log thread=1 sequence=8
media recovery complete, elapsed time: 00:00:07
Finished recover at 2023-11-07 08:37:42

Setting the RMAN retention policy to any redundancy is a really bad idea. It might work out for the backup of the datafiles, but it will definitely cause issues with the backups of controlfiles.

So typically we have a requirement for the RTO. We should always use the given value for setting the RMAN retention policy to a recovery windows so that RMAN can keep track of all the necessary backups and is able to keep them as long as they might be required.