RMAN: Archivelogs, the UNTIL clause – expectations vs. reality

Sorry for the lengthy title, I just couldn’t put it shorter. So let’s start the story. Yesterday I was implementing a desaster recovery solution. The solution consists of a database copy, a process that transfers archivelogs from the original database and a process that applies the archivelogs to the copy. Dataguard would do that much better, but the system is running on Standard Edition 2, so no Dataguard at all. The solution worked quite well until I wanted to introduce an apply delay yesterday. That would mean, I still ship archivelogs on a regular basis but apply them only until a specific point in time. Assuming all archivelogs are present at the DR site, that is quite easy using RMAN:

RMAN> recover database until time 'sysdate-%s/1440';

The “%s” is filled with the desired apply delay in minutes. The fun starts with the housekeeping of the archivelogs. My asumption was, I can use the UNTIL clause to get rid of all the archivelogs, that are no longer required for recovery. Let’s first check the available archivelogs:

QL> select sequence#, first_time, next_time, first_change#, next_change# from v$archived_log where status='A' and first_change# >= 8582498175253 order by 1;

                 SEQUENCE# FIRST_TIME          NEXT_TIME                        FIRST_CHANGE#               NEXT_CHANGE#
-------------------------- ------------------- ------------------- -------------------------- --------------------------
                     16865 2024-04-08 10:26:16 2024-04-08 10:33:33              8582498176626              8582498177307

Just one archivelog. Let’s further asume, I already recovered until 10:30. So the archivelog is still needed for recovery since it ranges from 10:26 till 10:33. Now I run the RMAN DELETE command to remove the archivelogs, that are no longer needed. That should be none in this case. Of cause I use NOPROMPT FORCE because the command will be run by an automatic script.

MAN> delete  force noprompt archivelog until time "to_date('2024-04-08 10:30:00', 'yyyy-mm-dd hh24:mi:ss')";

allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=2 device type=DISK
archived log file name=/u01/app/oracle/standby/o1_mf_1_16865_m17byx6b_.arc thread=1 sequence=16865

But see, the archivelog was removed even though its next time is beyond my given timestamp. So I crosscheck that using a SCN in between. Of cause I made the archivelog available again for testing this testing purpose.

RMAN> delete  archivelog until scn 8582498177000;

allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=2 device type=DISK
archived log file name=/u01/app/oracle/standby/o1_mf_1_16865_m17byx6b_.arc thread=1 sequence=16865

And, surprise surprise, it also removed my archivelog. So what shall I do? Maybe reading the docs sheds some light on this issue.

Oracle Backup and Recovery Reference – archivelogRecordSpecifier

The table clearly states, that UNTIL SCN and UNTIL TIME both should use the FIRST_TIME to identify the archivelogs. If the FIRST_TIME “is less or equal” to the given timestamp, the archivelog falls into the range. So it is expected behaviour. The logic is good for restoring archivelogs because I will get all redo information that fall into the range. It is not as good for removing archivelogs because I will loose a little more redo information than specified.

What do we learn from that? Asumptions are a good starting point. Reading the docs is the next logical step. But if you want to make sure, things work as you want it to be, test it.

Update 2024-04-10

Changed the last paragraphs as I got really confused during testing and changed my understanding of the docs and the real behaviour several times. But that’s one thing, why writing is a good thing. You start thinking more deeply about things and do not stop thinking about it for some time.