RMAN delete obsolete but keep archivelogs

When doing RMAN backups we typically have one backup strategy and one retention policy. But the retention policy always covers not only backups and copies but also archivelogs. In some cases you might want to keep archivelogs for a longer period of time, for instance if you are running a non-Dataguard standby database using DBVisit. Let's say we want to keep two generations of backups; we would configure the following:

RMAN> CONFIGURE RETENTION POLICY TO REDUNDANCY 2;

new RMAN configuration parameters:
CONFIGURE RETENTION POLICY TO REDUNDANCY 2;
new RMAN configuration parameters are successfully stored

Now let’s see what backups we have:

RMAN> list backup of database summary;


List of Backups
===============
Key     TY LV S Device Type Completion Time     #Pieces #Copies Compressed Tag
------- -- -- - ----------- ------------------- ------- ------- ---------- ---
23      B  F  A DISK        22.12.2016 09:19:14 1       1       NO         TAG20161222T091750
27      B  F  A DISK        22.12.2016 10:02:33 1       1       NO         TAG20161222T100145
28      B  F  A DISK        22.12.2016 10:02:57 1       1       NO         TAG20161222T100145
29      B  F  A DISK        22.12.2016 10:03:22 1       1       NO         TAG20161222T100145
33      B  F  A DISK        22.12.2016 10:17:14 1       1       NO         TAG20161222T101632
34      B  F  A DISK        22.12.2016 10:17:32 1       1       NO         TAG20161222T101632
35      B  F  A DISK        22.12.2016 10:18:00 1       1       NO         TAG20161222T101632


RMAN> list backup of archivelog all summary;


List of Backups
===============
Key     TY LV S Device Type Completion Time     #Pieces #Copies Compressed Tag
------- -- -- - ----------- ------------------- ------- ------- ---------- ---
24      B  A  A DISK        22.12.2016 09:19:30 1       1       NO         TAG20161222T091930
26      B  A  A DISK        22.12.2016 10:01:34 1       1       NO         TAG20161222T100127
30      B  A  A DISK        22.12.2016 10:03:35 1       1       NO         TAG20161222T100335
32      B  A  A DISK        22.12.2016 10:16:24 1       1       NO         TAG20161222T101615
36      B  A  A DISK        22.12.2016 10:18:16 1       1       NO         TAG20161222T101815

So we have three database backups and the corresponding backups of archivelogs. And which archivelogs are still on disk?

RMAN> list archivelog all;

List of Archived Log Copies for database with db_unique_name OLTP
=====================================================================

Key     Thrd Seq     S Low Time
------- ---- ------- - -------------------
46      1    15      A 21.12.2016 12:56:40
        Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_15_d5q709b5_.arc

41      1    16      A 22.12.2016 09:17:31
        Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_16_d5q709x1_.arc

43      1    17      A 22.12.2016 09:19:29
        Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_17_d5q709oq_.arc

35      1    18      A 22.12.2016 10:01:26
        Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_18_d5q5q65p_.arc

36      1    19      A 22.12.2016 10:03:33
        Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_19_d5q6gtmj_.arc

38      1    20      A 22.12.2016 10:16:10
        Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_20_d5q6lm3q_.arc

45      2    11      A 21.12.2016 12:56:30
        Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_11_d5q709gs_.arc

42      2    12      A 22.12.2016 09:17:20
        Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_12_d5q709xo_.arc

44      2    13      A 22.12.2016 09:19:14
        Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_13_d5q709ln_.arc

40      2    14      A 22.12.2016 10:01:12
        Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_14_d5q709pq_.arc

37      2    15      A 22.12.2016 10:03:18
        Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_15_d5q6gg9k_.arc

39      2    16      A 22.12.2016 10:15:58
        Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_16_d5q6l6jm_.arc

Those are all the archivelogs that would be needed to recover the oldest database backup. Now let's see what is obsolete:

RMAN> report obsolete;

RMAN retention policy will be applied to the command
RMAN retention policy is set to redundancy 2
Report of obsolete backups and copies
Type                 Key    Completion Time    Filename/Handle
-------------------- ------ ------------------ --------------------
Backup Set           23     22.12.2016 09:19:16
  Backup Piece       23     22.12.2016 09:19:16 /u01/app/oracle/fra/OLTP/43C6AA13C2390666E0538D24100A09EF/backupset/2016_12_22/o1_mf_nnndf_TAG20161222T091750_d5q33orv_.bkp
Backup Set           24     22.12.2016 09:19:30
  Backup Piece       24     22.12.2016 09:19:30 /u01/app/oracle/fra/OLTP/backupset/2016_12_22/o1_mf_annnn_TAG20161222T091930_d5q34ljw_.bkp
Backup Set           25     22.12.2016 09:19:34
  Backup Piece       25     22.12.2016 09:19:34 /u01/app/oracle/fra/OLTP/autobackup/2016_12_22/o1_mf_s_931252772_d5q34ol5_.bkp
Backup Set           26     22.12.2016 10:01:36
  Backup Piece       26     22.12.2016 10:01:36 /u01/app/oracle/fra/OLTP/backupset/2016_12_22/o1_mf_annnn_TAG20161222T100127_d5q5m88s_.bkp
Archive Log          43     22.12.2016 10:25:30 /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_17_d5q709oq_.arc
Archive Log          41     22.12.2016 10:25:30 /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_16_d5q709x1_.arc
Archive Log          42     22.12.2016 10:25:30 /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_12_d5q709xo_.arc
Archive Log          44     22.12.2016 10:25:31 /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_13_d5q709ln_.arc
Archive Log          45     22.12.2016 10:25:35 /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_11_d5q709gs_.arc
Archive Log          46     22.12.2016 10:25:37 /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_15_d5q709b5_.arc

The oldest backup is obsolete, which is fine since we already have three full backups. All the archivelogs are obsolete too, since they have already been backed up and could be restored from those backups in case of emergency. But what can we do if we want to keep the archivelogs on disk as long as possible? A “delete obsolete” would remove them along with the outdated database backup.
Let’s try to keep them using “change”:

RMAN> change archivelog all keep until time 'sysdate+3';

released channel: ORA_DISK_1
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=279 instance=oltp_1 device type=DISK
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of KEEP command at 12/22/2016 10:29:34
RMAN-06529: CHANGE ... KEEP not supported for ARCHIVELOG

So that is not an option. Maybe changing them to “unavailable” might do the job?

RMAN> change archivelog all unavailable;

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of unavailable command on ORA_DISK_1 channel at 12/22/2016 10:30:43
ORA-19813: cannot have unavailable file /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_15_d5q709b5_.arc in DB_RECOVERY_FILE_DEST

It is not possible to mark archivelog files in the FRA as unavailable. There must be another way to exempt the archivelogs from the retention policy. The policy applies to all files known to the database, so let's make the database forget about the archivelogs:

RMAN> change archivelog all uncatalog;

uncataloged archived log
archived log file name=/u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_15_d5q709b5_.arc RECID=46 STAMP=931256737
uncataloged archived log
archived log file name=/u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_16_d5q709x1_.arc RECID=41 STAMP=931256730
uncataloged archived log
archived log file name=/u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_17_d5q709oq_.arc RECID=43 STAMP=931256730
uncataloged archived log
archived log file name=/u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_18_d5q5q65p_.arc RECID=35 STAMP=931255414
uncataloged archived log
archived log file name=/u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_19_d5q6gtmj_.arc RECID=36 STAMP=931256170
uncataloged archived log
archived log file name=/u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_20_d5q6lm3q_.arc RECID=38 STAMP=931256291
uncataloged archived log
archived log file name=/u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_11_d5q709gs_.arc RECID=45 STAMP=931256735
uncataloged archived log
archived log file name=/u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_12_d5q709xo_.arc RECID=42 STAMP=931256730
uncataloged archived log
archived log file name=/u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_13_d5q709ln_.arc RECID=44 STAMP=931256731
uncataloged archived log
archived log file name=/u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_14_d5q709pq_.arc RECID=40 STAMP=931256729
uncataloged archived log
archived log file name=/u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_15_d5q6gg9k_.arc RECID=37 STAMP=931256158
uncataloged archived log
archived log file name=/u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_16_d5q6l6jm_.arc RECID=39 STAMP=931256278
Uncataloged 12 objects

That worked. The files are still there, but the database no longer knows about them. Now we can apply the policy in the way we want it:

RMAN> report obsolete;

RMAN retention policy will be applied to the command
RMAN retention policy is set to redundancy 2
Report of obsolete backups and copies
Type                 Key    Completion Time    Filename/Handle
-------------------- ------ ------------------ --------------------
Backup Set           23     22.12.2016 09:19:16
  Backup Piece       23     22.12.2016 09:19:16 /u01/app/oracle/fra/OLTP/43C6AA13C2390666E0538D24100A09EF/backupset/2016_12_22/o1_mf_nnndf_TAG20161222T091750_d5q33orv_.bkp
Backup Set           24     22.12.2016 09:19:30
  Backup Piece       24     22.12.2016 09:19:30 /u01/app/oracle/fra/OLTP/backupset/2016_12_22/o1_mf_annnn_TAG20161222T091930_d5q34ljw_.bkp
Backup Set           25     22.12.2016 09:19:34
  Backup Piece       25     22.12.2016 09:19:34 /u01/app/oracle/fra/OLTP/autobackup/2016_12_22/o1_mf_s_931252772_d5q34ol5_.bkp
Backup Set           26     22.12.2016 10:01:36
  Backup Piece       26     22.12.2016 10:01:36 /u01/app/oracle/fra/OLTP/backupset/2016_12_22/o1_mf_annnn_TAG20161222T100127_d5q5m88s_.bkp

The oldest backup is still obsolete, but no archivelogs anymore. That's what we wanted to achieve. I can now remove the old backup:

RMAN> delete noprompt obsolete;

RMAN retention policy will be applied to the command
RMAN retention policy is set to redundancy 2
using channel ORA_DISK_1
Deleting the following obsolete backups and copies:
Type                 Key    Completion Time    Filename/Handle
-------------------- ------ ------------------ --------------------
Backup Set           23     22.12.2016 09:19:16
  Backup Piece       23     22.12.2016 09:19:16 /u01/app/oracle/fra/OLTP/43C6AA13C2390666E0538D24100A09EF/backupset/2016_12_22/o1_mf_nnndf_TAG20161222T091750_d5q33orv_.bkp
Backup Set           24     22.12.2016 09:19:30
  Backup Piece       24     22.12.2016 09:19:30 /u01/app/oracle/fra/OLTP/backupset/2016_12_22/o1_mf_annnn_TAG20161222T091930_d5q34ljw_.bkp
Backup Set           25     22.12.2016 09:19:34
  Backup Piece       25     22.12.2016 09:19:34 /u01/app/oracle/fra/OLTP/autobackup/2016_12_22/o1_mf_s_931252772_d5q34ol5_.bkp
Backup Set           26     22.12.2016 10:01:36
  Backup Piece       26     22.12.2016 10:01:36 /u01/app/oracle/fra/OLTP/backupset/2016_12_22/o1_mf_annnn_TAG20161222T100127_d5q5m88s_.bkp
deleted backup piece
backup piece handle=/u01/app/oracle/fra/OLTP/43C6AA13C2390666E0538D24100A09EF/backupset/2016_12_22/o1_mf_nnndf_TAG20161222T091750_d5q33orv_.bkp RECID=23 STAMP=931252741
deleted backup piece
backup piece handle=/u01/app/oracle/fra/OLTP/backupset/2016_12_22/o1_mf_annnn_TAG20161222T091930_d5q34ljw_.bkp RECID=24 STAMP=931252770
deleted backup piece
backup piece handle=/u01/app/oracle/fra/OLTP/autobackup/2016_12_22/o1_mf_s_931252772_d5q34ol5_.bkp RECID=25 STAMP=931252773
deleted backup piece
backup piece handle=/u01/app/oracle/fra/OLTP/backupset/2016_12_22/o1_mf_annnn_TAG20161222T100127_d5q5m88s_.bkp RECID=26 STAMP=931255288
Deleted 4 objects

Afterwards I can re-register the archivelogs.

RMAN> catalog recovery area noprompt;

searching for all files in the recovery area

List of Files Unknown to the Database
=====================================
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_20_d5q6lm3q_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_16_d5q6l6jm_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_15_d5q709b5_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_11_d5q709gs_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_13_d5q709ln_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_17_d5q709oq_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_14_d5q709pq_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_18_d5q5q65p_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_19_d5q6gtmj_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_15_d5q6gg9k_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_16_d5q709x1_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_12_d5q709xo_.arc
cataloging files...
cataloging done

List of Cataloged Files
=======================
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_20_d5q6lm3q_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_16_d5q6l6jm_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_15_d5q709b5_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_11_d5q709gs_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_13_d5q709ln_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_17_d5q709oq_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_14_d5q709pq_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_18_d5q5q65p_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_19_d5q6gtmj_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_15_d5q6gg9k_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_16_d5q709x1_.arc
File Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_12_d5q709xo_.arc

The archivelog files are back in the database catalog. Now I can handle them separately and apply any rule I want, for example deleting everything older than two hours ('sysdate-1/12'):

RMAN> delete noprompt archivelog until time 'sysdate-1/12';

released channel: ORA_DISK_1
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=279 instance=oltp_1 device type=DISK
List of Archived Log Copies for database with db_unique_name OLTP
=====================================================================

Key     Thrd Seq     S Low Time
------- ---- ------- - -------------------
49      1    15      A 21.12.2016 12:56:40
        Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_15_d5q709b5_.arc

50      2    11      A 21.12.2016 12:56:30
        Name: /u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_11_d5q709gs_.arc

deleted archived log
archived log file name=/u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_1_15_d5q709b5_.arc RECID=49 STAMP=931257331
deleted archived log
archived log file name=/u01/app/oracle/fra/OLTP/archivelog/2016_12_22/o1_mf_2_11_d5q709gs_.arc RECID=50 STAMP=931257331
Deleted 2 objects

Not very straightforward, but it does the job.

Inspired by a comment on Twitter by Franck Pachot: the best way to keep a history of archivelogs is not to use “ALL” but “FROM”. So the approach to keep a history of three days would be the following:

RMAN> change archivelog from time 'sysdate-3' uncatalog;
RMAN> delete noprompt obsolete;
RMAN> catalog recovery area noprompt;

That way all archivelogs from the last three days are uncataloged. All other archivelogs are handled by the “delete obsolete” operation and there is no need to remove archivelogs manually.
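
Putting this together, a nightly maintenance job could look roughly like the following sketch. The command file name, the "backup database plus archivelog" step and the three-day window are assumptions; adjust them to your environment:

$ cat rman_keep_arch.rman
# back up database and archivelogs as usual
backup database plus archivelog;
# hide the last three days of archivelogs from the retention policy
change archivelog from time 'sysdate-3' uncatalog;
# remove whatever the policy considers obsolete
delete noprompt obsolete;
# re-register the protected archivelogs
catalog recovery area noprompt;

$ rman target / cmdfile=rman_keep_arch.rman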


How DB_LOST_WRITE_PROTECT works with Data Guard

Preface

Today I will blog again about a Data Guard topic. There are a couple of best practices out there which one should follow. One of these best practices is enabling block checking and lost write protection. About the latter there is not much information out there, so I want to outline the concept and the importance of this feature. Actually, this post is inspired by a talk I had during DOAG Conference 2016: I gave a presentation about best practices in Data Guard, and someone from the audience asked how the lost write protection actually works.
Basically it is there to detect lost writes, as the parameter name clearly states. That means a write request to disk was confirmed and the database is happy with that, but the write did not actually happen, for whatever reason. When the block is read the next time, it is still in its old state; any changed, deleted or added values are not included. The block itself is consistent, it is not corrupted, so the DBA will not notice anything since there is no error. An error occurs only when you restore the tablespace containing the block and then try to apply the redo stream: the recovery detects a newer SCN in the redo stream which does not match the block's SCN. That is the point where it gets tricky.

The test environment

My simple test cases run on a VirtualBox VM with OEL 6.7, Oracle Restart 12.1.0.2 and Oracle Database 12.1.0.2. Primary and Standby run on the same host.
DB_NAME: db12c
DB_UNIQUE_NAME: db12ca
DB_UNIQUE_NAME: db12cb
You will see the names in my SQL prompt to make things clear.

This is the current state of the system:

SYS@db12ca> show parameter lost

NAME                                 TYPE                              VALUE
------------------------------------ --------------------------------- ------------------------------
db_lost_write_protect                string                            NONE

SYS@db12ca> select database_role from v$database;

DATABASE_ROLE
------------------------------------------------
PHYSICAL STANDBY

SYS@db12cb> show parameter lost


NAME                                 TYPE                              VALUE
------------------------------------ --------------------------------- ------------------------------
db_lost_write_protect                string                            NONE


SYS@db12cb> select database_role from v$database;

DATABASE_ROLE
------------------------------------------------
PRIMARY

So “db12cb” is my primary and “db12ca” my standby instance. By the way, that's why I gave them the suffixes “a” and “b”: they may change roles over and over again.

For testing I create a separate tablespace with manual segment space management. This allows me to specify FREELISTS=1; otherwise the changes to my data may end up in different blocks, which is not what I want for my testing. Besides that, I create a user for testing and give it the necessary grants.

SYS@db12cb> create tablespace marco datafile size 100m segment space management manual;

Tablespace created.

SYS@db12cb> create user marco identified by marco default tablespace marco quota unlimited on marco;

User created.

SYS@db12cb> grant create session to marco;

Grant succeeded.

SYS@db12cb> grant create table to marco;

Grant succeeded.

Scenario #1: No Lost Write Detection

The new user can now create a table and insert some data, so let’s do that.

SYS@db12cb> conn marco/marco
Connected.
MARCO@db12cb> create table testtable (id number, txt varchar2(100)) storage (freelists 1);

Table created.

MARCO@db12cb> insert into testtable values (1, 'Test Lost Write Detection - 1');

1 row created.

MARCO@db12cb> commit;

Commit complete.

Now we can identify the block and check if the data is really in there.

SYS@db12cb> select file_name from dba_data_files where tablespace_name='MARCO';

FILE_NAME
------------------------------------------------------------------------------------------------------------------------
/u01/app/oracle/oradata/DB12CB/datafile/o1_mf_marco_d3llm6dd_.dbf

SYS@db12cb> select block_id, blocks from dba_extents where segment_name='TESTTABLE' and owner='MARCO';

  BLOCK_ID     BLOCKS
---------- ----------
       128          8

SYS@db12cb> alter system checkpoint;

System altered.

[oracle@oel6u4 ~]$ dd if=/u01/app/oracle/oradata/DB12CB/datafile/o1_mf_marco_d3llm6dd_.dbf of=myblock.v1 skip=129 count=1 bs=8192
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000162476 s, 50.4 MB/s
[oracle@oel6u4 ~]$ grep Detection myblock.v1
Binary file myblock.v1 matches

Ok, the data is in that block. In the same way I can now check if the DML was successfully applied on the standby.

SYS@db12ca> alter system flush buffer_cache;

System altered.

SYS@db12ca> select name from v$datafile where name like '%marco%';

NAME
--------------------------------------------------------------------------------
/u01/app/oracle/oradata/DB12CA/datafile/o1_mf_marco_d3llm8nt_.dbf

[oracle@oel6u4 ~]$ dd if=/u01/app/oracle/oradata/DB12CA/datafile/o1_mf_marco_d3llm8nt_.dbf of=sbblock.v1 skip=129 count=1 bs=8192
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000662024 s, 12.4 MB/s
[oracle@oel6u4 ~]$ grep Detection sbblock.v1
Binary file sbblock.v1 matches

So far everything is fine, as it should be.
I will now insert another row into the test table, force that change to be written to disk and then clear the buffer cache.

MARCO@db12cb> insert into testtable values (2, 'Oh my god!');

1 row created.

MARCO@db12cb> commit;

Commit complete.

MARCO@db12cb>

MARCO@db12cb> conn / as sysdba
Connected.
SYS@db12cb> alter system checkpoint;

System altered.

SYS@db12cb> alter system flush buffer_cache;

System altered.

Again, check if it was written to disk.

[oracle@oel6u4 ~]$ dd if=/u01/app/oracle/oradata/DB12CB/datafile/o1_mf_marco_d3llm6dd_.dbf of=myblock.v2 skip=129 count=1 bs=8192
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000318304 s, 25.7 MB/s
[oracle@oel6u4 ~]$ grep Detection myblock.v2
Binary file myblock.v2 matches
[oracle@oel6u4 ~]$ grep god myblock.v2
Binary file myblock.v2 matches

Both values that I inserted are on disk now. Just to make sure everything is ok, I check the block on the standby.

[oracle@oel6u4 ~]$ dd if=/u01/app/oracle/oradata/DB12CA/datafile/o1_mf_marco_d3llm8nt_.dbf of=sbblock.v2 skip=129 count=1 bs=8192
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000162124 s, 50.5 MB/s
[oracle@oel6u4 ~]$ grep Detection sbblock.v2
Binary file sbblock.v2 matches
[oracle@oel6u4 ~]$ grep god sbblock.v2
Binary file sbblock.v2 matches

So far, so good. Now comes the funny part. I will simulate a lost write by just putting my first extracted block back in the datafile.

[oracle@oel6u4 ~]$ dd if=myblock.v1 of=/u01/app/oracle/oradata/DB12CB/datafile/o1_mf_marco_d3llm6dd_.dbf seek=129 count=1 bs=8192 conv=notrunc
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000154517 s, 53.0 MB/s

Now let us query the test table and see what’s happening.

MARCO@db12cb> select * from testtable;

        ID
----------
TXT
--------------------------------------------------------------------------------
         1
Test Lost Write Detection - 1

No error, no warning, just the result. But the result set obviously lacks the row from the second insert. And as the block is completely intact and not corrupted, there is no reason to raise any error.
So now it is time to do another INSERT.

MARCO@db12cb> insert into testtable values (3, 'Inconsistency!');

1 row created.

That is the point where it comes to light. The redo apply of the standby database detects a redo record which does not match the data block it has. It has no choice but to stop recovery and raise an error in the alert.log.

2016-11-26 09:52:02.752000 +01:00
ERROR: ORA-00600: internal error code, arguments: [3020] recovery detected a data block with invalid SCN raised at location:kcbr_media_ap_1
Checker run found 1 new persistent data failures
Errors in file /u01/app/oracle/diag/rdbms/db12ca/db12ca/trace/db12ca_pr02_2466.trc  (incident=2705):
ORA-00600: internal error code, arguments: [3020], [2], [129], [8388737], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 2, block# 129, file offset is 1056768 bytes)
ORA-10564: tablespace MARCO
ORA-01110: data file 2: '/u01/app/oracle/oradata/DB12CA/datafile/o1_mf_marco_d3llm8nt_.dbf'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 93368
2016-11-26 09:52:03.882000 +01:00
Incident details in: /u01/app/oracle/diag/rdbms/db12ca/db12ca/incident/incdir_2705/db12ca_pr02_2466_i2705.trc

Besides that, the primary is still running fine: it accepts changes, commits and is just doing what a database is supposed to do. This is very unpleasant, since the only way to recover from such a situation is a failover to the standby, losing all changes that happened after the change to the damaged block. And this can be a lot.

Scenario #2: Lost Write Detection enabled

I enable it by simply setting the parameter to typical on both instances.

SYS@db12ca> alter system set db_lost_write_protect=typical;

System altered.

SYS@db12cb> alter system set db_lost_write_protect=typical;

System altered.

This parameter forces the database to record the SCNs of all blocks that it reads from disk in the redo stream. The standby database can use this information to compare the recorded SCN from the redo stream with the actual SCN of the block at the standby site. If there is a difference, it can report a lost write.
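
The extra SCN tracking generates a small amount of additional redo. If you want to see how much, there is a statistic for it; a minimal check (assuming the 12c statistic name "redo size for lost write detection"):

SYS@db12cb> select name, value
  2  from   v$sysstat
  3  where  name = 'redo size for lost write detection';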

Now I walk through the same steps as above. But this time, after simulating the lost write, I simply query the table.

MARCO@db12cb> select * from testtable;

        ID
----------
TXT
--------------------------------------------------------------------------------
         1
Test Lost Write Detection - 1

The SELECT succeeds, but the alert.log of the primary reports the following error.

2016-11-26 10:40:47.143000 +01:00
DMON: A primary database lost write was reported by standby database db12ca. Please look at the alert and DRC logs of the standby database db12ca to see more information about the lost write.

The standby’s alert.log now reports an ORA-752 instead of an ORA-600.

No redo at or after SCN 3448159 can be used for recovery.
PR02: Primary database lost write detected by standby database db12ca
BLOCK THAT LOST WRITE 129, FILE 2, TABLESPACE# 7
The block read during the normal successful database operation had SCN 3346737 (0x0000.00331131) seq 1 (0x01)
ERROR: ORA-00752 detected lost write on primary
Slave exiting with ORA-752 exception
Errors in file /u01/app/oracle/diag/rdbms/db12ca/db12ca/trace/db12ca_pr02_2924.trc:
ORA-00752: recovery detected a lost write of a data block
ORA-10567: Redo is inconsistent with data block (file# 2, block# 129, file offset is 1056768 bytes)
ORA-10564: tablespace MARCO
ORA-01110: data file 2: '/u01/app/oracle/oradata/DB12CA/datafile/o1_mf_marco_d3lnpn8n_.dbf'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 93369
Recovery Slave PR02 previously exited with exception 752
MRP0: Background Media Recovery terminated with error 448
Errors in file /u01/app/oracle/diag/rdbms/db12ca/db12ca/trace/db12ca_pr00_2919.trc:
ORA-00448: normal completion of background process

Recovering from a lost write

As in scenario #1, the only way to work around this error is to fail over to the standby database.

[oracle@oel6u4 ~]$ dgmgrl
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys@db12ca
Password:
Connected as SYSDBA.
DGMGRL> failover to db12ca immediate
Performing failover NOW, please wait...
Failover succeeded, new primary is "db12ca"

Now I can query my test table at the new primary.

SYS@db12ca> select * from marco.testtable;

        ID TXT
---------- ------------------------------
         1 Test Lost Write Detection - 1
         2 Oh my god!

I now need to re-create the old primary; reinstating it using Flashback Database will not work. The steps will basically be these (a sketch follows the list):

  • remove database from configuration
  • recreate the database using duplicate
  • add database back to the configuration
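
In DGMGRL and RMAN terms this could look roughly like the sketch below. It assumes the failed former primary is db12cb, that its auxiliary instance has been started NOMOUNT, and that the TNS aliases match the DB_UNIQUE_NAMEs; your duplicate options will differ:

DGMGRL> remove database db12cb;

RMAN> connect target sys@db12ca
RMAN> connect auxiliary sys@db12cb
RMAN> duplicate target database for standby from active database nofilenamecheck;

DGMGRL> add database db12cb as connect identifier is db12cb maintain as physical;
DGMGRL> enable database db12cb;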

A lot of effort for such a “small” failure….

Conclusion

Enabling lost write detection is crucial in a Data Guard setup. Lost writes are detected at read time, which allows you to perform recovery steps much earlier than without it. Nevertheless, lost writes should not occur at all. If they do occur, something really bad is going on in your environment and you need to investigate the root cause of the lost write.
That’s it, basically. I hope it makes things a little more clear.

Further reading

Resolving ORA-752 or ORA-600 [3020] During Standby Recovery (Doc ID 1265884.1)
Oracle Docs – Database Reference: DB_LOST_WRITE_PROTECT

OTN Appreciation Day: Flashback

Following Tim Hall’s suggestion, I will write about my favorite feature: Flashback.

The term “Flashback” covers different technologies. The most useful ones in my opinion are Flashback Query and Flashback Drop; they become very useful in case of human errors. Those errors are much more likely to occur than errors caused by hardware failures. We are all humans and we all make mistakes. Flashback technologies provide easy methods to recover from those errors, in contrast to RMAN features for instance.

In the past I already wrote about that, you can read the article here: Flashback – Recover without Recovery

ODA: Performance Impact due to Flashback Database

Here and there I am planning and actually doing migrations to the Oracle Database Appliance (ODA). I like this appliance because of its relative simplicity and ease of management. In most cases we connect two ODAs in different locations using Data Guard with the goal of maximizing availability. That means we have a primary 2-node RAC on ODA-1 and another 2-node RAC on ODA-2 which operates as standby.
In order to fulfill the requirements for availability and fault tolerance the databases are configured for “Maximum Availability” (a configuration sketch follows the list):

  • sufficient Standby Redologs configured – on the dedicated Redo SSDs
  • Force Logging mode
  • Flashback Database configured and activated – located in FRA per definition
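
For reference, the protection mode itself is a one-liner in the Broker, and Flashback Database is enabled with a single SQL command. This is only a sketch and assumes an existing Broker configuration with standby redo logs in place:

DGMGRL> edit configuration set protection mode as MaxAvailability;

SQL> alter database flashback on;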

In terms of availability and manageability this is quite good, but in terms of performance it sometimes turns out to be really bad. Look at these AWR snippets:

              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:     88808 01-Sep-15 17:30:06       480       2.7
  End Snap:     88855 02-Sep-15 17:00:10       478       6.8
   Elapsed:            1,410.06 (mins)
   DB Time:            3,311.41 (mins)
Wait Classes by Total Wait Time
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                        Avg             Avg
                                        Total Wait     Wait   % DB   Active
Wait Class                  Waits       Time (sec)     (ms)   time Sessions
---------------- ---------------- ---------------- -------- ------ --------
System I/O             11,522,327          161,304       14   81.2      1.9
DB CPU                                     101,562            51.1      1.2
User I/O               19,956,840           88,663        4   44.6      1.0
Background Wait Events                 DB/Inst: XXX/xxx  Snaps: 88808-88855
-> ordered by wait time desc, waits desc (idle events last)
-> Only events with Total Wait Time (s) >= .001 are shown
-> %Timeouts: value of 0 indicates value was > .5%.  Value of null is truly 0

                                                             Avg                
                                        %Time Total Wait    wait    Waits   % bg
Event                             Waits -outs   Time (s)    (ms)     /txn   time
-------------------------- ------------ ----- ---------- ------- -------- ------
db file async I/O submit        969,898     0     90,151      93      0.3   46.5
db file parallel write        5,015,605     0     33,520       7      1.3   17.3
flashback log file write        617,118     0     30,295      49      0.2   15.6
Backup: MML write backup p   10,608,828     0     11,727       1      2.7    6.0
RMAN backup & recovery I/O      288,934     0      3,508      12      0.1    1.8

Remember, the flashback logs are stored inside the FRA. And where does the ODA place the FRA? On the slower part of the spinning disks. The system cannot be faster than the slowest link in the chain. And in this configuration the user I/Os are influenced by the flashback I/Os, making things even worse.
So turn off Flashback Database in order to improve performance; the AWR numbers below are from the same workload after disabling it.
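
Disabling it is a single command (shown here for completeness; keep in mind that without Flashback Database you lose the ability to reinstate a failed primary after a Data Guard failover):

SQL> alter database flashback off;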

              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:     89528 16-Sep-15 17:30:45       393       5.0
  End Snap:     89575 17-Sep-15 17:00:44       576       5.2
   Elapsed:            1,409.98 (mins)
   DB Time:            2,792.69 (mins)
Wait Classes by Total Wait Time
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                        Avg             Avg
                                        Total Wait     Wait   % DB   Active
Wait Class                  Waits       Time (sec)     (ms)   time Sessions
---------------- ---------------- ---------------- -------- ------ --------
System I/O              7,424,187           97,029       13   57.9      1.1
DB CPU                                      96,595            57.6      1.1
User I/O                8,191,268           63,769        8   38.1      0.8

Only half the System I/O compared to running with Flashback Database enabled.

Background Wait Events                 DB/Inst: XXX/xxx  Snaps: 89528-89575
-> ordered by wait time desc, waits desc (idle events last)
-> Only events with Total Wait Time (s) >= .001 are shown
-> %Timeouts: value of 0 indicates value was > .5%.  Value of null is truly 0

                                                             Avg                
                                        %Time Total Wait    wait    Waits   % bg
Event                             Waits -outs   Time (s)    (ms)     /txn   time
-------------------------- ------------ ----- ---------- ------- -------- ------
db file async I/O submit        531,487     0     73,471     138      0.1   57.5
db file parallel write        2,384,365     0     15,169       6      0.7   11.9
Backup: MML write backup p   10,121,315     0     11,769       1      2.8    9.2
RMAN backup & recovery I/O      197,080     0      4,906      25      0.1    3.8

The background wait for “flashback log file write” is completely gone.

Keep that in mind when using Flashback Database. This also applies to hardware other than the ODA: be careful which disks you use for the FRA and the flashback logs.
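
Before deciding, it is worth checking how much flashback data a database actually writes. A minimal sketch using V$FLASHBACK_DATABASE_STAT, which keeps an hourly breakdown while Flashback Database is enabled:

SQL> select begin_time, end_time, flashback_data, db_data, redo_data
  2  from   v$flashback_database_stat
  3  order by begin_time;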

Empty Backup Piece in RMAN

Yesterday I needed to do a complete database restore since the DATA diskgroup was corrupted and could not be mounted:

SQL> alter diskgroup data mount;
alter diskgroup data mount
*
ERROR at line 1:
ORA-15096: lost disk write detected

This was due to an interruption of the power supply: buffered writes were never written to disk. So the only solution was to drop and re-create the diskgroup from scratch, losing all the data inside that DATA diskgroup.

SQL> drop diskgroup data force including contents;

Diskgroup dropped.

SQL> create diskgroup data external redundancy disk '\\.\ORCLDISKDATA0', '\\.\ORCLDISKDATA1';

Diskgroup created.

Now, since every datafile was gone, I needed to restore the database from scratch. Fortunately there was another diskgroup, FRA, which was not corrupted, and it held a copy of the controlfile. So I first started the instance with a simple pfile in order to restore the spfile:

db_name=mydb
compatible=11.1.0.0.0
control_files='+fra/mydb/controlfile.current.4711'
sga_target=2g

Then go to RMAN and do the restore:

RMAN> startup mount;
RMAN> restore spfile to 'c:\temp\spfilemydb.ora';
RMAN> shutdown immediate;

Now I created a temporary pfile from that restored spfile, changed the control_files parameter to '+DATA','+FRA' in order to get two brand-new controlfiles, and created an spfile inside ASM which I then used to restart the instance.
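
For illustration, the whole sequence could look like the following sketch. The paths, the spfile location in ASM and the controlfile copy name are taken from above or assumed; details will differ on your system:

SQL> create pfile='c:\temp\initmydb.ora' from spfile='c:\temp\spfilemydb.ora';
-- edit c:\temp\initmydb.ora and set control_files='+DATA','+FRA'
SQL> create spfile='+DATA/mydb/spfilemydb.ora' from pfile='c:\temp\initmydb.ora';
SQL> startup force nomount;
RMAN> restore controlfile from '+fra/mydb/controlfile.current.4711';
RMAN> alter database mount;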

Now follows the funny part. I started the database restore which took a long time but went fine.

RMAN> restore database;

Easy. Now the recovery. I checked the available backup pieces, which were spread over tapes and removable disks. I found one backup piece which sounded like an archivelog backup but contained nothing according to the controlfile.

RMAN> list backuppiece 'G:\ARC_883008933_S28127_P1';

List of Backup Pieces
BP Key  BS Key  Pc# Cp# Status      Device Type Piece Name
------- ------- --- --- ----------- ----------- ----------
13625   12924   1   1   AVAILABLE   DISK        G:\ARC_883008933_S28127_P1                                                                                    
														  
RMAN> list backupset 12924; 

List of Backup Sets
===================
															
BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ -------------------                
12924   2.48G      DISK        00:01:49     22.06.2015 00:17:22                        

BP Key: 13625   Status: AVAILABLE  Compressed: NO  Tag: ARC                    
Piece Name: G:\ARC_883008933_S28127_P1 

The file was really there and had exactly the size RMAN mentioned. But for whatever reason no content was recorded. And I was missing some archivelogs which I really, really would have liked to have in place for recovery. That's why I tried to look into that backup piece, assuming that it contained my missing archivelogs. Just re-cataloging it is not sufficient:

RMAN> catalog start with 'G:\';

searching for all files that match the pattern G:\
no files found to be unknown to the database

So I just renamed the file to 'G:\TARC_883008933_S28127_P1' and tried the catalog command again.

RMAN> catalog start with 'g:\T';
searching for all files that match the pattern g:\T

List of Files Unknown to the Database
=====================================
File Name: G:\TARC_883008933_S28127_P1

Do you really want to catalog the above files (enter YES or NO)? yes

cataloging files...
cataloging done

List of Cataloged Files
=======================
File Name: G:\TARC_883008933_S28127_P1


RMAN> list backuppiece 'G:\TARC_883008933_S28127_P1';

List of Backup Pieces
BP Key  BS Key  Pc# Cp# Status      Device Type Piece Name
------- ------- --- --- ----------- ----------- ----------
13633   12924   1   2   AVAILABLE   DISK        G:\TARC_883008933_S28127_P1


RMAN> list backupset 12924;

List of Backup Sets                                                            
===================
BS Key  Size
------- ----------
12924   2.48G

List of Archived Logs in backup set 12924

Thrd Seq     Low SCN    Low Time            Next SCN   Next Time
---- ------- ---------- ------------------- ---------- ---------
2    68379   2830068124 21.06.2015 09:54:37 2830104357 21.06.2015 10:02:27
2    68380   2830104357 21.06.2015 10:02:27 2830156671 21.06.2015 10:10:46
2    68381   2830156671 21.06.2015 10:10:46 2830187258 21.06.2015 10:17:36
2    68382   2830187258 21.06.2015 10:17:36 2830213445 21.06.2015 10:22:01
2    68383   2830213445 21.06.2015 10:22:01 2830239509 21.06.2015 10:27:06
2    68384   2830239509 21.06.2015 10:27:06 2830257850 21.06.2015 10:30:47
2    68385   2830257850 21.06.2015 10:30:47 2830273799 21.06.2015 10:34:01
2    68386   2830273799 21.06.2015 10:34:01 2830306271 21.06.2015 10:41:34
2    68387   2830306271 21.06.2015 10:41:34 2830325027 21.06.2015 10:45:18
2    68388   2830325027 21.06.2015 10:45:18 2830352735 21.06.2015 10:51:05
2    68389   2830352735 21.06.2015 10:51:05 2830383695 21.06.2015 10:58:41
2    68390   2830383695 21.06.2015 10:58:41 2830414559 21.06.2015 11:04:32
2    68391   2830414559 21.06.2015 11:04:32 2830446480 21.06.2015 11:12:14
2    68392   2830446480 21.06.2015 11:12:14 2830471834 21.06.2015 11:18:15
2    68393   2830471834 21.06.2015 11:18:15 2830486546 21.06.2015 11:22:14
2    68394   2830486546 21.06.2015 11:22:14 2830509633 21.06.2015 11:28:05
2    68395   2830509633 21.06.2015 11:28:05 2830530886 21.06.2015 11:32:21
2    68396   2830530886 21.06.2015 11:32:21 2830576051 21.06.2015 11:39:08
2    68397   2830576051 21.06.2015 11:39:08 2830604761 21.06.2015 11:46:28
2    68398   2830604761 21.06.2015 11:46:28 2830623622 21.06.2015 11:51:07
2    68399   2830623622 21.06.2015 11:51:07 2830647907 21.06.2015 11:57:49
2    68400   2830647907 21.06.2015 11:57:49 2830669866 21.06.2015 12:01:42
2    68401   2830669866 21.06.2015 12:01:42 2830701975 21.06.2015 12:10:05
2    68402   2830701975 21.06.2015 12:10:05 2830717193 21.06.2015 12:14:11
2    68403   2830717193 21.06.2015 12:14:11 2830753093 21.06.2015 12:26:11
2    68404   2830753093 21.06.2015 12:26:11 2830791736 21.06.2015 12:34:43
2    68405   2830791736 21.06.2015 12:34:43 2830819126 21.06.2015 12:41:23
2    68406   2830819126 21.06.2015 12:41:23 2830842054 21.06.2015 12:46:41
2    68407   2830842054 21.06.2015 12:46:41 2830862566 21.06.2015 12:51:31
2    68408   2830862566 21.06.2015 12:51:31 2830880088 21.06.2015 12:55:39
2    68409   2830880088 21.06.2015 12:55:39 2830906001 21.06.2015 12:59:57
2    68410   2830906001 21.06.2015 12:59:57 2830925092 21.06.2015 13:03:40
2    68411   2830925092 21.06.2015 13:03:40 2830947332 21.06.2015 13:08:58
2    68412   2830947332 21.06.2015 13:08:58 2830969688 21.06.2015 13:15:03
2    68413   2830969688 21.06.2015 13:15:03 2830996091 21.06.2015 13:22:02
2    68414   2830996091 21.06.2015 13:22:02 2831029186 21.06.2015 13:28:27
2    68415   2831029186 21.06.2015 13:28:27 2831043107 21.06.2015 13:31:51
2    68416   2831043107 21.06.2015 13:31:51 2831066514 21.06.2015 13:36:33
2    68417   2831066514 21.06.2015 13:36:33 2831099209 21.06.2015 13:46:16
2    68418   2831099209 21.06.2015 13:46:16 2831127405 21.06.2015 13:57:43
2    68419   2831127405 21.06.2015 13:57:43 2831163498 21.06.2015 14:09:31
2    68420   2831163498 21.06.2015 14:09:31 2831200968 21.06.2015 14:25:49
2    68421   2831200968 21.06.2015 14:25:49 2831239025 21.06.2015 14:43:08
2    68422   2831239025 21.06.2015 14:43:08 2831287068 21.06.2015 15:02:15
2    68423   2831287068 21.06.2015 15:02:15 2831321905 21.06.2015 15:16:43
2    68424   2831321905 21.06.2015 15:16:43 2831376687 21.06.2015 15:38:14
2    68425   2831376687 21.06.2015 15:38:14 2831420993 21.06.2015 15:58:56
2    68426   2831420993 21.06.2015 15:58:56 2831446376 21.06.2015 16:10:36
2    68427   2831446376 21.06.2015 16:10:36 2831472168 21.06.2015 16:21:27
2    68428   2831472168 21.06.2015 16:21:27 2831499369 21.06.2015 16:31:42
2    68429   2831499369 21.06.2015 16:31:42 2831541993 21.06.2015 16:46:16
2    68430   2831541993 21.06.2015 16:46:16 2831578975 21.06.2015 17:00:32
2    68431   2831578975 21.06.2015 17:00:32 2831615358 21.06.2015 17:13:55
2    68432   2831615358 21.06.2015 17:13:55 2831645246 21.06.2015 17:28:32
2    68433   2831645246 21.06.2015 17:28:32 2831677592 21.06.2015 17:54:21
2    68434   2831677592 21.06.2015 17:54:21 2831728008 21.06.2015 18:32:20
2    68435   2831728008 21.06.2015 18:32:20 2831764928 21.06.2015 19:35:43
2    68436   2831764928 21.06.2015 19:35:43 2831803761 21.06.2015 21:06:29
2    68437   2831803761 21.06.2015 21:06:29 2831838675 21.06.2015 22:02:41
2    68438   2831838675 21.06.2015 22:02:41 2831883176 22.06.2015 00:00:45
2    68439   2831883176 22.06.2015 00:00:45 2831888127 22.06.2015 00:11:16

Backup Set Copy #1 of backup set 12924
Device Type Elapsed Time Completion Time     Compressed Tag
----------- ------------ ------------------- ---------- ---
DISK        00:01:49     22.06.2015 00:17:22 NO         ARC

List of Backup Pieces for backup set 12924 Copy #1

BP Key  Pc# Status      Piece Name
------- --- ----------- ----------
13625   1   AVAILABLE   G:\ARC_883008933_S28127_P1

Backup Set Copy #2 of backup set 12924

Device Type Elapsed Time Completion Time     Compressed Tag
----------- ------------ ------------------- ---------- ---
DISK        00:01:49     26.06.2015 10:42:47 NO         ARC

List of Backup Pieces for backup set 12924 Copy #2

BP Key  Pc# Status      Piece Name
------- --- ----------- ----------
13633   1   AVAILABLE   G:\TARC_883008933_S28127_P1

See, now I have two copies of my backup set: one containing a number of archivelogs and the other one still containing nothing. Strange behavior, and I have no clue why the information was missing in the first place. But nevertheless I was now able to recover the database, still incomplete but at least consistent, and make the customer happy. Weekend saved.

Flashback – Recover without Recovery

Introduction

You may know the situation: your databases are running fine, the backups are done regularly and without any issues. But there is one factor that breaks the whole thing, other humans called “users”. And a special kind of “users”, the “developers”. Sometimes they just do things by accident which lead to incorrect updates or deletes. The administrators take care of the physical structure of a database, but that does not protect us from human errors leading to data inconsistencies. That means in turn that recovering from these human errors is time-intensive. Ok, there are enhancements in 12c like the RMAN “recover table” command, but that is a different topic.

Flashback – An Overview

Oracle uses the term “Flashback” for a couple of techniques (SE = Standard Edition, EE = Enterprise Edition):

  • SE: Flashback Query – using UNDO data
  • EE: Flashback Version Query – using UNDO data
  • EE: Flashback Transaction – using Redo data (Logminer)
  • SE: Flashback Drop – using Recyclebin
  • EE: Flashback Database – using Flashback Logs
  • SE: Flashback Data Archive – using special data archives

The interesting thing is that some powerful features come for free even with the Standard Edition of the database. Even Flashback Data Archive (without history table optimization) is included in all editions starting with 11.2.0.4.
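
To give an idea how little is needed, here is a minimal Flashback Data Archive sketch. The archive name, tablespace, quota and retention are assumptions:

SQL> create flashback archive fda1 tablespace users quota 1g retention 1 month;

SQL> alter table scott.emp flashback archive fda1;

SQL> select * from scott.emp as of timestamp systimestamp - interval '1' day;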

Preparations

In order to use all the UNDO-related features we need to prepare some things. With the default configuration the datafiles of UNDO tablespaces are created with “autoextend on” and the UNDO_RETENTION parameter is set to 900 seconds. That means Oracle will keep only these 15 minutes and overwrite older committed data even if there is enough space available. But if we configure the UNDO datafiles with a fixed size, Oracle ignores the value of UNDO_RETENTION and starts using all the space that is available, keeping committed transactions as long as there is space left in UNDO. That is visible in V$UNDOSTAT:

SQL> select * from (
  2  select TUNED_UNDORETENTION
  3  from   V$UNDOSTAT
  4  order by END_TIME desc
  5  )
  6  where ROWNUM = 1;

TUNED_UNDORETENTION
-------------------
              36356
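
So the preparation boils down to giving the UNDO datafile a fixed size; a minimal sketch (file name and size are placeholders):

SQL> alter database datafile '/u01/app/oracle/oradata/DB12CB/datafile/undotbs01.dbf' resize 10g;

SQL> alter database datafile '/u01/app/oracle/oradata/DB12CB/datafile/undotbs01.dbf' autoextend off;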

Flashback Query

This feature utilizes the read consistency that Oracle provides anyway. It just does not use the current SCN, but an SCN from the past. You simply use the “select … from … as of …” syntax to query the data as it was at the given point in time.

SQL> select *
  2  from emp as of timestamp
  3       to_timestamp('2015-06-14 11:00:00',
  4       'yyyy-mm-dd hh24:mi:ss');

If this ends up with an “ORA-1555: snapshot too old”, the undo data has already been overwritten. But what can we do with this data? We should materialize it so that we no longer rely on UNDO:

SQL> create table emp_old
  2  as
  3  select *
  4  from emp as of timestamp
  5       to_timestamp('2015-06-14 11:00:00',
  6       'yyyy-mm-dd hh24:mi:ss');

Now we check for differences:

SQL> select * from emp_old
  2  minus
  3  select * from emp;


Oracle allows us to use the "as of" clause everywhere and mix it with current data, so we could also use it like this:


SQL> select * from emp as of timestamp
  2                to_timestamp('2015-06-14 11:00:00',
  3                'yyyy-mm-dd hh24:mi:ss')
  4  minus
  5  select * from emp;


But again, this only works as long as the UNDO information is not being overwritten.

Now, assume that some records were deleted by accident. We can re-insert these deleted records easily:


SQL> insert into emp
  2  select * from emp as of timestamp
  3                to_timestamp('2015-06-14 11:00:00',
  4                'yyyy-mm-dd hh24:mi:ss')
  5  minus
  6  select * from emp;

SQL> commit;

Or, if some records were modified unintentionally, we can revert these changes:

SQL> update emp e_live
  2  set sal = (select sal
  3             from emp as of timestamp
  4                      to_timestamp('2015-06-14 11:00:00',
  5                      'yyyy-mm-dd hh24:mi:ss') e_orig
  6             where e_orig.empno = e_live.empno
  7            )
  8  ;

SQL> commit;

These are only simple examples to outline the possibilities provided by flashback query. You may use all flavors of SQL DML to cope with your issue.

Flashback Export

A special use case of Flashback Query is Flashback Export. The legacy "exp" as well as the Data Pump export utility can create dumps for a given point in time. That is most useful if there are dependencies between tables that cannot be handled easily. We create an export representing the data from the past, import it into another database or into another schema in the same database, and give the user/developer access to it.

$ expdp system/Oracle-1 dumpfile=scott_flashback.dmpdp  \ 
        logfile=scott_flashback.expdp.log \
        directory=data_pump_dir \
        flashback_time='2015-06-14-11:00:00' schemas=scott
$ exp system/Oracle-1 file=/tmp/scott_flashback.dmp \
      log=/tmp/scott_flashback.exp.log \
      flashback_time='2015-06-14-11:00:00' owner=scott


Flashback Drop

What can we do if a table got dropped by accident? It is time for the Recyclebin.


SQL> show recyclebin
ORIGINAL NAME RECYCLEBIN NAME                OBJ. TYPE  DROP TIME
------------- ------------------------------ ---------- -------------------
EMP           BIN$yxLVQQOqC6rgQBAKtBUKlA==$0 TABLE      2015-06-14:13:29:19

Were there indexes defined on that table? To see that we need to query the *_RECYCLEBIN views:

SQL> select ORIGINAL_NAME, OBJECT_NAME, OPERATION, DROPTIME 
  2  from USER_RECYCLEBIN;

ORIGINAL_NAME OBJECT_NAME                    OPERATION DROPTIME
------------- ------------------------------ --------- -------------------
EMP           BIN$yxLVQQOqC6rgQBAKtBUKlA==$0 DROP      2015-06-14:13:29:19
PK_EMP        BIN$yxLVQQOpC6rgQBAKtBUKlA==$0 DROP      2015-06-14:13:29:19
IX_ENAME      BIN$yxLVQQOoC6rgQBAKtBUKlA==$0 DROP      2015-06-14:13:29:19
IX_MGR        BIN$yxLVQQOnC6rgQBAKtBUKlA==$0 DROP      2015-06-14:13:29:19


Let's restore the table:


SQL> flashback table emp to before drop;

Flashback done.

SQL> select ORIGINAL_NAME, OBJECT_NAME, OPERATION, DROPTIME 
  2  from USER_RECYCLEBIN;

No rows selected.

Ok, no more indexes in the recyclebin. Let's crosscheck that:

SQL> select index_name from user_indexes
  2  where table_name='EMP';

INDEX_NAME
------------------------------
BIN$yxLVQQOoC6rgQBAKtBUKlA==$0
BIN$yxLVQQOnC6rgQBAKtBUKlA==$0
BIN$yxLVQQOpC6rgQBAKtBUKlA==$0

The index names were not restored to the original ones. We need to rename the indexes on our own; good that we queried the names before:

SQL> alter index "BIN$yxLVQQOoC6rgQBAKtBUKlA==$0" rename to IX_ENAME;

Note that we use double quotes for the recyclebin name since it contains special characters and mixed upper/lower case.

And also note that foreign key constraints are not restored. We need to re-create them manually.
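
For example, a re-create statement could look like this; the constraint name and columns are hypothetical, based on the usual EMP/DEPT demo schema:

SQL> alter table emp add constraint fk_deptno
  2  foreign key (deptno) references dept (deptno);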

Flashback Version Query

This is an Enterprise Edition feature, so be sure to be licensed properly before you use it. Flashback Version Query provides some pseudo columns and a special FROM clause to show the evolution of rows over time. Let's see what happened to Mr. Miller's salary:

SQL> select
  2    versions_starttime, versions_endtime,
  3    versions_operation, versions_xid,
  4    sal
  5  from
  6    emp versions between timestamp
  7        systimestamp - interval '1' hour and
  8        systimestamp
  9  where
 10    ename = 'MILLER';

VERSIONS_STARTTIME VERSIONS_ENDTIME   V VERSIONS_XID           SAL
------------------ ------------------ - ---------------- ---------
14.06.15 14:35:41                     U 07000B00F7480000      1800
14.06.15 14:29:29  14.06.15 14:35:41  U                       1300
                   14.06.15 14:29:29                       1504,91

We see two updates of the row and the timestamps when the changes took place. Inserts and deletes can also be queried using this technique, as long as you know what to look for:

SQL> select
  2    versions_starttime, versions_endtime,
  3    versions_operation,
  4    sal
  5  from
  6    emp versions between timestamp
  7        systimestamp - interval '1' hour and
  8        systimestamp
  9  where
 10    ename = 'FOO';

VERSIONS_STARTTIME VERSIONS_ENDTIME   V        SAL
------------------ ------------------ - ----------
14.06.15 15:30:05                     D       5046
14.06.15 14:57:08  14.06.15 15:30:05  U       5046
14.06.15 14:45:05  14.06.15 14:57:08  I       4711

We see the insert followed by an update and at the end the delete. Nice.

Conclusion

The flashback features offer a nice and easy way to recover from human errors causing unintended changes to the data. My advice is to keep UNDO as long as possible; someday you will appreciate it. Your users or developers do not ask for help immediately: they try to recover the data on their own, and only if they are not successful do they come and ask. That may be hours or even days after the change happened…