Thursday, November 4, 2010

Recover From Incrementally Updated Backup (test case)

In this test I will play with incrementally updated backup. Using this feature we can recover datafile image copies applying incremental backups to them.

NOTE!
You should use 'flash recovery area' as primary location for your backups whenever you can. To complicate this test case a little I won’t use 'flash recovery area' for my backups.

Oracle 10.2.0.4.0
Linux OEL5

For taking backup I will use this RMAN script:

connect target /
run
{
sql 'alter system checkpoint';
allocate channel c1 device type disk format '/home/ora10g/backup/%U.dbf';
backup as compressed backupset
check logical incremental level 1
for recover of copy with tag DAILY_FULL database
filesperset 1;

recover copy of database with tag DAILY_FULL;
release channel c1;

allocate channel c1 device type disk;
backup (archivelog all format '/home/ora10g/backup/ARCH_%d_%T_%u_s%s_p%p' delete input TAG "archivelogs");
backup current controlfile format '/home/ora10g/backup/controlf_%d_%u_%s_%T';
release channel c1;
}
(I've borrowed first part of the script from great DBA Husnu Sensoy - Thanks!)

After first execution of this script RMAN will create initial image copy of the database, backup archived redo logs and current controlfile to directory ‘/home/ora10g/backup’.

Now I will create small table just for later checks and run script above again.

$ sqlplus admin

SQL> create table test123 as select * from dba_objects;
Table created.

SQL> alter system switch logfile;
System altered.

Next time RMAN will find image copies of data files and make just incremental backups. Additional step we will be updating image copies of datafiles applying the incremental backups to them.

RMAN script in usual circumstances works perfectly but this time I will interfere and kill RMAN process during recovery using ‘kill -9 [rman pid]’ command to simulate disaster.

To simulate disk crash I will drop original database using rm command.

So to summarize:
- I’ve lost my original database due to the 'disk crash'
- I have:
Image copies of datafiles partially recovered
Backups of archived redo logs
Multiplexed redo logs
Backup of controlfile


Can I fully recover my original database using this files? I can try... :)

I have already created empty directories where I will put database files, controlfiles, etc.


Restore controlfile and mount database:

RMAN> run
2> {
3> allocate channel c1 device type disk;
4> restore controlfile from '/home/ora10g/backup/controlf_RASAPP_khls5prr_3729_20101104';
5> }

using target database control file instead of recovery catalog
allocated channel: c1
channel c1: sid=157 devtype=DISK

Starting restore at 04-NOV-10

channel c1: restoring control file
channel c1: restore complete, elapsed time: 00:00:02
output filename=/u01/oradata/rasapp/control01.ctl
output filename=/u01/oradata/rasapp/control02.ctl
output filename=/u02/oradata/rasapp/control03.ctl
Finished restore at 04-NOV-10
released channel: c1

RMAN> alter database mount;

database mounted



Catalog backup pieces from backup directory ‘/home/ora10g/backup’:
RMAN> catalog start with '/home/ora10g/backup';
...
...

-- Check copy of database:

RMAN> list copy of database;
...
...


Now I will try to switch database to the latest image copy.

RMAN> switch database to copy;

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of switch to copy command at 11/04/2010 15:30:15
RMAN-06571: datafile 9 does not have recoverable copy


As I’ve killed RMAN process during recovery operation my image copy of database is not valid.

How to overcome this error...


I will run ‘RECOVER COPY OF DATABASE WITH TAG ‘DAILY_FULL’’ to apply incremental backups to datafile copies with the same tag.

RMAN> recover copy of database with tag 'DAILY_FULL';

Starting recover at 04-NOV-10
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=157 devtype=DISK
no copy of datafile 9 found to recover
no copy of datafile 12 found to recover
no copy of datafile 13 found to recover
no copy of datafile 17 found to recover
channel ORA_DISK_1: starting incremental datafile backupset restore
channel ORA_DISK_1: specifying datafile copies to recover
recovering datafile copy fno=00018 name=/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-UNDOTBS2_FNO-18_k4ls5pee.dbf
channel ORA_DISK_1: reading from backup piece /home/ora10g/backup/krls5qrn_1_1.dbf
channel ORA_DISK_1: restored backup piece 1
piece handle=/home/ora10g/backup/krls5qrn_1_1.dbf tag=TAG20101104T145544
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
channel ORA_DISK_1: starting incremental datafile backupset restore
channel ORA_DISK_1: specifying datafile copies to recover
recovering datafile copy fno=00008 name=/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-SAP_MERGE_FNO-8_k3ls5p99.dbf
channel ORA_DISK_1: reading from backup piece /home/ora10g/backup/kqls5qrm_1_1.dbf
channel ORA_DISK_1: restored backup piece 1
piece handle=/home/ora10g/backup/kqls5qrm_1_1.dbf tag=TAG20101104T145544
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
channel ORA_DISK_1: starting incremental datafile backupset restore
channel ORA_DISK_1: specifying datafile copies to recover
recovering datafile copy fno=00006 name=/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-BILL_DATA_FNO-6_k8ls5pnh.dbf
channel ORA_DISK_1: reading from backup piece /home/ora10g/backup/kvls5qrs_1_1.dbf
channel ORA_DISK_1: restored backup piece 1
piece handle=/home/ora10g/backup/kvls5qrs_1_1.dbf tag=TAG20101104T145544
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
...
...
Finished recover at 04-NOV-10

Notice this part of the log:
no copy of datafile 9 found to recover
no copy of datafile 12 found to recover
no copy of datafile 13 found to recover
no copy of datafile 17 found to recover

From alertlog:
...
Thu Nov  4 15:26:51 2010
datafilecopy header validation failure for file /home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-LNP_FNO-9_k1ls5orq.dbf
datafilecopy header validation failure for file /home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-LNP_FNO-9_k1ls5orq.dbf
datafilecopy header validation failure for file /home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-PSUB_DATA_FNO-12_jvls5nuo.dbf
datafilecopy header validation failure for file /home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-PSUB_DATA_FNO-12_jvls5nuo.dbf
datafilecopy header validation failure for file /home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-DSLAM_STATS_FNO-13_juls5nco.dbf
datafilecopy header validation failure for file /home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-DSLAM_STATS_FNO-13_juls5nco.dbf
datafilecopy header validation failure for file /home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-TS_DATA_FNO-17_k0ls5od9.dbf
datafilecopy header validation failure for file /home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-TS_DATA_FNO-17_k0ls5od9.dbf
Thu Nov  4 15:27:06 2010
datafilecopy header validation failure for file /home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-LNP_FNO-9_k1ls5orq.dbf
datafilecopy header validation failure for file /home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-PSUB_DATA_FNO-12_jvls5nuo.dbf
datafilecopy header validation failure for file /home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-DSLAM_STATS_FNO-13_juls5nco.dbf
datafilecopy header validation failure for file /home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-TS_DATA_FNO-17_k0ls5od9.dbf
...
...

ORA-19563:
%s header validation failed for file string

Cause: When opening the file to be placed in a copy or backup set, to be inspected, or used as the target for an incremental restore, its header was not recognized as a valid file header for a file of the indicated type (data file, archived log, or control file) belonging to the current database.

Action: Ensure that the correct files are being specified for the copy or backup operation.


When I run ‘switch database to copy’ command it fails again with same error as before:

RMAN> switch database to copy;

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of switch to copy command at 11/04/2010 18:29:03
RMAN-06571: datafile 9 does not have recoverable copy


To overcome this error I will uncatalog noted datafiles:

RMAN> change copy of datafile 9 uncatalog;

using target database control file instead of recovery catalog
uncataloged datafile copy
datafile copy filename=/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-LNP_FNO-9_k1ls5orq.dbf recid=95 stamp=734192788
Uncataloged 1 objects


RMAN> change copy of datafile 12 uncatalog;

uncataloged datafile copy
datafile copy filename=/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-PSUB_DATA_FNO-12_jvls5nuo.dbf recid=93 stamp=734192037
Uncataloged 1 objects


RMAN> change copy of datafile 13 uncatalog;

uncataloged datafile copy
datafile copy filename=/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-DSLAM_STATS_FNO-13_juls5nco.dbf recid=92 stamp=734191568
Uncataloged 1 objects


RMAN> change copy of datafile 17 uncatalog;

uncataloged datafile copy
datafile copy filename=/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-TS_DATA_FNO-17_k0ls5od9.dbf recid=94 stamp=734192505
Uncataloged 1 objects


Catalog them again:

RMAN> catalog start with '/home/ora10g/backup';

searching for all files that match the pattern /home/ora10g/backup

List of Files Unknown to the Database
=====================================
File Name: /home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-PSUB_DATA_FNO-12_jvls5nuo.dbf
File Name: /home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-LNP_FNO-9_k1ls5orq.dbf
File Name: /home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-DSLAM_STATS_FNO-13_juls5nco.dbf
File Name: /home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-TS_DATA_FNO-17_k0ls5od9.dbf

Do you really want to catalog the above files (enter YES or NO)? yes
cataloging files...
cataloging done

List of Cataloged Files
=======================
File Name: /home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-PSUB_DATA_FNO-12_jvls5nuo.dbf
File Name: /home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-LNP_FNO-9_k1ls5orq.dbf
File Name: /home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-DSLAM_STATS_FNO-13_juls5nco.dbf
File Name: /home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-TS_DATA_FNO-17_k0ls5od9.dbf


Let’s try switch database to copy:

RMAN> switch database to copy;

datafile 1 switched to datafile copy "/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-SYSTEM_FNO-1_k6ls5plb.dbf"
datafile 2 switched to datafile copy "/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-SAPBI_DATA_FNO-2_k7ls5pmo.dbf"
datafile 3 switched to datafile copy "/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-SYSAUX_FNO-3_k5ls5piv.dbf"
datafile 4 switched to datafile copy "/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-APP_DATA_FNO-4_k2ls5p4o.dbf"
datafile 5 switched to datafile copy "/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-USERS_FNO-5_kbls5ppj.dbf"
datafile 6 switched to datafile copy "/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-BILL_DATA_FNO-6_k8ls5pnh.dbf"
datafile 7 switched to datafile copy "/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-OT_DATA_FNO-7_kcls5pq2.dbf"
datafile 8 switched to datafile copy "/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-SAP_MERGE_FNO-8_k3ls5p99.dbf"
datafile 9 switched to datafile copy "/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-LNP_FNO-9_k1ls5orq.dbf"
datafile 10 switched to datafile copy "/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-PHONEBOOK_FNO-10_kals5pp4.dbf"
datafile 11 switched to datafile copy "/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-IVR_DATA_FNO-11_kdls5pqh.dbf"
datafile 12 switched to datafile copy "/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-PSUB_DATA_FNO-12_jvls5nuo.dbf"
datafile 13 switched to datafile copy "/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-DSLAM_STATS_FNO-13_juls5nco.dbf"
datafile 14 switched to datafile copy "/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-SIPDIR_DATA_FNO-14_k9ls5pob.dbf"
datafile 16 switched to datafile copy "/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-INV_RN_INTERFACE_FNO-16_kels5pr0.dbf"
datafile 17 switched to datafile copy "/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-TS_DATA_FNO-17_k0ls5od9.dbf"
datafile 18 switched to datafile copy "/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-UNDOTBS2_FNO-18_k4ls5pee.dbf"

It worked...


Recover database now:

RMAN> recover database;

Starting recover at 04-NOV-10
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=157 devtype=DISK

starting media recovery

channel ORA_DISK_1: starting archive log restore to default destination
channel ORA_DISK_1: restoring archive log
archive log thread=1 sequence=10
channel ORA_DISK_1: restoring archive log
archive log thread=1 sequence=11
channel ORA_DISK_1: reading from backup piece /home/ora10g/backup/ARCH_RASAPP_20101104_kjls5q8e_s3731_p1
channel ORA_DISK_1: restored backup piece 1
piece handle=/home/ora10g/backup/ARCH_RASAPP_20101104_kjls5q8e_s3731_p1 tag=ARCHIVELOGS
channel ORA_DISK_1: restore complete, elapsed time: 00:00:02
archive log filename=/u01/oradata/rasapp/archlogs/1_10_734173548.dbf thread=1 sequence=10
archive log filename=/u01/oradata/rasapp/archlogs/1_11_734173548.dbf thread=1 sequence=11
unable to find archive log
archive log thread=1 sequence=12
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 11/04/2010 18:31:26
RMAN-06054: media recovery requesting unknown log: thread 1 seq 12 lowscn 48367782959

I’m missing some redo entries - this is because I forgot to copy multiplexed redo log files to the correct location.

Copy multiplexed redo logs to specified locations:

SQL> select member from v$logfile;

MEMBER
--------------------------------------------------------------------------------
/u01/oradata/rasapp/redo01a.rdo
/u02/oradata/rasapp/redo01b.rdo
/u01/oradata/rasapp/redo02a.rdo
/u02/oradata/rasapp/redo02b.rdo
/u01/oradata/rasapp/redo03a.rdo
/u02/oradata/rasapp/redo03b.rdo

6 rows selected.


$ cp redo01b.rdo /u02/oradata/rasapp/
...
...


Everything should be OK now during recovery:

$ rman target /

Recovery Manager: Release 10.2.0.4.0 - Production on Thu Nov 4 18:50:25 2010

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

connected to target database: RASAPP (DBID=1650089695, not open)

RMAN> recover database;

Starting recover at 04-NOV-10
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=157 devtype=DISK

starting media recovery

archive log thread 1 sequence 12 is already on disk as file /u01/oradata/rasapp/redo03b.rdo
archive log filename=/u02/oradata/rasapp/redo03b.rdo thread=1 sequence=12
media recovery complete, elapsed time: 00:00:06
Finished recover at 04-NOV-10

RMAN> alter database open resetlogs;

database opened


Let’s check data:

SQL> col file_name for a100
SQL> set lines 200
SQL> set pages 999
SQL> select file_name from dba_data_files;

FILE_NAME
----------------------------------------------------------------------------------------------------
/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-SYSTEM_FNO-1_k6ls5plb.dbf
/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-SAPBI_DATA_FNO-2_k7ls5pmo.dbf
/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-SYSAUX_FNO-3_k5ls5piv.dbf
/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-OT_DATA_FNO-7_kcls5pq2.dbf
/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-APP_DATA_FNO-4_k2ls5p4o.dbf
/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-USERS_FNO-5_kbls5ppj.dbf
/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-BILL_DATA_FNO-6_k8ls5pnh.dbf
/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-SAP_MERGE_FNO-8_k3ls5p99.dbf
/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-LNP_FNO-9_k1ls5orq.dbf
/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-PHONEBOOK_FNO-10_kals5pp4.dbf
/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-IVR_DATA_FNO-11_kdls5pqh.dbf
/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-PSUB_DATA_FNO-12_jvls5nuo.dbf
/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-DSLAM_STATS_FNO-13_juls5nco.dbf
/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-SIPDIR_DATA_FNO-14_k9ls5pob.dbf
/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-INV_RN_INTERFACE_FNO-16_kels5pr0.dbf
/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-TS_DATA_FNO-17_k0ls5od9.dbf
/home/ora10g/backup/data_D-RASAPP_I-1650089695_TS-UNDOTBS2_FNO-18_k4ls5pee.dbf

17 rows selected.

SQL> select count(*) from dba_objects where status<>'VALID';

  COUNT(*)
----------
         0

SQL> select count(*) from admin.test123;

  COUNT(*)
----------
     50118


In your own test environment, you can try out many different recovery scenarios without doing any harm. Performing test cases is very important for DBA’s because you must be well prepared when production database encounters disaster ;)

1 comment:

  1. Thanks,
    Very very helpful
    Appreciate your publishing this test case on the web.




    ReplyDelete