Thursday, July 19, 2012

Perform Oracle Block Recovery without having valid backup

My DBA friend Dejan from Baze Podataka database related site wrote post about Oracle Bug 8943287 which reminded me how dangerous bug this is. I was completely unaware about this bug till few months ago.

For this blog post I want to write about “ORA-01578: ORACLE data block corrupted” error and recovery. It is easy to recover from such error if you have RMAN backup - you, of course, have RMAN backup? (if not, create one) Even if you don’t have RMAN backup you can perform block recovery from OS based backups (datafile copies).

But can you perform block recovery without RMAN backups or OS backups?
You can, but only if you have necessary archivelogs available.



OS: OEL5
DB: Oracle 11.1.0.7



Create small tablespace and table for testing.
SQL> create tablespace corrupt_tbs datafile '/u01/app/oradata/aocdb/corrupt_tbs.dbf' size 10M;
Tablespace created.

SQL> create table test_corrupt (id number, text varchar2(100)) tablespace corrupt_tbs;
Table created.

SQL> insert into test_corrupt values(5,'Marko');
1 row created.

SQL> commit;
Commit complete.

SQL> select id,text from test_corrupt;

        ID TEXT
---------- ----------
         5 Marko


Now to simulate data block corruption. I will generate dd command using this small script:
set heading off
set lines 113
SELECT 'dd of=' || f.file_name || ' bs=8192 conv=notrunc seek=' ||
       to_number(S.HEADER_BLOCK + 1) || ' << EOF',
       'CORRUPT corrupt CORRUPT corrupt CORRUPT corrupt CORRUPT corrupt CORRUPT CORRUPT CORRUPT CORRUPT corrupt corrupt ',
       'EOF'
  FROM DBA_SEGMENTS S, dba_data_files f
 WHERE f.tablespace_name = 'CORRUPT_TBS'
   and S.SEGMENT_NAME = 'TEST_CORRUPT';
   
dd of=/u01/app/oradata/aocdb/corrupt_tbs.dbf bs=8192 conv=notrunc seek=131 << EOF
CORRUPT corrupt CORRUPT corrupt CORRUPT corrupt CORRUPT corrupt CORRUPT CORRUPT CORRUPT CORRUPT corrupt corrupt
EOF

Check block corruption after executing dd command.
$ dd of=/u01/app/oradata/aocdb/corrupt_tbs.dbf bs=8192 conv=notrunc seek=131 << EOF
> CORRUPT corrupt CORRUPT corrupt CORRUPT corrupt CORRUPT corrupt CORRUPT CORRUPT CORRUPT CORRUPT corrupt corrupt
> EOF
0+1 records in
0+1 records out
112 bytes (112 B) copied, 7.9169e-05 seconds, 1.4 MB/s

SQL> alter system flush buffer_cache;

System altered.

SQL> select id,text from test_corrupt;
select id,text from test_corrupt
                    *
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 8, block # 131)
ORA-01110: data file 8: '/u01/app/oradata/aocdb/corrupt_tbs.dbf'


Block is successfully corrupted - now to recover corrupted block.
As I don’t have RMAN backup or valid copy of datafile I will create copy of corrupted datafile to temporary location using RMAN restore command.
Before performing RMAN restore you should take corrupted datafile offline or you will get error:
Starting restore at 19.07.2012 21:56:13
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=139 device type=DISK

creating datafile file number=8 name=/tmp/corrupt_tby_copy.dbf
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 07/19/2012 21:56:14
ORA-01182: cannot create database file 8 - file is in use or recovery
ORA-01110: data file 8: '/u01/app/oradata/aocdb/corrupt_tbs.dbf'


Restore datafile 8 to temporary location “/tmp/corrupt_tbs_copy.dbf”.
RMAN> sql 'alter database datafile 8 offline';

using target database control file instead of recovery catalog
sql statement: alter database datafile 8 offline

RMAN> run
2> {
3> set newname for datafile 8 to '/tmp/corrupt_tbs_copy.dbf';
4> restore datafile 8;
5> }

executing command: SET NEWNAME

Starting restore at 19.07.2012 21:27:49
using channel ORA_DISK_1

creating datafile file number=8 name=/tmp/corrupt_tbs_copy.dbf
restore not done; all files read only, offline, or already restored
Finished restore at 19.07.2012 21:27:49


This new file acts as RMAN copy of the corrupted datafile which will be used for RMAN block recovery.
RMAN> list copy of datafile 8;

List of Datafile Copies
=======================

Key     File S Completion Time     Ckp SCN    Ckp Time
------- ---- - ------------------- ---------- -------------------
103     8    A 19.07.2012 21:27:49 12672352727932 19.07.2012 21:11:50
        Name: /tmp/corrupt_tby_copy.dbf


Now to perform RMAN block recovery.
RMAN> blockrecover datafile 8 block 131;

Starting recover at 19.07.2012 21:29:46
using channel ORA_DISK_1

channel ORA_DISK_1: restoring block(s) from datafile copy /tmp/corrupt_tbs_copy.dbf

starting media recovery
media recovery complete, elapsed time: 00:00:01

Finished recover at 19.07.2012 21:29:48

From alert log:
Thu Jul 19 21:29:47 2012
alter database recover datafile list clear
Completed: alter database recover datafile list clear
Starting block media recovery
Recovery of Online Redo Log: Thread 1 Group 1 Seq 481 Reading mem 0
  Mem# 0: /u01/app/oradata/aocdb/redo01.log
Recovery of Online Redo Log: Thread 1 Group 2 Seq 482 Reading mem 0
  Mem# 0: /u01/app/oradata/aocdb/redo02.log
Completed block media recovery

As you can see RMAN used newly created file to copy “good” block and performed block recovery using information from redo logs.

For the last step we must recover datafile 8 to be consisted. Remember, datafile 8 was made offline before performing RMAN restore command.
RMAN> recover datafile 8;

Starting recover at 19.07.2012 21:32:10
using channel ORA_DISK_1

starting media recovery
media recovery complete, elapsed time: 00:00:00

Finished recover at 19.07.2012 21:32:11


RMAN> sql 'alter database datafile 8 online';

sql statement: alter database datafile 8 online


We successfully recovered corrupted block:
SQL> select id,text from test_corrupt;

        ID TEXT
---------- ----------
         5 Marko



Cleanup.
RMAN> delete copy of datafile 8;

released channel: ORA_DISK_1
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=131 device type=DISK
List of Datafile Copies
=======================

Key     File S Completion Time     Ckp SCN    Ckp Time
------- ---- - ------------------- ---------- -------------------
103     8    A 19.07.2012 21:27:49 12672352727932 19.07.2012 21:11:50
        Name: /tmp/corrupt_tbs_copy.dbf


Do you really want to delete the above objects (enter YES or NO)? yes
deleted datafile copy
datafile copy file name=/tmp/corrupt_tby_copy.dbf RECID=103 STAMP=789082069
Deleted 1 objects

SQL> drop tablespace corrupt_tbs including contents and datafiles;

Tablespace dropped.

2 comments:

  1. 1) What was in that block before you corrupt it? I think nothing-zeroes.
    2) If 1) is true then tbs file header was able to recover because in it catalog was declaration of zeroes-and he was not polluted. In a case that there was some content - there is no way to recover bad block!

    P.S.
    Try to recover bad block which came because create global temporary table with no "force logging" and then recover it. There is no way to retrieve that data because create object (table) is not in archive log file.
    Rg
    Damir

    ReplyDelete
  2. Hello Damir,

    it was newly created tablespace, so yes all zeroes. I should emphasize in blog post that this recovery is only valid for newly created tablespaces/datafiles.

    This is one of the rare cases when you can help yourself with corrupted block problem. It is not general guideline how to perform block recovery.

    About example with global temporary tables, I agree with you that block is not recoverable. But I will perform test just to check solution.

    Thanks for comment,
    Marko

    ReplyDelete