一个135TB的Oracle DataBase恢复case

本站文章除注明转载外,均为本站原创: 转载自love wife & love life —Roger的Oracle&MySQL技术博客

本文链接地址: 一个135TB的Oracle DataBase恢复case

前几天帮助某用户进行了一次数据库恢复操作;通过检查数据库文件发现发现整个库的文件Size之和超过135TB,其中段大小超过132TB:

SQL> select sum(bytes/1024/1024/1024/1024) "TB" from dba_segments;

SUM(BYTES/1024/1024/1024/1024) "TB"
------------------------------
                    132.866246

首先我们来看看数据库的恢复情况,考虑到客户信息涉密文件,这里我仅做简单的描述:

Wed May 16 15:34:22 2018
Errors in file /oracle/admin/yddb/udump/yddb1_ora_3081048.trc:
ORA-00600: internal error code, arguments: [504], [0x70000001000AF08], [1], [0], [slave class create], [0], [0], [0x700000501CE7C08]
Wed May 16 15:34:22 2018
Errors in file /oracle/admin/xxxx/bdump/xxxx1_smon_2818542.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 22479)
ORA-01110: data file 1: '+DISKGROUP1/xxxx/datafile/system.259.738348071'
Wed May 16 15:34:22 2018
Errors in file /oracle/admin/xxxx/bdump/xxxx1_smon_2818542.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 22479)
ORA-01110: data file 1: '+DISKGROUP1/xxxx/datafile/system.259.738348071'
Wed May 16 15:34:23 2018
Errors in file /oracle/admin/xxxx/bdump/xxxx1_smon_2818542.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 22479)
ORA-01110: data file 1: '+DISKGROUP1/xxxx/datafile/system.259.738348071'
Wed May 16 15:34:24 2018
Errors in file /oracle/admin/xxxx/udump/xxxx1_ora_3081048.trc:
ORA-00600: internal error code, arguments: [504], [0x70000001000AF08], [1], [0], [slave class create], [0], [0], [0x700000501CE7C08]
Wed May 16 15:34:24 2018
Errors in file /oracle/admin/xxxx/udump/xxxx1_ora_3081048.trc:
ORA-00600: internal error code, arguments: [4193], [64], [69], [], [], [], [], []
Wed May 16 15:34:24 2018
Trace dumping is performing id=[cdmp_20180516153424]
Wed May 16 15:34:24 2018
Errors in file /oracle/admin/xxxx/bdump/xxxx1_smon_2818542.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 22479)
ORA-01110: data file 1: '+DISKGROUP1/xxxx/datafile/system.259.738348071'
Wed May 16 15:34:25 2018
Errors in file /oracle/admin/xxxx/bdump/xxxx1_smon_2818542.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 22479)
ORA-01110: data file 1: '+DISKGROUP1/xxxx/datafile/system.259.738348071'
Wed May 16 15:34:25 2018
DEBUG: Replaying xcb 0x7000004fc60b848, pmd 0x700000501ce8c48 for failed op 8
Doing block recovery for file 2 block 508535
No block recovery was needed
Wed May 16 15:34:35 2018
Errors in file /oracle/admin/xxxx/bdump/xxxx1_smon_2818542.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 22479)
ORA-01110: data file 1: '+DISKGROUP1/xxxx/datafile/system.259.738348071'

这里可以发现Oracle SMON进程在进行事务恢复时报错,同时alert抛出了ORA-00600 [4193]错误;而该错误则与Undo表空间存在联系。
由此可见Undo表空间也存在异常。对于system数据字典的file #1 block 22479号block。这里我们首先将其从ASM磁盘组copy到文件系统,然后通过提前编译好的Oracle内置block edit tools进行修改,将该block中涉及到的未提交事务强制进行提交;通过人为沟通已提交事务的方式,来绕过Oracle对该Block涉及的事务进行回滚(该数据块涉及的对象为obj#=8,是一个Cluster block,修改相对复杂一些)。

修改完该block之后,再将该block copy到asm中,再次启动数据库,发现能够顺利open数据库。

其次由于undo表空间也存在异常,因此这里需要将undo表空间进行重建,如下是重建的步骤:

1) 创建新的undo表空间
 
   create undo tablespace undotbs11 datafile '+DISKGROUP1' size 4096m;
   
2) 修改参数

   alter systen set undo_tablespace=undotbs11 scope=spfile sid='xxxx1';
   
3) 删除损坏的undo表空间

   drop tablespace  undotbs1 including contents and datafiles;

重建完undo之后,接下来就是处理坏块相关内容,300多个坏块这里分别进行了排查,基本上集中在某一天的数据上,其中还有个别是Index。处理相对简单一些,这里不再重复!

整个恢复Case并不复杂,这里不再重复,相关ORA-00600错误的处理方法在blog中均可搜索到。

关注dbDao.com的新浪微博

扫码加入微信Oracle小密圈,了解Oracle最新技术下载分享资源

TEL/電話+86 13764045638
Email service@parnassusdata.com
QQ 47079569