处理smon清理临时段导致数据库异常案例
一个朋友的数据库在经过自己的千辛万苦终于open成功,但是几分钟就down掉,使得他想导出数据重建库的目标不能实现.让我帮忙处理
alert日志报ORA-00600[kafspa:columnBuffer1]
Wed Aug 8 10:55:31 2012 Completed: ALTER DATABASE OPEN Wed Aug 8 10:55:41 2012 Errors in file /oracle/ora10/admin/ora10g/udump/ora10g_ora_12160.trc: ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], [] Wed Aug 8 10:55:47 2012 Non-fatal internal error happenned while SMON was doing temporary segment drop. SMON encountered 2 out of maximum 100 non-fatal internal errors. Wed Aug 8 10:55:47 2012 Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc: ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], [] Wed Aug 8 10:55:58 2012 Non-fatal internal error happenned while SMON was doing temporary segment drop. SMON encountered 3 out of maximum 100 non-fatal internal errors. Wed Aug 8 10:55:59 2012 Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc: ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], [] Wed Aug 8 10:56:10 2012 Non-fatal internal error happenned while SMON was doing temporary segment drop. SMON encountered 4 out of maximum 100 non-fatal internal errors. Wed Aug 8 10:56:11 2012 Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc: ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], [] Wed Aug 8 10:56:22 2012 Non-fatal internal error happenned while SMON was doing temporary segment drop. SMON encountered 5 out of maximum 100 non-fatal internal errors. Wed Aug 8 10:56:32 2012 Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc: ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], [] Wed Aug 8 10:56:43 2012 Non-fatal internal error happenned while SMON was doing temporary segment drop. SMON encountered 6 out of maximum 100 non-fatal internal errors. Wed Aug 8 10:56:53 2012 Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc: ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], [] Wed Aug 8 10:57:04 2012 Non-fatal internal error happenned while SMON was doing temporary segment drop. SMON encountered 7 out of maximum 100 non-fatal internal errors. Wed Aug 8 10:57:14 2012 Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc: ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], [] Wed Aug 8 10:57:25 2012 Non-fatal internal error happenned while SMON was doing temporary segment drop. SMON encountered 8 out of maximum 100 non-fatal internal errors. Wed Aug 8 10:57:35 2012 Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc: ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], [] Wed Aug 8 10:57:38 2012 Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_pmon_8856.trc: ORA-00474: SMON process terminated with error Wed Aug 8 10:57:38 2012 PMON: terminating instance due to error 474 Instance terminated by PMON, pid = 8856
这里出现ORA-00600[kafspa:columnBuffer1],一个未知的错误,但是根据相关的提示,可以大概猜出来是什么原因导致数据库异常
Non-fatal internal error happenned while SMON was doing temporary segment drop.
出现这个错误,使得我们想到一个smon的功能,清理临时段.该数据库down掉很可能和smon清理临时段的过程发生失败有关系
SMON encountered 8 out of maximum 100 non-fatal internal errors.
这个错误提示是因为smon内部最多允许发生100次错误,记录错误发生了8次,当然这次数据库down掉是smon还没有达到100次就直接abort掉
SQL> col name for a32 SQL> col value for a24 SQL> col description for a70 SQL> set linesize 150 SQL> select a.ksppinm name,b.ksppstvl value,a.ksppdesc description 2 from x$ksppi a,x$ksppcv b 3 where a.inst_id = USERENV (\'Instance\') 4 and b.inst_id = USERENV (\'Instance\') 5 and a.indx = b.indx 6 and upper(a.ksppinm) LIKE upper(\'%¶m%\') 7 order by name 8 SQL> / Enter value for param: smon_internal_errlimit old 6: and upper(a.ksppinm) LIKE upper(\'%¶m%\') new 6: and upper(a.ksppinm) LIKE upper(\'%smon_internal_errlimit%\') NAME VALUE DESCRIPTION -------------------------------- ------------------------ ----------------------------------- _smon_internal_errlimit 100 limit of SMON internal errors
分析trace文件
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production With the Partitioning, OLAP, Data Mining and Real Application Testing options ORACLE_HOME = /oracle/ora10/product System name: Linux Node name: DBN-HLD-155-2 Release: 2.6.18-92.el5PAE Version: #1 SMP Tue Apr 29 13:31:02 EDT 2008 Machine: i686 Instance name: ora10g Redo thread mounted by this instance: 1 Oracle process number: 8 Unix process pid: 8941, image: oracle@DBN-HLD-155-2 (SMON) *** SERVICE NAME:() 2012-08-08 10:55:20.208 *** SESSION ID:(274.1) 2012-08-08 10:55:20.208 *** 2012-08-08 10:55:20.208 ksedmp: internal or fatal error ORA-00600: internal error code, arguments: [25027], [0], [0], [], [], [], [], [] Current SQL statement for this session: select file#, block#, ts# from seg$ where type# = 3
果然是smon在查询type#=3的时候发现异常,出现ORA-00600[25027]错误.通过对seg$相关视图分析,可以知道type#=3表示临时段,也就是说数据库smon在查询哪些segment是临时段的时候发生意外,而结合alert日志,完整的错误应该就是:数据库启动后,smon进程为了清理临时段,需要通过select file#, block#, ts# from seg$ where type# = 3查询临时段,但是在查询的时候发生错误.而这个错误累积几次导致数据库异常关闭.
解决问题
--修改两个参数 event=\'10061 trace name context forever, level 10\' _smon_internal_errlimit=1000000 --启动数据库 SQL> startup pfile=\'/tmp/pfile\' ORACLE instance started. Total System Global Area 1577058304 bytes Fixed Size 1267716 bytes Variable Size 385878012 bytes Database Buffers 1174405120 bytes Redo Buffers 15507456 bytes Database mounted. Database opened.
因为屏蔽了smon回收临时段,数据库未出现开始时错误,观察数据库几个小时,运行正常,到此基本上解决了此次异常,通过exp可以顺利导出数据然后导入到新库中.
导致该异常sql分析
SQL> select file#, block#, ts# from seg$ where type# = 3; select file#, block#, ts# from seg$ where type# = 3 * ERROR at line 1: ORA-00600: internal error code, arguments: [25027], [0], [0], [], [], [], [], [] SQL> set autot trace exp SQL> select file#, block#, ts# from seg$ where type# = 3; Execution Plan ---------------------------------------------------------- Plan hash value: 1605285479 -------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10 | 150 | 389 (1)| 00:00:05 | |* 1 | TABLE ACCESS FULL| SEG$ | 10 | 150 | 389 (1)| 00:00:05 | -------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("TYPE#"=3) SQL> select count(rowid) from seg$; Execution Plan ---------------------------------------------------------- Plan hash value: 763549841 --------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 12 | 389 (1)| 00:00:05 | | 1 | SORT AGGREGATE | | 1 | 12 | | | | 2 | TABLE ACCESS FULL| SEG$ | 45682 | 535K| 389 (1)| 00:00:05 | --------------------------------------------------------------------------- SQL> select /*+ full(t) */ count(*) from seg$; Execution Plan ---------------------------------------------------------- Plan hash value: 763549841 ------------------------------------------------------------------- | Id | Operation | Name | Rows | Cost (%CPU)| Time | ------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 29 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | | | | 2 | TABLE ACCESS FULL| SEG$ | 3223 | 29 (0)| 00:00:01 | ------------------------------------------------------------------- SQL> select ts# from seg$; Execution Plan ---------------------------------------------------------- Plan hash value: 1605285479 -------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 3223 | 9669 | 29 (0)| 00:00:01 | | 1 | TABLE ACCESS FULL| SEG$ | 3223 | 9669 | 29 (0)| 00:00:01 | -------------------------------------------------------------------------- SQL> set autot off SQL> select count(rowid) from seg$; SQL> / COUNT(ROWID) ------------ 45727 SQL> select /*+ full(t) */ count(*) from seg$; COUNT(*) ---------- 45727 SQL> select ts# from seg$; 0 0 …… 7 7 ORA-00600: internal error code, arguments: [25027], [0], [0], [], [], [], [], [] SQL> !dbv file=\'/oracle/ora10/oradata/ora10g/system01.dbf\' DBVERIFY: Release 10.2.0.4.0 - Production on Thu Aug 9 14:05:09 2012 Copyright (c) 1982, 2007, Oracle. All rights reserved. DBVERIFY - Verification starting : FILE = /oracle/ora10/oradata/ora10g/system01.dbf Block Checking: DBA = 4225673, Block Type = KTB-managed data block data header at 0xb6d8225c kdbchk: bad row offset slot 6 offs 3030 fseo 3752 dtl 8168 bhs 72 Page 31369 failed with check code 6135 Block Checking: DBA = 4236289, Block Type = KTB-managed data block data header at 0xb6d7225c kdbchk: tosp bad (-13399) Page 41985 failed with check code 6127 DBVERIFY - Verification complete Total Pages Examined : 192000 Total Pages Processed (Data) : 47588 Total Pages Failing (Data) : 2 Total Pages Processed (Index): 40929 Total Pages Failing (Index): 0 Total Pages Processed (Other): 1784 Total Pages Processed (Seg) : 0 Total Pages Failing (Seg) : 0 Total Pages Empty : 101699 Total Pages Marked Corrupt : 0 Total Pages Influx : 0 Highest block SCN : 1775671440 (0.1775671440) SQL> ANALYZE TABLE sys.SEG$ VALIDATE STRUCTURE CASCADE; ANALYZE TABLE sys.SEG$ VALIDATE STRUCTURE CASCADE * ERROR at line 1: ORA-01498: block check failure - see trace file
通过这里的分析大概可以确定是由于两块KTB-managed data block数据块异常,导致直接对seg$进行TABLE ACCESS FULL操作的时候发生异常.因为这个库已经破坏了数据一致性,先导出来数据,至于出现该错误的原因,后续继续关注分析
建议继续学习:
扫一扫订阅我的微信号:IT技术博客大学习
- 作者:惜分飞 来源: 惜分飞
- 标签: smon
- 发布时间:2012-08-09 23:44:14
- [52] IOS安全–浅谈关于IOS加固的几种方法
- [51] android 开发入门
- [50] 如何拿下简短的域名
- [49] Oracle MTS模式下 进程地址与会话信
- [48] 图书馆的世界纪录
- [47] 【社会化设计】自我(self)部分――欢迎区
- [46] Go Reflect 性能
- [43] 读书笔记-壹百度:百度十年千倍的29条法则
- [37] 视觉调整-设计师 vs. 逻辑
- [34] 程序员技术练级攻略