视频1 视频21 视频41 视频61 视频文章1 视频文章21 视频文章41 视频文章61 推荐1 推荐3 推荐5 推荐7 推荐9 推荐11 推荐13 推荐15 推荐17 推荐19 推荐21 推荐23 推荐25 推荐27 推荐29 推荐31 推荐33 推荐35 推荐37 推荐39 推荐41 推荐43 推荐45 推荐47 推荐49 关键词1 关键词101 关键词201 关键词301 关键词401 关键词501 关键词601 关键词701 关键词801 关键词901 关键词1001 关键词1101 关键词1201 关键词1301 关键词1401 关键词1501 关键词1601 关键词1701 关键词1801 关键词1901 视频扩展1 视频扩展6 视频扩展11 视频扩展16 文章1 文章201 文章401 文章601 文章801 文章1001 资讯1 资讯501 资讯1001 资讯1501 标签1 标签501 标签1001 关键词1 关键词501 关键词1001 关键词1501 专题2001
OracleORA-00600[157]解决方法
2020-11-09 11:56:41 责编:小采
文档


DB 环境是:AIX 6.1 + ORACLE 10.2.0.4,2节点。现在一个节点在撑着,压力很大。尝试启动挂掉的节点,可以正常启动,一旦执行DML

刚到公司收到一朋友的留言,说RAC的一个节点挂了。 因为他昨晚6点重建过一个索引,跑了2个多小时还没结束, 后来他就手工取消了。 晚上11点多,其中一个节点就出现问题了。

DB 环境是:AIX 6.1 + Oracle 10.2.0.4,2节点。现在一个节点在撑着,压力很大。尝试启动挂掉的节点,可以正常启动,一旦执行DML 操作,节点就挂掉了。

Alert log 信息:

ORA-00600: internal error code, arguments:[157], [], [], [], [], [], [], []
Wed Sep 7 00:26:42 2011
Errors in file /apps/oracle/admin/sfc3db/udump/sfc3db2_ora_5054706.trc:
ORA-00600: internal error code, arguments: [157], [], [], [], [], [], [], []
Wed Sep 7 00:26:44 2011
Trace dumping is performing id=[cdmp_201109070024]
Wed Sep 7 00:27:10 2011
Errors in file /apps/oracle/admin/sfc3db/udump/sfc3db2_ora_2756730.trc:
ORA-00600: internal error code, arguments: [157], [], [], [], [], [], [], []
Wed Sep 7 00:28:11 2011
Errors in file /apps/oracle/admin/sfc3db/udump/sfc3db2_ora_2756730.trc:
ORA-00600: internal error code, arguments: [157], [], [], [], [], [], [], []

我问朋友是否还有其他的错误,朋友说只有这些。

部分Trace 信息如下:

/apps/oracle/admin/sfc3db/udump/sfc3db2_ora_5054706.trc:calls aborted: 0, num est exec limit hit: 0
/apps/oracle/admin/sfc3db/udump/sfc3db2_ora_5054706.trc: name=update seq$ setincrement$=:2,minvalue=:3,maxvalue=:4,cycle#=:5,order$=:6,cache=:7,highwater=:8,audit$=:9,flags=:10where obj#=:1
/apps/oracle/admin/sfc3db/udump/sfc3db2_ora_5054706.trc: name=selectdecode(failover_method, NULL, 0 , 'BASIC', 1, 'PRECONNECT', 2 , 'PREPARSE', 4 ,0), decode(failover_type, NULL, 1 , 'NONE', 1 , 'SESSION', 2, 'SELECT', 4, 1),failover_retries, failover_delay, flags from service$ where name = :1
/apps/oracle/admin/sfc3db/udump/sfc3db2_ora_5054706.trc:selectSYS_CONTEXT('USERENV', 'SERVER_HOST'), SYS_CONTEXT('USERENV','DB_UNIQUE_NAME'), SYS_CONTEXT('USERENV', 'INSTANCE_NAME'),SYS_CONTEXT('USERENV', 'SERVICE_NAME'), INSTANCE_NUMBER, STARTUP_TIME,SYS_CONTEXT('USERENV', 'DB_DOMAIN') from v$instance whereINSTANCE_NAME=SYS_CONTEXT('USERENV', 'INSTANCE_NAME')
/apps/oracle/admin/sfc3db/udump/sfc3db2_ora_5054706.trc: name=select value$from props$ where name = 'GLOBAL_DB_NAME'
/apps/oracle/admin/sfc3db/udump/sfc3db2_ora_5054706.trc: name=selectprivilege#,level from sysauth$ connect by grantee#=prior privilege# andprivilege#>0 start with grantee#=:1 and privilege#>0
/apps/oracle/admin/sfc3db/udump/sfc3db2_ora_5054706.trc: name=select privilege#from sysauth$ where (grantee#=:1 or grantee#=1) and privilege#>0

之前整理过一篇文章:

ORA-600 各个参数含义说明

根据这篇文章里的说明,可以判断:

ORA-00600:internal error code, arguments: [157], [], [], [], [], [], [], []

是与并行查询有关。

朋友问我是否和中断的索引rebuild有关系,我说不会。

对于手工中断的重建索引,会遗留一些temporary segments。 因为rebuild index时,会在用户索引空间的segments,会分配一个temporary segment 来保存索引的信息,当rebuild 结束之后,old index 被droped 掉,之前分配的temporary segments 会定义为permanent segment。

如果我们中断了rebuild 操作,那么SMON会三分钟清理一次(前提是接到post),如果SMON过于繁忙那么可能temporary segment长期不被清理。temporary segment长期不被清理可能造成一个典型的问题是:在rebuild index online失败后,后续执行的rebuild index命令要求之前产生的temporary segment已被cleanup,如果cleanup没有完成那么就需要一直等下去。

下面两篇文章有详细的说明:

Oracle alter index rebuild 说明

Oracle rebuild index 报ORA-01652 解决办法

我对问题的定位还是在并行查询的SQL 上。 但是朋友提供的trace 并没有得到相关的信息。

在MOS 上搜了一下,有一篇相关文章:

ORA-600 [157] Running Parallel Query on RAC [ID839536.1]

An ORA-600[157] is highly transient in nature. Most bugs filed for thisissue have been closed as not reproducible. The purposeof this note is to document a known workaround if you see this error withsimilar circumstances.

The followingselect statement failed in a 10.2.0.3, 3-node, RAC database when runningin parallel:

ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [157], [], [], [], [], [], [], []
Current SQL statement for this session:
SELECT /*+ PARALLEL (ORG,3,3)*/
ORG.X_EFX_SSN,
ORG.X_EFX_BIRTH_DT,
COUNT(*)
FROM
SIEBEL.S_ORG_EXT ORG
GROUP BY
X_EFX_SSN,
X_EFX_BIRTH_DT
HAVING COUNT(*) > 1

The tracefile showed the following callstack:

----- Call Stack Trace -----
kxfprigdb <- kxfpqrgdb <- kxfxgs <- kxfxcw <- qerpxFetch <- opifch2 <- kpoal8 <- opiodr <- ttcpip <- opitsk <- opiino <- opiodr <- opidrv <- sou2o <- opimai_real <- main <- start

The tracefile also showed the process statewas busy holding a child latch:

===================================================
PROCESS STATE
-------------
Process global information:
process: 7000004748eaa00, call: 700000443d33870, xact: 0, curses: 700000474e203c8, usrses: 700000474e203c8
----------------------------------------
SO: 7000004748eaa00, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=157, calls cur/top: 700000443d33870/700000443d33870, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 249
last post received-location: kxfprienq: QC
last process to post me: 700000474914f40 186 0
last post sent: 0 0 250
last post sent-location: kxfprienq: slave
last process posted by me: 700000474914f40 186 0
(latch info) wait_event=0 bits=10
holding (efd=7) 700000472e4a838 Child process queue reference level=4 child#=99
Location from where latch is held: kxfprigdb: KSLBEGIN: addr qref <---
Context saved from call: 504403177372952360
state=busy, wlstate=free <----
Process Group: DEFAULT, pseudo proc: 700000474a384f0
O/S info: user: oracle, term: UNKNOWN, ospid: 2351144
OSD pid info: Unix process pid: 2351144, image: oraclePPSOLTP1@psoldbap003

最终的解决方法:

Workaround:

Bounce all instances in the RAC cluster.

重启RAC 上的所有instance。 朋友在中午申请了停机时间, 重启之后,, RAC 节点正常。

下载本文
显示全文
专题