ORACLE数据库实例报错ORA-00470: LGWR process terminated with error宕机问题分析报告

news2025/5/18 10:29:13
  • 服务概述

10月21号03:22分,BOSS数据库实例发生异常宕机;工程师及时响应此问题并对此故障原因进行分析及相关建议,详细的故障情况及相关日志、TRACE文件的分析及总结、建议,请参阅本文档。

  • hzboss数据库实例宕机分析

4.1 数据库层面日志的分析

4.1.1故障数据库实例alert日志

数据库实例的Alert日志

对故障时间点附近的ALERT日志分析如下:

Wed Oct 21 03:06:22 2024
Thread 1 advanced to log sequence 254206 (LGWR switch)
  Current log# 9 seq# 254206 mem# 0: /boss_sysdata/oradata/dtvboss/redo09.log
Wed Oct 21 03:17:10 2024
WARNING: aiowait timed out 1 times --该时间段系统出现aio等待的告警,该告警由_iowait_timeouts隐含参数控制,超过该参数设置的值后会导致系统宕机,这里并未超过系统设置值。但是提示我们当时系统在io调度上出现了问题。
Wed Oct 21 03:22:11 2024
Errors in file /oracle/admin/dtvboss/bdump/dtvboss_arc0_404.trc:
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 367' --22分17秒系统抛错,进程367持有controlfile的enqueue lock时间被其他进程(arc0)等待超时,该enqueue lock持有时间由隐含参数_controlfile_enqueue_timeout控制,默认为900秒
Wed Oct 21 03:22:17 2024
System State dumped to trace file /oracle/admin/dtvboss/bdump/dtvboss_arc0_404.trc
Wed Oct 21 03:22:50 2024
Errors in file /oracle/admin/dtvboss/bdump/dtvboss_lgwr_365.trc:
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 367' --22分50秒,进程367持有的controlfile的enqueue lock时间被其他进程(lgwr)等待超时抛错
Wed Oct 21 03:22:55 2024
Killing enqueue blocker (pid=367) on resource CF-00000000-00000000 --LGWR进程发现异常查杀持有controlfile equeue lock的进程367
 by killing session 1648.1
Wed Oct 21 03:22:55 2024
Errors in file /oracle/admin/dtvboss/bdump/dtvboss_lgwr_365.trc:
ORA-07445: exception encountered: core dump [ksuklms()+672] [SIGSEGV] [Address not mapped to object] [0x000000062] [] []
Wed Oct 21 03:23:18 2024
Errors in file /oracle/admin/dtvboss/bdump/dtvboss_pmon_343.trc: --后续一系列报错宕机开始
ORA-00470: LGWR process terminated with error
Wed Oct 21 03:23:18 2024
PMON: terminating instance due to error 470
Wed Oct 21 03:23:18 2024
ORA-470 encountered when generating server alert SMG-3503
Wed Oct 21 03:23:18 2024
Errors in file /oracle/admin/dtvboss/bdump/dtvboss_j006_22335.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00470: LGWR process terminated with error
Wed Oct 21 03:23:22 2024
Termination issued to instance processes. Waiting for the processes to exit
Wed Oct 21 03:23:28 2024
Instance termination failed to kill one or more processes

4.1.2 故障时trace文件分析

故障时,节点数据库实例的alert日志中提到了如下TRACE文件:dtvboss_arc0_404.trc

分析如下:

/oracle/admin/dtvboss/bdump/dtvboss_arc0_404.trc

Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production

With the Partitioning, OLAP, Data Mining and Real Application Testing options

ORACLE_HOME = /oracle/product/10.2.0/db_1

System name: SunOS

Node name: hzboss-ora1

Release: 5.10

Version: Generic_144488-12

Machine: sun4u

Instance name: dtvboss

Redo thread mounted by this instance: 1

Oracle process number: 23

Unix process pid: 404, image: oracle@hzboss-ora1 (ARC0)

*** SERVICE NAME:(SYS$BACKGROUND) 2024-10-21 03:22:11.264

*** SESSION ID:(1646.3) 2024-10-21 03:22:11.264

*** 2024-10-21 03:22:11.264

Unable to get enqueue on resource CF-00000000-00000000 (ges mode req=3 held=6)

Possible local blocker ospid=367 sid=1648 sser=1 time_held=37 secs (ges mode req=6 held=4)

DUMP LOCAL BLOCKER: initiate state dump for KILL BLOCKER

  possible owner[17.367] on resource CF-00000000-00000000

Dumping process info of pid[17.367] requested by pid[23.404]

ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 367'

-------------------------------------------------------------------------------

ENQUEUE [CF] HELD FOR TOO LONG

 

enqueue holder: 'inst 1, osid 367'

 

Process 'inst 1, osid 367' is holding an enqueue for maximum allowed time.

The process will be terminated.

 

Oracle Support Services triaging information: to find the root-cause, look

at the call stack of process 'inst 1, osid 367' located below. Ask the

developer that owns the first NON-service layer in the stack to investigate.

Common service layers are enqueues (ksq), latches (ksl), library cache

pins and locks (kgl), and row cache locks (kqr).

 

Dumping process 17.367 info:

*** 2024-10-21 03:22:11.300

---------------------------------------------------------------

从这一段信息可以看到,此TRACE文件是ARC0进程的TRACE,ARC0进程的ORACLE SID是23,OS PID是404;

ARC0进程无法获取资源:CF-00000000-00000000 (controlfile enqueue 队列锁),此资源的持有者是pid[17.367]

ARC0进程的TRACE分析

PROCESS 23:

  ----------------------------------------

  SO: 8a9004f88, type: 2, owner: 0, flag: INIT/-/-/0x00

  (process) Oracle pid=23, calls cur/top: 8a95c6650/8a95c6650, flag: (2) SYSTEM

            int error: 0, call error: 0, sess error: 0, txn error 0

  (post info) last post received: 0 0 24

              last post received-location: ksasnd

              last process to post me: b66002018 1 6

              last post sent: 0 0 24

              last post sent-location: ksasnd

              last process posted by me: b66005770 1 2

    (latch info) wait_event=0 bits=0

    Process Group: DEFAULT, pseudo proc: 5eb3388f0

    O/S info: user: oracle, term: UNKNOWN, ospid: 404

    OSD pid info: Unix process pid: 404, image: oracle@hzboss-ora1 (ARC0)

Dump of memory from 0x00000005F033B1F0 to 0x00000005F033B3F8

5F033B1F0 0000000C 00000000 00000008 AF739C70  [.............s.p]

………………………………

5F033B3E0 00000005 000313A7 00000008 AF739C70  [.............s.p]

5F033B3F0 00000006 000313A7                    [........]        

    (FOB) flags=2 fib=8af713780 incno=0 pending i/o cnt=0

     fname=/boss_sysdata/oradata/dtvboss/control03.ctl

     fno=2 lblksz=16384 fsiz=1308

    (FOB) flags=2 fib=8af7133e0 incno=0 pending i/o cnt=0

     fname=/boss_sysdata/oradata/dtvboss/control02.ctl

     fno=1 lblksz=16384 fsiz=1308

    (FOB) flags=2 fib=8af713040 incno=0 pending i/o cnt=0

     fname=/boss_sysdata/oradata/dtvboss/control01.ctl

     fno=0 lblksz=16384 fsiz=1308

    ----------------------------------------

    SO: 8a9778500, type: 11, owner: 8a9004f88, flag: INIT/-/-/0x00

    (broadcast handle) flag: (2) ACTIVE SUBSCRIBER, owner: 8a9004f88,

                       event: 21, last message event: 21,

                       last message waited event: 21, messages read: 0

                       channel: (5eb52cc20) scumnt mount lock

                                scope: 1, event: 22, last mesage event: 20,

                                publishers/subscribers: 0/20,

                                messages published: 1

    ----------------------------------------

    SO: 8a95b7328, type: 4, owner: 8a9004f88, flag: INIT/-/-/0x00

    (session) sid: 1646 trans: 0, creator: 8a9004f88, flag: (51) USR/- BSY/-/-/-/-/-

              DID: 0001-0017-0000000D, short-term DID: 0000-0000-00000000

              txn branch: 0

              oct: 0, prv: 0, sql: 0, psql: 0, user: 0/SYS

    service name: SYS$BACKGROUND

    last wait for 'enq: CF - contention' blocking sess=0x0 seq=43322 wait_time=2929723 seconds since wait started=36

                name|mode=43460004, 0=0, operation=0

    Dumping Session Wait History

     for 'enq: CF - contention' count=1 wait_time=2929723

                name|mode=43460004, 0=0, operation=0

     for 'enq: CF - contention' count=1 wait_time=2929725

                name|mode=43460004, 0=0, operation=0

     for 'enq: CF - contention' count=1 wait_time=2929726

                name|mode=43460004, 0=0, operation=0

     for 'enq: CF - contention' count=1 wait_time=2929726

                name|mode=43460004, 0=0, operation=0

     for 'enq: CF - contention' count=1 wait_time=2929725

                name|mode=43460004, 0=0, operation=0

     for 'enq: CF - contention' count=1 wait_time=2929723

                name|mode=43460004, 0=0, operation=0

     for 'enq: CF - contention' count=1 wait_time=2929725

                name|mode=43460004, 0=0, operation=0

     for 'enq: CF - contention' count=1 wait_time=2929726

                name|mode=43460004, 0=0, operation=0

     for 'enq: CF - contention' count=1 wait_time=2929724

                name|mode=43460004, 0=0, operation=0

     for 'enq: CF - contention' count=1 wait_time=2929750

                name|mode=43460004, 0=0, operation=0

    temporary object counter: 0

      ----------------------------------------

      UOL used : 0 locks(used=0, free=0)

      KGX Atomic Operation Log 5eaffc848

       Mutex 0(0, 0) idn 0 oper NONE

       Library Cache uid 1646 efd 0 whr 0 slp 0

      KGX Atomic Operation Log 5eaffc890

       Mutex 0(0, 0) idn 0 oper NONE

       Library Cache uid 1646 efd 0 whr 0 slp 0

      KGX Atomic Operation Log 5eaffc8d8

       Mutex 0(0, 0) idn 0 oper NONE

       Library Cache uid 1646 efd 0 whr 0 slp 0

      ----------------------------------------

      SO: 8afaa4330, type: 41, owner: 8a95b7328, flag: INIT/-/-/0x00

      (dummy) nxc=0, nlb=0   

    ----------------------------------------

    SO: 8a97783d8, type: 11, owner: 8a9004f88, flag: INIT/-/-/0x00

    (broadcast handle) flag: (2) ACTIVE SUBSCRIBER, owner: 8a9004f88,

                       event: 24, last message event: 27,

                       last message waited event: 27, messages read: 1

                       channel: (5eb528a48) system events broadcast channel

                                scope: 2, event: 2923059, last mesage event: 27,

                                publishers/subscribers: 0/421,

                                messages published: 1

    ----------------------------------------

    SO: 8a95c6650, type: 3, owner: 8a9004f88, flag: INIT/-/-/0x00

    (call) sess: cur 8a95b7328, rec 8a9591bc8, usr 8a95b7328; depth: 0

      ----------------------------------------

      SO: 8a95d1fb8, type: 5, owner: 8a95c6650, flag: INIT/-/-/0x00

      (enqueue) CF-00000000-00000000 DID: 0001-0017-0000000D

      lv: 00 79 4a fb 00 00 00 00 00 00 00 00 00 00 00 00  res_flag: 0x2

      res: 0x5eb40f608, lock_flag: 0x0

      own: 0x8a95b7328, sess: 0x8a95b7328, prv: 0x8a95d1fe8

      ----------------------------------------

      SO: 8a9591bc8, type: 4, owner: 8a95c6650, flag: INIT/-/-/0x00

      (session) sid: 1618 trans: 0, creator: 0, flag: (2) -/REC -/-/-/-/-/-

                DID: 0000-0000-00000000, short-term DID: 0000-0000-00000000

                txn branch: 0

                oct: 0, prv: 0, sql: 0, psql: 0, user: 0/SYS

      temporary object counter: 0

    ----------------------------------------

    SO: 8a97ad460, type: 16, owner: 8a9004f88, flag: INIT/-/-/0x00

    (osp req holder)

此段信息可以看到,ARC0进程等待controlfile enqueue 队列锁。等待事件为enq: CF - contention

 

SO: 8a95d1fb8, type: 5, owner: 8a95c6650, flag: INIT/-/-/0x00

      (enqueue) CF-00000000-00000000 DID: 0001-0017-0000000D

      lv: 00 79 4a fb 00 00 00 00 00 00 00 00 00 00 00 00  res_flag: 0x2

      res: 0x5eb40f608, lock_flag: 0x0

      own: 0x8a95b7328, sess: 0x8a95b7328, prv: 0x8a95d1fe8

这里说明arc0在申请的资源是:res:0x5eb40f608

而在process 17号的ckpt中可以看到0x5eb40f608资源被ckptmode: SSX方式持有持有

CKPT进程的TRACE分析

PROCESS 17:

  ----------------------------------------

  SO: 8a90037d0, type: 2, owner: 0, flag: INIT/-/-/0x00

  (process) Oracle pid=17, calls cur/top: 8a95c5df8/8a95c5df8, flag: (6) SYSTEM

            int error: 0, call error: 0, sess error: 0, txn error 0

  (post info) last post received: 0 0 114

              last post received-location: kcbbza

              last process to post me: 8a9001830 1 6

              last post sent: 0 0 9

              last post sent-location: ksqrcl

              last process posted by me: b66011d18 91 0

    (latch info) wait_event=0 bits=0

    Process Group: DEFAULT, pseudo proc: 5eb3388f0

    O/S info: user: oracle, term: UNKNOWN, ospid: 367

    OSD pid info: Unix process pid: 367, image: oracle@hzboss-ora1 (CKPT)

Dump of memory from 0x00000005EF308A18 to 0x00000005EF308C20

5EF308A10                   0000000C 00000000          [........]

…………………………………………

    ----------------------------------------

    SO: 8a95b9df8, type: 4, owner: 8a90037d0, flag: INIT/-/-/0x00

    (session) sid: 1648 trans: 0, creator: 8a90037d0, flag: (51) USR/- BSY/-/-/-/-/-

              DID: 0001-0011-00000005, short-term DID: 0000-0000-00000000

              txn branch: 0

              oct: 0, prv: 0, sql: 0, psql: 0, user: 0/SYS

    service name: SYS$BACKGROUND

    waiting for 'direct path write' blocking sess=0x0 seq=39369 wait_time=0 seconds since wait started=37

                file number=72, first dba=1, block cnt=1

……………………………………

    ----------------------------------------

    SO: 8a9777a30, type: 11, owner: 8a90037d0, flag: INIT/-/-/0x00

    (broadcast handle) flag: (2) ACTIVE SUBSCRIBER, owner: 8a90037d0,

                       event: 14, last message event: 27,

                       last message waited event: 27, messages read: 1

                       channel: (5eb528a48) system events broadcast channel

                                scope: 2, event: 2923059, last mesage event: 27,

                                publishers/subscribers: 0/421,

                                messages published: 1

    ----------------------------------------

    SO: 8a95c5df8, type: 3, owner: 8a90037d0, flag: INIT/-/-/0x00

    (call) sess: cur 8a95b9df8, rec 8a95b5dc0, usr 8a95b9df8; depth: 0

      ----------------------------------------

      SO: 8a95d1f20, type: 5, owner: 8a95c5df8, flag: INIT/-/-/0x00

      (enqueue) CF-00000000-00000004 DID: 0001-0011-00000005

      lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  res_flag: 0x2

      res: 0x5eb507858, mode: S, lock_flag: 0x0

      own: 0x8a95b9df8, sess: 0x8a95b9df8, proc: 0x8a90037d0, prv: 0x5eb507868

      ----------------------------------------

      SO: 8a95d1cc0, type: 5, owner: 8a95c5df8, flag: INIT/-/-/0x00

      (enqueue) CF-00000000-00000000 DID: 0001-0011-00000005

      lv: 00 79 4a fb 00 00 00 00 00 00 00 00 00 00 00 00  res_flag: 0x2

      res: 0x5eb40f608, mode: SSX, lock_flag: 0x0

      own: 0x8a95b9df8, sess: 0x8a95b9df8, proc: 0x8a90037d0, prv: 0x5eb40f618

      ----------------------------------------

      SO: 8a95b5dc0, type: 4, owner: 8a95c5df8, flag: INIT/-/-/0x00

      (session) sid: 1645 trans: 0, creator: 0, flag: (2) -/REC -/-/-/-/-/-

                DID: 0000-0000-00000000, short-term DID: 0000-0000-00000000

                txn branch: 0

                oct: 0, prv: 0, sql: 0, psql: 0, user: 0/SYS

      temporary object counter: 0

      ----------------------------------------

      SO: 8a95cb468, type: 5, owner: 8a95c5df8, flag: INIT/-/-/0x00

      (enqueue) RS-00000019-00000001 DID: 0001-0011-00000005

      lv: 20 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00  res_flag: 0x2

      res: 0x5eb40f8c8, mode: SS, lock_flag: 0x0

      own: 0x8a95b9df8, sess: 0x8a95b9df8, proc: 0x8a90037d0, prv: 0x5eb40f8d8

      ----------------------------------------

      SO: 8a95cb338, type: 5, owner: 8a95c5df8, flag: INIT/-/-/0x00

      (enqueue) CF-00000000-00000000 DID: 0000-0011-00000004

      lv: 00 79 4a fb 00 00 00 00 00 00 00 00 00 00 00 00  res_flag: 0x2

      res: 0x5eb40f608, mode: SS, lock_flag: 0x0

      own: 0x8a95b9df8, sess: 0x8a95b9df8, proc: 0x8a90037d0, prv: 0x8a95d1cf0

    ----------------------------------------

    SO: 8a97ad2c8, type: 16, owner: 8a90037d0, flag: INIT/-/-/0x00

    (osp req holder)

此处可以看到CKPT进程持有(enqueue) CF-00000000-00000000,持有模式是:res: 0x5eb40f608, mode: SSX, lock_flag: 0x0,可以对应到ARC0进程请求的资源。

LGWR进程的TRACE分析

PROCESS 10:

  ----------------------------------------

  SO: b66002018, type: 2, owner: 0, flag: INIT/-/-/0x00

  (process) Oracle pid=10, calls cur/top: b665c6de0/b665c6de0, flag: (6) SYSTEM

            int error: 0, call error: 0, sess error: 0, txn error 0

  (post info) last post received: 0 0 24

              last post received-location: ksasnd

              last process to post me: 8a9000860 1 6

              last post sent: 109 0 4

              last post sent-location: kslpsr

              last process posted by me: 8a902ef90 105 0

    (latch info) wait_event=0 bits=0

    Process Group: DEFAULT, pseudo proc: 5eb3388f0

    O/S info: user: oracle, term: UNKNOWN, ospid: 365

    OSD pid info: Unix process pid: 365, image: oracle@hzboss-ora1 (LGWR)

Dump of memory from 0x00000005EF308810 to 0x00000005EF308A18

5EF308810 00000015 00000000 0000000B 6C1C1158  [............l..X]

  ………………

    ----------------------------------------

    SO: b665cc938, type: 5, owner: b66002018, flag: INIT/-/-/0x00

    (enqueue) RT-00000001-00000000 DID: 0000-000A-00000005

    lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  res_flag: 0x2

    res: 0x5eb40f828, mode: X, lock_flag: 0x0

    own: 0xb665bf398, sess: 0xb665bf398, proc: 0xb66002018, prv: 0x5eb40f838

    (FOB) flags=2 fib=8af713780 incno=0 pending i/o cnt=0

     fname=/boss_sysdata/oradata/dtvboss/control03.ctl

     fno=2 lblksz=16384 fsiz=1308

    (FOB) flags=2 fib=8af7133e0 incno=0 pending i/o cnt=0

     fname=/boss_sysdata/oradata/dtvboss/control02.ctl

     fno=1 lblksz=16384 fsiz=1308

    (FOB) flags=2 fib=8af713040 incno=0 pending i/o cnt=0

     fname=/boss_sysdata/oradata/dtvboss/control01.ctl

     fno=0 lblksz=16384 fsiz=1308

    ----------------------------------------

    SO: b66778b58, type: 11, owner: b66002018, flag: INIT/-/-/0x00

    (broadcast handle) flag: (2) ACTIVE SUBSCRIBER, owner: b66002018,

                       event: 12, last message event: 20,

                       last message waited event: 20, messages read: 1

                       channel: (5eb52cc20) scumnt mount lock

                                scope: 1, event: 22, last mesage event: 20,

                                publishers/subscribers: 0/20,

                                messages published: 1

    ----------------------------------------

    SO: b665bf398, type: 4, owner: b66002018, flag: INIT/-/-/0x00

    (session) sid: 824 trans: 0, creator: b66002018, flag: (51) USR/- BSY/-/-/-/-/-

              DID: 0001-000A-00000006, short-term DID: 0000-0000-00000000

              txn branch: 0

              oct: 0, prv: 0, sql: 0, psql: 0, user: 0/SYS

    service name: SYS$BACKGROUND

    waiting for 'enq: CF - contention' blocking sess=0x8a95b9df8 seq=61266 wait_time=0 seconds since wait started=0

                name|mode=43460005, 0=0, operation=0

    Dumping Session Wait History

     for 'enq: CF - contention' count=1 wait_time=2929719

                name|mode=43460005, 0=0, operation=0

     for 'enq: CF - contention' count=1 wait_time=2929716

                name|mode=43460005, 0=0, operation=0

     for 'enq: CF - contention' count=1 wait_time=2929716

                name|mode=43460005, 0=0, operation=0

     for 'enq: CF - contention' count=1 wait_time=2929718

                name|mode=43460005, 0=0, operation=0

     for 'enq: CF - contention' count=1 wait_time=2929719

                name|mode=43460005, 0=0, operation=0

     for 'enq: CF - contention' count=1 wait_time=2929719

                name|mode=43460005, 0=0, operation=0

     for 'enq: CF - contention' count=1 wait_time=2929713

                name|mode=43460005, 0=0, operation=0

     for 'enq: CF - contention' count=1 wait_time=2929715

                name|mode=43460005, 0=0, operation=0

     for 'enq: CF - contention' count=1 wait_time=2929722

                name|mode=43460005, 0=0, operation=0

     for 'enq: CF - contention' count=1 wait_time=2929736

                name|mode=43460005, 0=0, operation=0

    temporary object counter: 0

      ----------------------------------------

      UOL used : 0 locks(used=0, free=0)

      KGX Atomic Operation Log 5ebff7cd0

       Mutex 0(0, 0) idn 0 oper NONE

       Library Cache uid 824 efd 0 whr 0 slp 0

      KGX Atomic Operation Log 5ebff7d18

       Mutex 0(0, 0) idn 0 oper NONE

       Library Cache uid 824 efd 0 whr 0 slp 0

      KGX Atomic Operation Log 5ebff7d60

       Mutex 0(0, 0) idn 0 oper NONE

       Library Cache uid 824 efd 0 whr 0 slp 0

      ----------------------------------------

      SO: b6caa57e0, type: 41, owner: b665bf398, flag: INIT/-/-/0x00

      (dummy) nxc=0, nlb=0   

    ----------------------------------------

    SO: b66778a48, type: 11, owner: b66002018, flag: INIT/-/-/0x00

    (broadcast handle) flag: (2) ACTIVE SUBSCRIBER, owner: b66002018,

                       event: 13, last message event: 27,

                       last message waited event: 27, messages read: 1

                       channel: (5eb528a48) system events broadcast channel

                                scope: 2, event: 2923059, last mesage event: 27,

                                publishers/subscribers: 0/421,

                                messages published: 1

    ----------------------------------------

    SO: b665c6de0, type: 3, owner: b66002018, flag: INIT/-/-/0x00

    (call) sess: cur b665bf398, rec 0, usr b665bf398; depth: 0

      ----------------------------------------

      SO: b665cce90, type: 5, owner: b665c6de0, flag: INIT/-/-/0x00

      (enqueue) CF-00000000-00000000 DID: 0001-000A-00000006

      lv: 00 79 4a fb 00 00 00 00 00 00 00 00 00 00 00 00  res_flag: 0x2

      res: 0x5eb40f608, req: SSX, lock_flag: 0x0

      own: 0xb665bf398, sess: 0xb665bf398, proc: 0xb66002018, prv: 0x5eb40f628

    ----------------------------------------

    SO: b667ae720, type: 16, owner: b66002018, flag: INIT/-/-/0x00

    (osp req holder)

从LGWR进程的TRACE中可以看到:waiting for 'enq: CF - contention' blocking sess=0x8a95b9df8 seq=61266 wait_time=0 seconds since wait started=0

说明LGWR进程也在等待controlfile enqueue 队列锁,等待事件为enq: CF  contention。

SO: b665cce90, type: 5, owner: b665c6de0, flag: INIT/-/-/0x00

      (enqueue) CF-00000000-00000000 DID: 0001-000A-00000006

      lv: 00 79 4a fb 00 00 00 00 00 00 00 00 00 00 00 00  res_flag: 0x2

      res: 0x5eb40f608, req: SSX, lock_flag: 0x0

      own: 0xb665bf398, sess: 0xb665bf398, proc: 0xb66002018, prv: 0x5eb40f628

从此段可以看到,LGWR进程请求的CF锁资源是0x5eb40f608,并且是以SSX模式请求;通过CKPT进程的TRACE可以看到其已经own: 0x8a95b9df8 持有此资源res: 0x5eb40f608, mode: SSX, lock_flag: 0x0 SSX模式持有此锁;因此LGWR进程也处于等待此资源的状态。

4.2 主机负载情况分析

从ALERT日志中提到的故障发生时的TRACE文件:

/oracle/admin/dtvboss/bdump/dtvboss_pmon_343.trc

/oracle/admin/dtvboss/bdump/dtvboss_arc0_404.trc

中可以找到故障发生时的主机负载情况:

*** 2024-10-21 03:22:11.300

Dumping diagnostic information for ospid 367:

OS pid = 367

loadavg : 0.44 0.62 1.82

swap info: free_mem = 1435.35M rsv = 59461.24M

           alloc = 55941.14M avail = 55124.91 swap_free = 58645.01M

Dumping diagnostic information for oracle@hzboss-ora1:

OS pid = 24887

loadavg : 1.91 1.01 1.84

swap info: free_mem = 7661.14M rsv = 53520.42M

           alloc = 50905.74M avail = 62991.89 swap_free = 65606.57M

 F S      UID   PID  PPID   C PRI NI     ADDR     SZ    WCHAN    STIME TTY         TIME CMD

 0 S   oracle 24887     1   0  39 20        ? 4115427        ?   Sep 24 ?          29:24 oracledtvboss (LOCAL=NO)

pstack: cannot examine 24887: no such process

*** 2024-10-21 03:23:53.644

ksuitm_check: OS PID=8690 is still alive

*** 2024-10-21 03:23:53.644

Dumping diagnostic information for oracle@hzboss-ora1 (J002):

OS pid = 8690

loadavg : 1.93 1.02 1.84

swap info: free_mem = 7660.87M rsv = 53524.68M

           alloc = 50905.95M avail = 62987.56 swap_free = 65606.30M

 F S      UID   PID  PPID   C PRI NI     ADDR     SZ    WCHAN    STIME TTY         TIME CMD

 0 O   oracle  8690     1   2  39 20        ? 4121237          00:57:24 ?          29:43 ora_j002_dtvboss

pstack: cannot examine 8690: no such process

*** 2024-10-21 03:23:54.751

ksuitm_check: OS PID=21430 is still alive

*** 2024-10-21 03:23:54.751

Dumping diagnostic information for oracle@hzboss-ora1 (J003):

OS pid = 21430

loadavg : 1.94 1.02 1.84

swap info: free_mem = 7660.27M rsv = 53521.18M

           alloc = 50906.49M avail = 62991.00 swap_free = 65605.69M

 F S      UID   PID  PPID   C PRI NI     ADDR     SZ    WCHAN    STIME TTY         TIME CMD

 0 O   oracle 21430     1   2  39 20        ? 4119652          02:59:44 ?           0:27 ora_j003_dtvboss

pstack: cannot examine 21430: unanticipated system error

*** 2024-10-21 03:23:57.839

ksuitm_check: OS PID=367 is still alive

*** 2024-10-21 03:23:57.840

Dumping diagnostic information for oracle@hzboss-ora1 (CKPT):

OS pid = 367

loadavg : 1.98 1.04 1.84

swap info: free_mem = 7660.46M rsv = 53520.98M

           alloc = 50906.02M avail = 62990.88 swap_free = 65605.85M

 F S      UID   PID  PPID   C PRI NI     ADDR     SZ    WCHAN    STIME TTY         TIME CMD

 0 S   oracle   367     1   0  39 20        ? 4116160        ?   Mar 30 ?         498:14 ora_ckpt_dtvboss

pstack: cannot examine 367: no such process

*** 2024-10-21 03:23:58.926

从这些负载信息可以看到,整个系统当时的空闲内存和swap还留有余地,足够当时系统的使用,所以内存不足的原因可以排除。

4.3 数据库版本情况

Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production

With the Partitioning, OLAP, Data Mining and Real Application Testing options

ORACLE_HOME = /oracle/product/10.2.0/db_1

System name: SunOS

Node name: hzboss-ora1

Release: 5.10

Version: Generic_144488-12

Machine: sun4u

Instance name: dtvboss

Redo thread mounted by this instance: 1

Oracle process number: 2

COMP_ID    COMP_NAME                           VERSION         STATUS   MODIFIED                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       

---------- ----------------------------------- --------------- -------- -----------------------                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        

SDO        Spatial                             10.2.0.4.0      VALID    18/FEB/2011:21:05:30                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           

ORDIM      Oracle.interMedia                   10.2.0.4.0      VALID    18/FEB/2011:21:05:29                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           

AMD        OLAP.Catalog                        10.2.0.4.0      VALID    18/FEB/2011:21:05:29                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           

EM         Oracle.Enterprise.Manager           10.2.0.4.0      VALID    18/FEB/2011:21:04:33                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           

XDB        Oracle.XML.Database                 10.2.0.4.0      VALID    18/FEB/2011:21:05:29                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           

CONTEXT    Oracle.Text                         10.2.0.4.0      VALID    18/FEB/2011:21:05:29                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           

EXF        Oracle.Expression.Filter            10.2.0.4.0      VALID    18/FEB/2011:21:05:29                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           

RUL        Oracle.Rule.Manager                 10.2.0.4.0      VALID    18/FEB/2011:21:05:29                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           

OWM        Oracle.Workspace.Manager            10.2.0.4.3      VALID    18/FEB/2011:21:05:29                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           

ODM        Oracle.Data.Mining                  10.2.0.4.0      VALID    18/FEB/2011:21:05:29                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           

CATALOG    Oracle.Database.Catalog.Views       10.2.0.4.0      VALID    18/FEB/2011:21:05:29                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           

CATPROC    Oracle.Database.Packages.and.Types  10.2.0.4.0      VALID    18/FEB/2011:21:05:29                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           

JAVAVM     JServer.JAVA.Virtual.Machine        10.2.0.4.0      VALID    18/FEB/2011:21:05:29                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           

XML        Oracle.XDK                          10.2.0.4.0      VALID    18/FEB/2011:21:05:29                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           

CATJAVA    Oracle.Database.Java.Packages       10.2.0.4.0      VALID    18/FEB/2011:21:05:29                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           

APS        OLAP.Analytic.Workspace             10.2.0.4.0      VALID    18/FEB/2011:21:05:29                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           

XOQ        Oracle.OLAP.API                     10.2.0.4.0      VALID    18/FEB/2011:21:05:29

  • 总结与建议

针对此次数据库实例宕机问题,总结及建议如下:

5.1 问题分析汇总

5.1.1 宕机问题总结

综上可以得出结论,由于arc0进程要写日志需要等待控制文件CF锁,lgwr估计要写日志或者切换也在等待控制文件CF锁,而ckpt进程持有controlfile enqueue lock时间过长,arch0等待超过900秒,导致oracle数据库系统判断异常。

此时oracle数据库系统会让lgwr进程kill持有controlfile enqueue lock的进程,相当于lgwr进程杀了ckpt进程,而ckpt进程又为oracle关键进程,连锁反应,引起宕机。

5.1.2 宕机的问题根源判断

在宕机前系统发出告警aiowait timed out 1 times,初步判断系统在当时的io存在一些问题。而当时系统的内存和swap资源都充足没有出现严重问题,trace中的loadavg : 0.44 0.62 1.82则说明了当时的cpu负载低,这里排除了内存和cpu的问题. 

初步建议检测当时的redo日志的切换和产生量,当时的数据库服务器的io调用频繁度,因此可以参考下数据库的2点到3点间的awr信息(如果有数据库监控系统那么则可以分析到故障发生时间点的io信息,但是当前环境未部署)。

controlfile equeue lock的超时管理机制中,受_kill_controlfile_enqueue_blocker=false,_controlfile_enqueue_timeout,_kill_enqueue_blocker这3个隐含参数的影响,当前环境中我不建议增加_controlfile_enqueue_timeout,而是建议在通过控制_kill_controlfile_enqueue_blocker,_kill_enqueue_blocker参数来实现该机制的管理,以避免宕机的风险。


控制队列锁管理机制的bug以及避免:

控制文件队列锁有一个相关的匹配Bug 7692631 - 'DATABASE CRASHES WITH ORA-494 AFTER UPGRADE TO 10.2.0.4',这里可以通过以下2个参数规避。

_kill_controlfile_enqueue_blocker值分为true和false,默认为true,如果设置为false则不会去kill超时的进程,此时_kill_controlfile_enqueue_blocker=false参数失效。

_kill_enqueue_blocker值分为别0,1,2,3. 当设置为1的时候,保证后台进程不会被kill,而只会kill前台进程,这样可以避免宕机。

5.2 建议

5.2.1在数据库层上的调整建议

经过沟通当时正在跑批,所以当时的日志切换会比较频繁,因此我们建议对此时的redo log的大小匹配处理进行调整,适当增加日志组以及日志文件大小。控制日志切换保持在20-30分钟一次。

同时建议使用数据库监控工具,以便能在故障时候能及时获取到故障前的监控数据。

在参数调整角度上我们可以将_kill_enqueue_blocker设置为1或者将_kill_controlfile_enqueue_blocker设置为false。

5.2.2在系统层上的调整建议

同时检查系统io是否存在问题,

同时建议部署io层面的监控工具。

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/2378438.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

【前端优化】vue2 webpack4项目升级webpack5,大大提升运行速度

记录一下过程 手里有个老项目,vue2webpack4 项目很大,每次运行、运行都要将近10分钟 现在又要往里面写很多东西,再不优化,开发着会更难受,所以决定先将它升级至webpack5 最初失败的尝试 直接在项目里安装了webpack5 但…

Nginx应用场景详解与配置指南

1. 什么是Nginx? Nginx(发音为"engine-x")是一个高性能的HTTP和反向代理服务器,也是一个IMAP/POP3/SMTP代理服务器。它以高性能、稳定性、丰富的功能集、简单的配置和低资源消耗而闻名。 2. Nginx的主要应用场景 2.1 …

vue2 切换主题色以及单页面好使方法

今天要新增一个页面要根据不同公司切换不同页面主题色&#xff0c;一点一点来&#xff0c;怎么快速更改 el-pagination 分页组件主题色。 <el-pagination :page-size"pageSize" :pager-count"pageCount"layout"sizes, prev, pager, next, jumper,…

JavaScript【6】事件

1.概述&#xff1a; 在 JavaScript 中&#xff0c;事件&#xff08;Event&#xff09;是浏览器或 DOM&#xff08;文档对象模型&#xff09;与 JavaScript 代码之间交互的一种机制。它代表了在浏览器环境中发生的特定行为或者动作&#xff0c;比如用户点击鼠标、敲击键盘、页面…

STM32F10xx 参考手册

6. 什么是寄存器 本章参考资料&#xff1a;《STM32F10xx 参考手册》、《STM32F10xx数据手册》、 学习本章时&#xff0c;配合《STM32F10xx 参考手册》“存储器和总线架构”及“通用I/O(GPIO)”章节一起阅读&#xff0c;效果会更佳&#xff0c;特别是涉及到寄存器说明的部分。…

TIFS2024 | CRFA | 基于关键区域特征攻击提升对抗样本迁移性

Improving Transferability of Adversarial Samples via Critical Region-Oriented Feature-Level Attack 摘要-Abstract引言-Introduction相关工作-Related Work提出的方法-Proposed Method问题分析-Problem Analysis扰动注意力感知加权-Perturbation Attention-Aware Weighti…

Redis 发布订阅模式深度解析:原理、应用与实践

在现代分布式系统架构中&#xff0c;实时消息传递机制扮演着至关重要的角色。Redis 作为一款高性能的内存数据库&#xff0c;其内置的发布订阅(Pub/Sub)功能提供了一种轻量级、高效的消息通信方案。本文将全面剖析 Redis 发布订阅模式&#xff0c;从其基本概念、工作原理到实际…

飞帆控件 post or get it when it has get

我在这里分享两个链接&#xff1a; post_get_it 设计 - 飞帆 有人看出来这个控件是干什么用吗&#xff1f; 控件的配置&#xff1a;

SQL里where条件的顺序影响索引使用吗?

大家好&#xff0c;我是锋哥。今天分享关于【SQL里where条件的顺序影响索引使用吗&#xff1f;】面试题。希望对大家有帮助&#xff1b; SQL里where条件的顺序影响索引使用吗&#xff1f; 1000道 互联网大厂Java工程师 精选面试题-Java资源分享网 在 SQL 查询中&#xff0c;W…

SAP学习笔记 - 开发豆知识02 - com.sap.cds.services.cds.CdsService 废止,那么用什么代替呢?

我看很多人代码里面用的都是这个CdsService类&#xff0c;可以自从2.0版本往上这个类就没了啊。 它的代替是什么呢&#xff1f;用的CqnService 那么怎么查的呢&#xff1f; 我也是下载好几个包&#xff0c;解压看&#xff0c;然后发现这里还可以直接看&#xff0c;挺方便的。…

OpenResty 深度解析:构建高性能 Web 服务的终极方案

引言 openresty是什么&#xff1f;在我个人对它的理解来看相当于嵌入了lua的nginx; 我们在nginx中嵌入lua是为了不需要再重新编译,我们只需要重新修改lua脚本,随后重启即可; 一.lua指令序列 我们分别从初始化阶段&#xff0c;重写/访问阶段&#xff0c;内容阶段&#xff0c;日志…

什么是路由器环回接口?

路由器环回接口&#xff08;LoopbackInterface&#xff09;是网络设备中的一种逻辑虚拟接口&#xff0c;不依赖物理硬件&#xff0c;但在网络配置和管理中具有重要作用。以下是其核心要点&#xff1a; 一、基本特性 1.虚拟性与稳定性 环回接口是纯软件实现的逻辑接口&#x…

【MySQL进阶】如何在ubuntu下安装MySQL数据库

前言 &#x1f31f;&#x1f31f;本期讲解关于如何在ubuntu环境下安装mysql的详细介绍~~~ &#x1f308;感兴趣的小伙伴看一看小编主页&#xff1a;GGBondlctrl-CSDN博客 &#x1f525; 你的点赞就是小编不断更新的最大动力 &#x1f3…

【数据结构】_二叉树

1.二叉树链式结构的实现 1.1 前置说明 在学习二叉树的基本操作前&#xff0c;需先要创建一棵二叉树&#xff0c;然后才能学习其相关的基本操作。由于现在大家对二叉树结构掌握还不够深入&#xff0c;为了降低大家学习成本&#xff0c;此处手动快速创建一棵简单的二叉树&#x…

给图表组件上点“颜色” —— 我与 CodeBuddy 的合作记录

我正在参加CodeBuddy「首席试玩官」内容创作大赛&#xff0c;本文所使用的 CodeBuddy 免费下载链接&#xff1a;腾讯云代码助手 CodeBuddy - AI 时代的智能编程伙伴 前段时间&#xff0c;我在开发一个 Vue3 项目的时候&#xff0c;碰到了一个小小的挑战&#xff1a;我想做一个可…

使用 YOLO 结合 PiscTrace 实现股票走势图像识别

在智能投研和金融分析中&#xff0c;自动识别图表中的模式&#xff08;如 K 线走势、支撑/阻力位、形态结构&#xff09;成为一种新兴手段。传统的技术分析依赖大量人工判断&#xff0c;而计算机视觉技术的发展&#xff0c;特别是 YOLO 模型在图像识别领域的高效表现&#xff0…

OpenCV中的光流估计方法详解

文章目录 一、引言二、核心算法原理1. 光流法基本概念2. 算法实现步骤 三、代码实现详解1. 初始化设置2. 特征点检测3. 光流计算与轨迹绘制 四、实际应用效果五、优化方向六、结语 一、引言 在计算机视觉领域&#xff0c;运动目标跟踪是一个重要的研究方向&#xff0c;广泛应用…

NBA足球赛事直播源码体育直播M33模板赛事源码

源码名称&#xff1a;体育直播赛事扁平自适应M33直播模板源码 开发环境&#xff1a;帝国cms7.5 空间支持&#xff1a;phpmysql 带软件采集&#xff0c;可以挂着自动采集发布&#xff0c;无需人工操作&#xff01; 演示地址&#xff1a;NBA足球赛事直播源码体育直播M33模板赛事…

电子电器架构 --- 整车造车阶段四个重要节点

我是穿拖鞋的汉子,魔都中坚持长期主义的汽车电子工程师。 老规矩,分享一段喜欢的文字,避免自己成为高知识低文化的工程师: 钝感力的“钝”,不是木讷、迟钝,而是直面困境的韧劲和耐力,是面对外界噪音的通透淡然。 生活中有两种人,一种人格外在意别人的眼光;另一种人无论…

黑马点评-用户登录

文章目录 用户登录发送短信验证码注册/登录校验登录 用户登录 发送短信验证码 public Result sendCode(String phone, HttpSession session) {// 1.校验手机号if (RegexUtils.isPhoneInvalid(phone)) {// 2.如果不符合&#xff0c;返回错误信息return Result.fail("手机…