了解Database Replay Capture内部原理
Database Replay是11g中很酷的特性,对于workload capture的内部工作原理大家理解的不多,这里就介绍一下。
对于Workload Capture需要考虑的因素:
负载捕获文件消耗定量的磁盘空间,这些捕获文件是2进程文件,无法直接阅读,有测试表明在大并发量的OLTP环境中可以达到capture 10分钟占用1G磁盘空间
数据库重启:
可能是保证捕获所有事务的可靠重放的唯一路径
使用startup restrict启动实例,避免不完整的事务捕获
启动capture会取消restrict模式
基于负载类型重启不是必要的
为重放目的恢复数据库的多种方法:
基于scn或时间的物理恢复
逻辑恢复应用数据
闪回或者快照数据
Capture可以指定过滤器作为捕获workload子集的方法
需要的权限包括SYSDBA、SYSOPER和合适的OS权限
性能消耗:
在TPCC测试中capture的性能损耗为4.5%
对于每个session会多消耗64KB的内存
必要的Workload Capture耗费文件系统磁盘空间
这里需要注意的有2点, 对于RAC集群存在workload capture file的目录必须位于共享文件系统上,否则start_capture时会报错。
对于每个session会多消耗64KB的内存,这是由于本质上写出负载信息到workload capture file的同样是Server Process服务进程本身,但是这种写出并非在parse解析或execution执行阶段,Server Process将其LOGON、LOGOFF、SQL执行等信息记录存放在PGA中,主要是WCR Capture PG、WCR Capture PGA中,当PGA中的工作负载历史记录达到一定数目时,Server Process本身负责写出这些数据到WCR文件中,在写出这些WCR文件时Server Process进入’WCR: capture file IO write’等待事件。
与WCR相关的等待事件另有:
SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE 11.2.0.3.0 Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
SQL> select name from v$event_name where name like '%WCR%';
NAME
----------------------------------------------------------------
WCR: replay client notify
WCR: replay clock
WCR: replay lock order
WCR: replay paused
WCR: RAC message context busy
WCR: capture file IO write
WCR: Sync context busy
latch: WCR: sync
latch: WCR: processes HT
11g中还多出了部分为WCR而设的LATCH
1* select name,gets from v$latch where name like '%WCR%'
SQL> /
NAME GETS
------------------------------ ----------
WCR: kecu cas mem 3
WCR: kecr File Count 37
WCR: MMON Create dir 1
WCR: ticker cache 0
WCR: sync 495
WCR: processes HT 0
WCR: MTS VC queue 0
7 rows selected.
我们通过如下演示具体说明Database Replay Capture的内部原理
1. 首先打开capture dbms_workload_capture.start_capture
CREATE OR REPLACE DIRECTORY dbcapture AS '/home/oracle/dbcapture';
execute dbms_workload_capture.start_capture('CAPTURE','DBCAPTURE',default_action=>'INCLUDE');
SQL> select id,name,status,start_time,end_time,connects,user_calls,dir_path from
dba_workload_captures where id = (select max(id) from dba_workload_captures) ;
ID
----------
NAME
--------------------------------------------------------------------------------
STATUS START_TIM END_TIME CONNECTS
---------------------------------------- --------- --------- ----------
USER_CALLS
----------
DIR_PATH
--------------------------------------------------------------------------------
1
CAPTURE
IN PROGRESS 08-DEC-12 11
ID
----------
NAME
--------------------------------------------------------------------------------
STATUS START_TIM END_TIME CONNECTS
---------------------------------------- --------- --------- ----------
USER_CALLS
----------
DIR_PATH
--------------------------------------------------------------------------------
167
/home/oracle/dbcapture
2. 窥探 capture file目录
[oracle@mlab2 dbcapture]$ ls -lR
.:
total 8
drwxr-xr-x 2 oracle oinstall 4096 Dec 8 07:24 cap
drwxr-xr-x 3 oracle oinstall 4096 Dec 8 07:24 capfiles
-rw-r--r-- 1 oracle oinstall 0 Dec 8 07:24 wcr_cap_00001.start
./cap:
total 4
-rw-r--r-- 1 oracle oinstall 91 Dec 8 07:24 wcr_scapture.wmd
./capfiles:
total 4
drwxr-xr-x 12 oracle oinstall 4096 Dec 8 07:24 inst1
./capfiles/inst1:
total 40
drwxr-xr-x 2 oracle oinstall 4096 Dec 8 08:31 aa
drwxr-xr-x 2 oracle oinstall 4096 Dec 8 07:24 ab
drwxr-xr-x 2 oracle oinstall 4096 Dec 8 07:24 ac
drwxr-xr-x 2 oracle oinstall 4096 Dec 8 07:24 ad
drwxr-xr-x 2 oracle oinstall 4096 Dec 8 07:24 ae
drwxr-xr-x 2 oracle oinstall 4096 Dec 8 07:24 af
drwxr-xr-x 2 oracle oinstall 4096 Dec 8 07:24 ag
drwxr-xr-x 2 oracle oinstall 4096 Dec 8 07:24 ah
drwxr-xr-x 2 oracle oinstall 4096 Dec 8 07:24 ai
drwxr-xr-x 2 oracle oinstall 4096 Dec 8 07:24 aj
./capfiles/inst1/aa:
total 316
-rw-r--r-- 1 oracle oinstall 1762 Dec 8 07:25 wcr_c6cdah0000001.rec
-rw-r--r-- 1 oracle oinstall 16478 Dec 8 07:28 wcr_c6cf1h0000002.rec
-rw-r--r-- 1 oracle oinstall 1772 Dec 8 07:29 wcr_c6cjdh0000004.rec
-rw-r--r-- 1 oracle oinstall 1535 Dec 8 07:29 wcr_c6cnah0000005.rec
-rw-r--r-- 1 oracle oinstall 1821 Dec 8 07:41 wcr_c6cpfh0000007.rec
-rw-r--r-- 1 oracle oinstall 1815 Dec 8 07:33 wcr_c6cq6h000000a.rec
-rw-r--r-- 1 oracle oinstall 1535 Dec 8 07:34 wcr_c6cxmh000000h.rec
-rw-r--r-- 1 oracle oinstall 1427 Dec 8 07:41 wcr_c6cxvh000000j.rec
-rw-r--r-- 1 oracle oinstall 1425 Dec 8 07:41 wcr_c6czph000000k.rec
-rw-r--r-- 1 oracle oinstall 2398 Dec 8 07:49 wcr_c6dqfh000000q.rec
-rw-r--r-- 1 oracle oinstall 259321 Dec 8 08:35 wcr_c6du7h000000r.rec
-rw-r--r-- 1 oracle oinstall 0 Dec 8 07:55 wcr_c6f6yh000000t.rec
-rw-r--r-- 1 oracle oinstall 0 Dec 8 08:28 wcr_c6h3qh0000013.rec
./capfiles/inst1/ab:
total 0
./capfiles/inst1/ac:
total 0
./capfiles/inst1/ad:
total 0
./capfiles/inst1/ae:
total 0
./capfiles/inst1/af:
total 0
./capfiles/inst1/ag:
total 0
./capfiles/inst1/ah:
total 0
./capfiles/inst1/ai:
total 0
./capfiles/inst1/aj:
total 0
[oracle@mlab2 dbcapture]$ cd ./capfiles/inst1/aa
[oracle@mlab2 aa]$ ls -l
total 316
-rw-r--r-- 1 oracle oinstall 1762 Dec 8 07:25 wcr_c6cdah0000001.rec
-rw-r--r-- 1 oracle oinstall 16478 Dec 8 07:28 wcr_c6cf1h0000002.rec
-rw-r--r-- 1 oracle oinstall 1772 Dec 8 07:29 wcr_c6cjdh0000004.rec
-rw-r--r-- 1 oracle oinstall 1535 Dec 8 07:29 wcr_c6cnah0000005.rec
-rw-r--r-- 1 oracle oinstall 1821 Dec 8 07:41 wcr_c6cpfh0000007.rec
-rw-r--r-- 1 oracle oinstall 1815 Dec 8 07:33 wcr_c6cq6h000000a.rec
-rw-r--r-- 1 oracle oinstall 1535 Dec 8 07:34 wcr_c6cxmh000000h.rec
-rw-r--r-- 1 oracle oinstall 1427 Dec 8 07:41 wcr_c6cxvh000000j.rec
-rw-r--r-- 1 oracle oinstall 1425 Dec 8 07:41 wcr_c6czph000000k.rec
-rw-r--r-- 1 oracle oinstall 2398 Dec 8 07:49 wcr_c6dqfh000000q.rec
-rw-r--r-- 1 oracle oinstall 259321 Dec 8 08:35 wcr_c6du7h000000r.rec
-rw-r--r-- 1 oracle oinstall 0 Dec 8 07:55 wcr_c6f6yh000000t.rec
-rw-r--r-- 1 oracle oinstall 0 Dec 8 08:28 wcr_c6h3qh0000013.rec
[oracle@mlab2 aa]$ ls -l |wc -l
14
以上负载目录共14个文件
3. 我们LOGON一个新的Server Process
[oracle@mlab2 ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Sat Dec 8 08:37:40 2012
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Data Mining
and Real Application Testing options
会多出一个wcr文件
[oracle@mlab2 aa]$ ls -ltr
total 316
-rw-r--r-- 1 oracle oinstall 1762 Dec 8 07:25 wcr_c6cdah0000001.rec
-rw-r--r-- 1 oracle oinstall 16478 Dec 8 07:28 wcr_c6cf1h0000002.rec
-rw-r--r-- 1 oracle oinstall 1772 Dec 8 07:29 wcr_c6cjdh0000004.rec
-rw-r--r-- 1 oracle oinstall 1535 Dec 8 07:29 wcr_c6cnah0000005.rec
-rw-r--r-- 1 oracle oinstall 1815 Dec 8 07:33 wcr_c6cq6h000000a.rec
-rw-r--r-- 1 oracle oinstall 1535 Dec 8 07:34 wcr_c6cxmh000000h.rec
-rw-r--r-- 1 oracle oinstall 1425 Dec 8 07:41 wcr_c6czph000000k.rec
-rw-r--r-- 1 oracle oinstall 1427 Dec 8 07:41 wcr_c6cxvh000000j.rec
-rw-r--r-- 1 oracle oinstall 1821 Dec 8 07:41 wcr_c6cpfh0000007.rec
-rw-r--r-- 1 oracle oinstall 2398 Dec 8 07:49 wcr_c6dqfh000000q.rec
-rw-r--r-- 1 oracle oinstall 0 Dec 8 07:55 wcr_c6f6yh000000t.rec
-rw-r--r-- 1 oracle oinstall 0 Dec 8 08:28 wcr_c6h3qh0000013.rec
-rw-r--r-- 1 oracle oinstall 259321 Dec 8 08:35 wcr_c6du7h000000r.rec
-rw-r--r-- 1 oracle oinstall 0 Dec 8 08:37 wcr_c6hp4h0000018.rec
但是新出现的wcr_c6hp4h0000018.rec 是空的
SQL> select spid from v$process where addr = ( select paddr from v$session where sid=(select distinct sid from v$mystat));
SPID
------------------------
14293
该新的服务进程的操作系统进程号为14293, 可以看到该进程的打开文件描述符,其中就包含了wcr_c6hp4h0000018.rec
[oracle@mlab2 ~]$ ls -l /proc/14293/fd
total 0
lr-x------ 1 oracle oinstall 64 Dec 8 08:39 0 -> /dev/null
l-wx------ 1 oracle oinstall 64 Dec 8 08:39 1 -> /dev/null
lrwx------ 1 oracle oinstall 64 Dec 8 08:39 10 -> /u01/app/oracle/product/11201/db_1/rdbms/audit/CRMV_ora_14293_1.aud
l-wx------ 1 oracle oinstall 64 Dec 8 08:39 11 -> /u01/app/oracle/diag/rdbms/crmv/CRMV/trace/CRMV_ora_14293.trc
l-wx------ 1 oracle oinstall 64 Dec 8 08:39 12 -> pipe:[34585895]
l-wx------ 1 oracle oinstall 64 Dec 8 08:39 13 -> /u01/app/oracle/diag/rdbms/crmv/CRMV/trace/CRMV_ora_14293.trm
l-wx------ 1 oracle oinstall 64 Dec 8 08:39 2 -> /dev/null
lr-x------ 1 oracle oinstall 64 Dec 8 08:39 3 -> /dev/null
lr-x------ 1 oracle oinstall 64 Dec 8 08:39 4 -> /dev/null
lr-x------ 1 oracle oinstall 64 Dec 8 08:39 5 -> /u01/app/oracle/product/11201/db_1/rdbms/mesg/oraus.msb
lr-x------ 1 oracle oinstall 64 Dec 8 08:39 6 -> /proc/14293/fd
lr-x------ 1 oracle oinstall 64 Dec 8 08:39 7 -> /dev/zero
lrwx------ 1 oracle oinstall 64 Dec 8 08:39 8 -> /home/oracle/dbcapture/capfiles/inst1/aa/wcr_c6hp4h0000018.rec
lr-x------ 1 oracle oinstall 64 Dec 8 08:39 9 -> pipe:[34585894]
也可以通过lsof查到
[root@mlab2 ~]# lsof|grep wcr_c6hp4h0000018.rec
oracle 14293 oracle 8u REG 8,1 0 17629644 /home/oracle/dbcapture/capfiles/inst1/aa/wcr_c6hp4h0000018.rec
可以得出一个结论,一个Server Process对应一个WCR REC文件,当Server Process LOGON时生成该文件
3.执行少量SQL语句:
SQL> select 1 from dual;
1
----------
1
SQL> /
1
----------
1
[oracle@mlab2 aa]$ strings wcr_c6hp4h0000018.rec
==》执行少量SQL无任何时, 不写出任何数据
执行某个超长SQL语句后,可以在下面看到其文件内容包含该WCR文件的版本号,LOGON的登录信息和所执行过的SQL语句历史,但是不包含执行计划
[oracle@mlab2 aa]$ strings wcr_c6hp4h0000018.rec
11.2.0.3.0
*File header info. (Shadow process='14293')
D0576B5D710A34F4E043B201A8C0ECFE
SYS;
NLS_LANGUAGE?
AMERICAN>
NLS_TERRITORY?
AMERICA>
NLS_CURRENCY?
NLS_ISO_CURRENCY?
AMERICA>
NLS_NUMERIC_CHARACTERS?
NLS_CALENDAR? GREGORIAN>
NLS_DATE_FORMAT? DD-MON-RR>
NLS_DATE_LANGUAGE?
AMERICAN>
NLS_CHARACTERSET?
AL32UTF8>
NLS_SORT?
BINARY>
NLS_TIME_FORMAT?
HH.MI.SSXFF AM>
NLS_TIMESTAMP_FORMAT?
DD-MON-RR HH.MI.SSXFF AM>
NLS_TIME_TZ_FORMAT?
HH.MI.SSXFF AM TZR>
NLS_TIMESTAMP_TZ_FORMAT?
DD-MON-RR HH.MI.SSXFF AM TZR>
NLS_DUAL_CURRENCY?
NLS_SPECIAL_CHARS?
NLS_NCHAR_CHARACTERSET?
UTF8>
NLS_COMP?
BINARY>
NLS_LENGTH_SEMANTICS?
BYTE>
NLS_NCHAR_CONV_EXCP?
FALSE
(DESCRIPTION=(ADDRESS=(PROTOCOL=beq)(PROGRAM=/u01/app/oracle/product/11201/db_1/bin/oracle)(ARGV0=oracleCRMV)(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))')(DETACH=NO))(CONNECT_DATA=(CID=(PROGRAM=sqlplus)(HOST=mlab2.oracle.com)(USER=oracle))))
,T$sqlplus@mlab2.oracle.com (TNS V1-V3)U
tselect spid from v$process where addr = ( select paddr from v$session where sid=(select distinct sid from v$mystat))
` _
select 1 from dual
select 1 from dual
执行某些错误语句时甚至会记录其错误信息
[oracle@mlab2 aa]$ strings wcr_c6hp4h0000018.rec
9`9_^B
create table vva(t1 int)
`:_i
:`:_iB
`;_^
;`;_^B
create table vva(t1 int)
`_i
>`>_iB
FusC
`?_^
?`?_^B
FvWC
_begin
for i in 1..50000 loop
execute immediate 'select 1 from dual where 2='||i;
end loop;
end;
若SERVER PROCESS LOGOFF 则会有如下信息
C`E_ B
k^2C
以上说明Server Process并不胡在parse或execution阶段写WCR文件,而是将这些数据存放在PGA中,达到一定大小阀值后才写出,所以其性能影响小,但是如果在WCR数据写出前就意外终止则,这些数据将丢失。
4. 窥视进程信息
SQL> oradebug setmypid
Statement processed.
SQL> oradebug dump processstate 10;
Statement processed.
SQL> oradebug tracefile_name
/u01/app/oracle/diag/rdbms/crmv/CRMV/trace/CRMV_ora_14293.trc
从processstate 文件中可以发现历史等待信息包括 WCR: capture file IO write,这是Server process在写WCR 文件
3: waited for 'SQL*Net message to client'
driver id=0x62657100, #bytes=0x1, =0x0
wait_id=139 seq_num=140 snap_id=1
wait times: snap=0.000007 sec, exc=0.000007 sec, total=0.000007 sec
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 0.934091 sec of elapsed time
4: waited for 'latch: shared pool'
address=0x60106b20, number=0x133, tries=0x0
wait_id=138 seq_num=139 snap_id=1
wait times: snap=0.000066 sec, exc=0.000066 sec, total=0.000066 sec
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 1.180690 sec of elapsed time
5: waited for 'WCR: capture file IO write'
=0x0, =0x0, =0x0
wait_id=137 seq_num=138 snap_id=1
wait times: snap=0.000189 sec, exc=0.000189 sec, total=0.000189 sec
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 3.122783 sec of elapsed time
6: waited for 'WCR: capture file IO write'
=0x0, =0x0, =0x0
wait_id=136 seq_num=137 snap_id=1
wait times: snap=0.000191 sec, exc=0.000191 sec, total=0.000191 sec
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 3.053132 sec of elapsed time
7: waited for 'WCR: capture file IO write'
5.窥视PGA内存信息
SQL> oradebug dump heapdump 536870917;
Statement processed.
grep WCR /u01/app/oracle/diag/rdbms/crmv/CRMV/trace/CRMV_ora_14293.trc
Chunk 7fb1b606bfc0 sz= 65600 freeable "WCR Capture PG " ds=0x7fb1b6115f90
Chunk 7fb1b6111e18 sz= 4224 freeable "WCR Capture PG " ds=0x7fb1b6115f90
Chunk 7fb1b6112e98 sz= 4184 freeable "WCR Capture PG " ds=0x7fb1b6115f90
Chunk 7fb1b6113ef0 sz= 4224 freeable "WCR Capture PG " ds=0x7fb1b6115f90
Chunk 7fb1b6114f70 sz= 4104 recreate "WCR Capture PG " latch=(nil)
Chunk 7fb1b6115f78 sz= 160 freeable "WCR Capture PGA"
Chunk 7fb1b6116018 sz= 3248 freeable "WCR Capture PGA"
Subheap ds=0x7fb1b6115f90 heap name= WCR Capture PG size= 82336
HEAP DUMP heap name="WCR Capture PG" desc=0x7fb1b6115f90
FIVE LARGEST SUB HEAPS for heap name="WCR Capture PG" desc=0x7fb1b6115f9
PGA中存在WCR Capture PG 和WCR Capture PGA的freeable or recreate内存chunk,这些内存可以从Server Process返回给OS
Chunk 7fb1b606bfc0 sz= 65600 freeable "WCR Capture PG " ds=0x7fb1b6115f90
sz= 65600=》 64k 这可能就是文档描述的64k内存,但是实际可以看到使用总量超过了64k
WCR是否可能引起内存泄露呢?:)!
6.隐藏参数
控制WCR CAPTURE的参数至少有以下2个
SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
2 FROM SYS.x$ksppi x, SYS.x$ksppcv y
3 WHERE x.inst_id = USERENV ('Instance')
4 AND y.inst_id = USERENV ('Instance')
5 AND x.indx = y.indx
6 AND x.ksppinm in ('_capture_buffer_size','_wcr_control');
NAME VALUE DESCRIB
-------------------- -------------------- ------------------------------------------------------------
_wcr_control 0 Oracle internal test WCR parameter used ONLY for testing!
_capture_buffer_size 65536 To set the size of the PGA I/O recording buffers
其中_capture_buffer_size 控制PGA中WCR BUFFER的SIZE,默认为64k
_wcr_control 控制WCR的内部行为,具体尚不清晰
7.结束capture
SQL> execute dbms_workload_capture.finish_capture;
PL/SQL procedure successfully completed.
DBMS_WORKLOAD_CAPTURE.START_CAPTURE(): Starting database capture at 12/08/2012 07:24:54
DBMS_WORKLOAD_CAPTURE.FINISH_CAPTURE(): Stopped database capture (not all sessions could flush their capture buffers) at 12/08/2012 10:29:49
综上所述,我们得出结论:
1. 负责写WCR WORKLOAD CAPTURE文件的不是什么后台进程,而是Server Process服务进程(前台进程)
2. 每一个server process对应一个WCR文件
3. Server Proess会写出LOGON、LOGOFF、SQL执行以及错误信息到WCR文件中
4. Server Process的写不是立即模式Immediate mode,而是将数据存放在PGA的(WCR Capture) subheap中,当数据达到一定大小或一定时间(timeout才写出一次)
5. 再次强调, Server Process的写不是立即模式Immediate mode,即capture不发生在parse或者execution阶段(很多人认为capture发生在parse时这种说法可以证明为不正确,parse并不受到capture的影响),而仅仅是将LOGON、SQL历史(不包括执行计划)放在PGA的WCR Capture内存子堆中,条件触发才写出,所以其性能损耗小,按照tpcc的测试结论为4.5%
6. 隐藏参数_capture_buffer_size 控制PGA中WCR BUFFER的SIZE,默认为64k
7. WCR Capture文件虽然是binrary 2进制文件,但并未加密,所以第三方软件理论上也可以读取该WCR capture file
8. WCR: capture file IO write等待事件是Server Process在写WCR文件
9. 执行dbms_workload_capture.finish_capture;会让现有Server Process FLUSH WCR buffer,但是就ALERT.LOG的自解释信息看”Stopped database capture (not all sessions could flush their capture buffers)”,这种flush不是必然能回收内存的
扫一扫订阅我的微信号:IT技术博客大学习
- 作者:Maclean Liu 来源: Ask Maclean Oracle Blog
- 标签: Database Replay
- 发布时间:2012-12-09 20:06:42
- [48] Oracle MTS模式下 进程地址与会话信
- [47] 如何拿下简短的域名
- [47] IOS安全–浅谈关于IOS加固的几种方法
- [47] 图书馆的世界纪录
- [46] android 开发入门
- [45] 【社会化设计】自我(self)部分――欢迎区
- [42] 读书笔记-壹百度:百度十年千倍的29条法则
- [41] Go Reflect 性能
- [39] 视觉调整-设计师 vs. 逻辑
- [39] 界面设计速成