AWR Summary Reporting #3 – Waits

加入dbDao.com Oracle技术学习QQ群:171092051

 

如果自己搞不定可以找诗檀软件专业ORACLE数据库修复团队成员帮您恢复!

诗檀软件专业数据库修复团队

服务热线 : 400-690-3643 备用电话: 13764045638 邮箱:service@parnassusdata.com

 

So far I've shown the skeleton of a query that lists out some key activity data values from all the data captured by AWR in its regular snapshots. So far we've only extracted a few time based data values:
  • Duration - real clock time that occurred between the AWR 2 snapshots being reported on
  • Database Time - time spent by the database being active
  • SQL Execution Time - time spent by the database executing SQL statements
AWR captures a lot of other data in its snapshots and we can add more of these to our query to drill down further into what happened between each pair of snapshots. These other data sets include:
  • Wait event data - individual wait events and wait time, and wait classes
  • System statistics - as normally seen in V$SYSSTAT as cumulative counters
  • SQL execution statistics for individual SQL statements
As before, I'll be adding these as separate sub-queries in the WITH clause so that they produce one row of data values per snapshot, and then joining to these named sub-queries in the main query and outputting the data values we want. This keeps the main query simpler, and pushes issues about grouping and summarising into the sub-query. It also lets us refer to the same sub-query multiple times if needed - which is the case for some of the data captured by AWR e.g. System Statistics are captured as sets of Name / Value pairs per snapshot.

For now lets add on wait event data telling us how long the system waited on different types of events. As we already have two different times for database activity (DB Time & SQL Execution Time), we would most like to know the total wait time within that database activity. This would let us calculate the actual time doing real SQL execution work, and see wait time as a percentage of total database time. The basic query for this extracts the wait times in microseconds from DBA_HIST_SYSTEM_EVENT, and would be:
, syswaits as 
(select sysevent.snap_id
, sysevent.dbid
, sum (sysevent.time_waited_micro - psysevent.time_waited_micro) all_wait_time
from dba_hist_system_event sysevent, dba_hist_system_event psysevent
where sysevent.snap_id = psysevent.snap_id + 1
and sysevent.dbid = psysevent.dbid
and sysevent.instance_number = psysevent.instance_number
and sysevent.event_id = psysevent.event_id
-- Ignore Idle wait events
and sysevent.wait_class != 'Idle'
group by sysevent.snap_id
, sysevent.dbid
) -- syswaits
Note that I exclude "Idle" waits, which are waits unrelated to the execution of a SQL statement.

The next logical thing would be to break down the total wait time somehow into various major components, to show which type of waits occurred more often than the others. There are 3 major ways to break down wait events:
  • Foreground versus background waits i.e. waits experienced directly by sessions for connected users, and waits experienced by permanent background server sessions
  • By Wait Class - these group individual events into common classes such as Concurrency, User I/O, Commit, Network, Cluster.
  • By individual wait event - unfortunately there are around 100 of these, and outputting these as individual columns in the query would be too cumbersome.
So the obvious thing to do is to split out foreground or background wait times (the other one can be calculated using the total wait time), and by wait class as there are only about 10 or so of these. Background waits are actually obtained from a different AWR snapshot table - DBA_HIST_BG_EVENT_SUMMARY. So the query for this would be:
, sysbgwait as 
-- One row per System Wait Event with change in value between snapshots
(select sysevent.snap_id
, sysevent.dbid
, sum (sysevent.time_waited_micro - psysevent.time_waited_micro) time_waited_micro
from dba_hist_bg_event_summary sysevent, dba_hist_bg_event_summary psysevent
where sysevent.snap_id = psysevent.snap_id + 1
and sysevent.dbid = psysevent.dbid
and sysevent.instance_number = psysevent.instance_number
and sysevent.event_id = psysevent.event_id
and sysevent.wait_class != 'Idle'
group by sysevent.snap_id, sysevent.dbid
) -- sysbgwait
This gives the total background wait time in each snapshot for each database instance. The main query can then join to this as it does to the other sub-queries.

For the wait classes we now have two main choices for how to write such sub-queries:
  • Have separate sub-queries for each class
  • Collapse the sub-queries into just one sub-query using the "decode" function to sum values over different criteria
The trouble with the first approach is that each sub-query will probably be executed separately, meaning multiple reads of the same snapshot wait event data to produce multiple data sets, and then each sub-query data set will be joined together by Snapshot ID and Database ID. The second approach should be more efficient when executed - there is only one read of the AWR wait event data, and it is processed as it is read into one result set without any further joins between different wait event data subsets (the only joins are to the other AWR data sets in the main query).

The classes can be broken out using the Oracle "DECODE" function as an "IF" function to test the class name.
, syswaits as 
(select sysevent.snap_id
, sysevent.dbid
, sum (sysevent.time_waited_micro - psysevent.time_waited_micro) all_wait_time
, sum (decode (sysevent.wait_class, 'Commit', (sysevent.time_waited_micro - psysevent.time_waited_micro), 0)) commit_time
, sum (decode (sysevent.wait_class, 'Cluster', (sysevent.time_waited_micro - psysevent.time_waited_micro), 0)) cluster_time
, sum (decode (sysevent.wait_class, 'Concurrency', (sysevent.time_waited_micro - psysevent.time_waited_micro), 0)) concurrency_time
, sum (decode (sysevent.wait_class, 'Network', (sysevent.time_waited_micro - psysevent.time_waited_micro), 0)) network_time
, sum (decode (sysevent.wait_class, 'System I/O', (sysevent.time_waited_micro - psysevent.time_waited_micro), 0)) system_io_time
, sum (decode (sysevent.wait_class, 'User I/O', (sysevent.time_waited_micro - psysevent.time_waited_micro), 0)) user_io_time
from dba_hist_system_event sysevent, dba_hist_system_event psysevent
where sysevent.snap_id = psysevent.snap_id + 1
and sysevent.dbid = psysevent.dbid
and sysevent.instance_number = psysevent.instance_number
and sysevent.event_id = psysevent.event_id
-- Ignore Idle wait events
and sysevent.wait_class != 'Idle'
group by sysevent.snap_id
, sysevent.dbid
) syswaits
The principle is that if the wait class name matches the target class name then the corresponding wait time is used in the sum, otherwise zero is used meaning that particular wait time is not included in the sum for that wait class.

Adding this to our existing query gives the following - note the divides by one million to convert microseconds to seconds:
with
snaps as
...
, systimes as
...
, sysbgwait as
...
, syswaits as
...
select to_char (snaps.end_snap_time, 'DD/MM/YY HH24:MI') snap_time
, extract (second from snaps.snap_interval)
+ (extract (minute from snaps.snap_interval)
+ (extract (hour from snaps.snap_interval)
+ (extract (day from snaps.snap_interval) * 24)
) * 60
) * 60 snap_duration
, dbtime.value / 1000000 db_time
, sqlexectime.value / 1000000 sql_exec_time
, syswaits.all_wait_time / 1000000 all_wait_time
, sysbgwait.time_waited_micro / 1000000 bg_wait_time
, syswaits.commit_time / 1000000 commit_time
, syswaits.cluster_time / 1000000 cluster_time
, syswaits.concurrency_time / 1000000 concurrency_time
, syswaits.network_time / 1000000 network_time
, syswaits.system_io_time / 1000000 system_io_time
, syswaits.user_io_time / 1000000 user_io_time
from snaps
join (select * from systimes where stat_name = 'DB time') dbtime
on snaps.snap_id = dbtime.snap_id and snaps.dbid = dbtime.dbid
join (select * from systimes where stat_name = 'sql execute elapsed time') sqlexectime
on snaps.snap_id = sqlexectime.snap_id and snaps.dbid = sqlexectime.dbid
join syswaits
on snaps.snap_id = syswaits.snap_id and snaps.dbid = syswaits.dbid
join sysbgwait
on snaps.snap_id = sysbgwait.snap_id and snaps.dbid = sysbgwait.dbid
order by snaps.end_snap_time
/
And again, it would be good to define some nice column formats when running this in SQL*Plus:
col snap_time         format a15         heading 'SNAP|TIME'
col snap_duration format 999,999 heading 'SNAP|DURATION'
col db_time format 999,990 heading 'DB|TIME'
col sql_exec_time format 999,990 heading 'SQL EXEC|TIME'

col all_wait_time format 999,990 heading 'ALL WAIT|TIME'
col bg_wait_time format 999,990 heading 'BG WAIT|TIME'
col commit_time format 999,990 heading 'COMMIT|WAIT TM'
col cluster_time format 999,990 heading 'CLUSTER|WAIT TM'
col concurrency_time format 999,990 heading 'CONCURNCY|WAIT TM'
col network_time format 999,990 heading 'NETWORK|WAIT TM'
col system_io_time format 999,990 heading 'SYS IO|WAIT TM'
col user_io_time format 999,990 heading 'USER IO|WAIT TM'
When run the output is something like this:
SNAP                SNAP       DB SQL EXEC ALL WAIT  BG WAIT   COMMIT  CLUSTER CONCURNCY  NETWORK   SYS IO  USER IO
TIME DURATION TIME TIME TIME TIME WAIT TM WAIT TM WAIT TM WAIT TM WAIT TM WAIT TM
--------------- -------- -------- -------- -------- -------- -------- -------- --------- -------- -------- --------
11/09/13 06:00 3,615 1,594 1,292 894 154 104 61 6 482 52 114
11/09/13 07:00 3,591 1,638 1,352 907 190 71 38 8 491 98 119
11/09/13 08:00 3,589 2,149 1,869 1,324 188 64 70 8 503 86 518
11/09/13 09:00 3,640 5,516 5,062 3,344 223 83 382 24 582 94 2,081
11/09/13 10:00 3,558 11,193 10,150 6,625 326 264 1,040 146 670 161 4,007
As before, this particular database is relatively inactive before 9am, then becomes more active. Now we can see that the wait time is significant e.g. about 60% of Database Time is actually Wait Time in the hour up to 10am. And we can see that the largest component of this Wait Time is User I/O. We can also see that the next highest Wait Time component is Cluster wait time.

What next? We could either try and drill down into individual wait events, or maybe look at the SQL statements being executed and the waits they experience. More in future blog posts.

关注dbDao.com的新浪微博

扫码关注dbDao.com 微信公众号:

关注dbDao.com的新浪微博

扫码加入微信Oracle小密圈,了解Oracle最新技术下载分享资源

TEL/電話+86 13764045638
Email service@parnassusdata.com
QQ 47079569