Month: November 2008

Parallel Rollback

I had a user call up and ask to kill a session which was causing him grief and hence I killed it without much thought since it was not a Production systemA few hours later I noticed that SMON was doing parallel txn recovery. This was validated by the …

Parallel Rollback

I had a user call up and ask to kill a session which was causing him grief and hence I killed it without much thought since it was not a Production system

A few hours later I noticed that SMON was doing parallel txn recovery. This was validated by the view
select * from x$ktprxrt;

Unfortunately I have lost the output but it did show that it would take eons

But this was also confirmed from the pstack of smon which included the function ktprbeg which I believe begins parallel rollback. (Snippet below)

[syd0904:oracle:OSTA1]/u01/app/oracle => pstack 11905
11905: ora_smon_OSTA1

0000000100d80868 kturax (fffffffffffffffe, 380017150, b, 380017, 105ebe510, 5) + 928
0000000100e15620 ktprbeg (106502000, 0, 1065033c8, 105400, 1056b5, 1065033c8) + 1a0 ===> Begin parallel rollback
00000001007e9238 ktmmon (ffffffffffffffff, ffffffff7fffdda8, 0, 1056b5000, 1, 106502000) + f58
000000010106e0dc ksbrdp (105f1b000, 38000e, 106505ce0, 105f1b000, 105f1b, 1007e8260) + 39c
00000001024efed8 opirip (106510000, 106400, 106518, 380007000, 106510, 1066a5650) + 338
000000010033f7b4 opidrv (106512a90, 0, 32, 10650f7c8, 32, 0) + 4b4
0000000100339c50 sou2o (ffffffff7ffff3e8, 32, 4, ffffffff7ffff410, 105de9000, 105de9) + 50
00000001002fc00c opimai_real (3, ffffffff7ffff4e8, 0, 0, 247e09c, 14800) + 10c
00000001002fbe38 main (1, ffffffff7ffff5f8, 0, ffffffff7ffff4f0, ffffffff7ffff600, ffffffff79d00140) + 98
00000001002fbd5c _start (0, 0, 0, 0, 0, 0) + 17c
—————– lwp# 2 / thread# 2 ——————–

and also confirmed from the SMON trace file

*** 2008-11-28 12:03:16.828
Parallel Transaction recovery caught exception 30319
Parallel Transaction recovery caught error 30319
*** 2008-11-28 12:07:17.163
Parallel Transaction recovery caught exception 30319
Parallel Transaction recovery caught error 30319

So the first thing I did was to disable parallel recovery because of the issue documented in Metalink
using

SQL> ALTER SYSTEM SET fast_start_parallel_rollback=’FALSE’;

System altered.

IMHO (atleast from past experience) serial recovery is faster than parallel recovery atleast in the case where the transaction causes a lot of index block splits.

After this the row from x$ktprxrt disappeared and the following appeared in the SMON trace file

*** 2008-11-28 12:08:32.763
SMON: parallel recovery restart with degree=0 (!=16)
Parallel Transaction recovery caught exception 30312
Parallel Transaction recovery caught error 30312
*** 2008-11-28 12:08:33.039
SMON: parallel recovery restart with degree=0 (!=16)
Parallel Transaction recovery caught exception 30312
Parallel Transaction recovery caught error 30312

The following views agree on how much time it is going to take to complete the rollback

SQL> select * from x$ktuxe where KTUXECFL=’DEAD’ and KTUXESTA=’ACTIVE’;

ADDR INDX INST_ID KTUXEUSN KTUXESLT KTUXESQN KTUXERDBF KTUXERDBB KTUXESCNB KTUXESCNW KTUXESTA
—————- ———- ———- ———- ———- ———- ———- ———- ———- ———- —————-
KTUXECFL KTUXEUEL KTUXEDDBF KTUXEDDBB KTUXEPUSN KTUXEPSLT KTUXEPSQN KTUXESIZ
———————— ———- ———- ———- ———- ———- ———- ———-
FFFFFFFF79969D78 114 1 1 16 174276 2 36154 1113256061 1021 ACTIVE
DEAD 13 0 0 0 0 0 639787

SQL> select * from GV$FAST_START_TRANSACTIONS ;

INST_ID USN SLT SEQ STATE UNDOBLOCKSDONE UNDOBLOCKSTOTAL PID CPUTIME PARENTUSN PARENTSLT
———- ———- ———- ———- —————- ————– ————— ———- ———- ———- ———-
PARENTSEQ XID PXID RCVSERVERS
———- —————- —————- ———-
1 1 16 174276 RECOVERING 122712 762423 6312
000100100002A8C4 0

As you can see the value of
KTUXESIZ matches (UNDOBLOCKSTOTAL – UNDOBLOCKSDONE)

So rollback will complete when KTUXESIZ in x$ktuxe drops down to 0 which looks like alot of time !!!
Dumping the redo confirmed that it was the same transaction which was killed

Surprisingly the value of “rollback changes – undo records applied” in v$sysstat was not increasing during this timeline. I have tested this again (kill a long running job and watch the rollback) and can confirm that the stat does get incremented.

Drop an unmounted ASM diskgroup

We are getting our hands dirty with ASM and will be moving all our databases from RAW devices to ASM in the next couple of months. My worst fear came to be when a Unix Sys Admin used a disk already allocated to ASM for a local filesystem (luckily this did not happen on a production server!).
When ASM realised this is immediatly issued a ALTER DISKGROUP PCASDG1 DISMOUNT FORCE command.
By the time the Unix Sys Admin returned the disk it was unrecognozable by ASM as the disk headers had been overwritten. The diskgroup was unrecoverable and unusable.


SQL> select GROUP_NUMBER, NAME, state from v$asm_diskgroup;
GROUP_NUMBER NAME STATE
------------ --------- ----------
0 PCASDG1 DISMOUNTED
2 PCASDGF MOUNTED
3 PCASDGR MOUNTED
3 rows selected.

SQL>alter diskgroup PCASDG1 mount;
alter diskgroup PCASDG1 mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "19" is missing

SQL>drop diskgroup PCASDG1 including contents;
drop diskgroup PCASDG1 including contents
*
ERROR at line 1:
ORA-15039: diskgroup not dropped
ORA-15001: diskgroup "PCASDG1" does not exist or is not mounted

SQL>alter diskgroup PCASDG1 check;
alter diskgroup PCASDG1 check
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15001: diskgroup "PCASDG1" does not exist or is not mounted

I had to recreate the whole diskgroup from scratch. I followed Metalink Note 387103.1 to recreate the diskgroup.
Basically you clean up the asm disk header of all the disks in the diskgroup. This will also remove the whole diskgroup from asm.


SQL> select 'dd if=/dev/zero of='PATH' bs=8192 count=100' from v$asm_disk where GROUP_NUMBER=0;

$dd if=/dev/zero of=/dev/oracle/dg/c11t15d2 bs=8192 count=100
100+0 records in
100+0 records out

$dd if=/dev/zero of=/dev/oracle/dg/c11t7d0 bs=8192 count=100
100+0 records in
100+0 records out
etc ...

SQL> select GROUP_NUMBER, NAME, state from v$asm_diskgroup;
GROUP_NUMBER NAME STATE
------------ --------- ----------
2 PCASDGF MOUNTED
3 PCASDGR MOUNTED
2 rows selected.

The diskgroup is now removed. I then used DBCA to recreate the ASM diskgroup, all these disks showed up as Candidate disks.

Athens (Greece)

I am very conscious that I have not posted much on this blog recently. In part the technical stuff has been surfaced through the company blog – I get in to trouble if I say Mark’s blog because it is the company one now, an in part because of a load of domestic events have […]

Oracle Data Guard Presentation

I had the privilege to present at the SAOUG (South African Oracle User Group) in Sun City last month. The conference was well organized, the location was great and we had close to 1000 attendees. I presented on Oracle Data Guard.

Thanks to Karen Morton (one of my favorite bloggers) for making me aware of SlideShare. To access my presentation, just click on the slideshare image on the right side and it will take you to my presentation on slideshare. Happy viewing!

TEL/電話+86 13764045638
Email service@parnassusdata.com
QQ 47079569