Month: January 2009

Stability is your friend

Oracle, Microsoft, and other vendors are putting more and more automatic and self-everything features into their databases. There are of course many reasons why that makes sense (for Oracle and Microsoft), but does it make sense for all Oracle systems and their DBAs? I don’t think so. Consider this: all these automatic and self-tuning […]

Outbound_connect_timeout

Outbound_connect_timeout comes into play when nodes in a cluster are down and we cannot wait for the OS timeout, as that causes long delays in connect time. For example, on Solaris the value of tcp_ip_abort_interval is 180000 milliseconds, which is 180 seconds, i.e. 3 minutes.
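
If you want to check this interval on your own Solaris box, ndd can read it (a quick sketch; /dev/tcp and the parameter name are the Solaris defaults, and the value shown is just the 180000 ms from above):

# Read the current TCP abort interval (milliseconds); get is the default action
$ ndd /dev/tcp tcp_ip_abort_interval
180000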

In this post I will demonstrate how OUTBOUND_CONNECT_TIMEOUT (OCT) can effectively avoid the long connect delays experienced by clients connecting to RAC nodes when some of those nodes are down.

Take, for example, the following connect string:

TEST =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (LOAD_BALANCE = OFF)
      (ADDRESS = (PROTOCOL = TCP)(HOST = sdby1-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = sdby2-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = sdby3-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = sdby4-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = prim1-vip)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = TEST)
    )
  )

In the above alias the first four addresses point to non-existent nodes, and the service TEST runs on the node prim1, which is listed last.

I have deliberately set LOAD_BALANCE to OFF so that the client has to work through the address list serially.

If we set an OUTBOUND_CONNECT_TIMEOUT of 3 seconds in the client’s sqlnet.ora (the client is 10.2.0.3 on Solaris), the time to establish the connection is around 12 seconds.
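
For reference, a minimal client-side sqlnet.ora for this test might look like the sketch below. SQLNET.OUTBOUND_CONNECT_TIMEOUT is the documented parameter (specified in seconds in 10.2); the trace settings are my assumptions, chosen to be consistent with the /tmp/cli_262.trc file that shows up in the trace excerpts:

# Abandon each connect attempt after 3 seconds instead of waiting for the OS timeout
SQLNET.OUTBOUND_CONNECT_TIMEOUT = 3

# Client-side SQL*Net tracing at support level, written to /tmp
TRACE_LEVEL_CLIENT = 16
TRACE_DIRECTORY_CLIENT = /tmp
TRACE_FILE_CLIENT = cli
# Append the process id, giving file names like cli_262.trc
TRACE_UNIQUE_CLIENT = ON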

If we run a SQL*Net trace on the connection, we see that it starts at

[27-JAN-2009 22:52:33:741] — TRACE CONFIGURATION INFORMATION FOLLOWS —
[27-JAN-2009 22:52:33:741] New trace stream is /tmp/cli_262.trc

and the first address tried is

[27-JAN-2009 22:52:33:757] nsc2addr: (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=sdb1-vip)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=TEST)(CID=(PROGRAM=sqlplus@bart)(HOST=bart)(USER=oracle))))

Moving on, we can see that OCT is enabled, per the lines

[27-JAN-2009 22:52:33:759] nstoSetupTimeout: entry
[27-JAN-2009 22:52:33:759] nstoSetupTimeout: ATO enabled for ctx=0x1001c9280, val=3000(millisecs)

After this, the following pattern is seen repeatedly

[27-JAN-2009 22:52:33:776] nsevwtsg: entry
[27-JAN-2009 22:52:33:776] nsevwtsg: cid=0
[27-JAN-2009 22:52:33:776] nsevwait: entry
[27-JAN-2009 22:52:33:776] nsevwait: 1 registered connection(s)
[27-JAN-2009 22:52:33:776] nsevwait: 0 pre-posted event(s)
[27-JAN-2009 22:52:33:776] nsevwait: waiting for transport event (0 thru 0)…
[27-JAN-2009 22:52:33:776] nsevwait: 0 newly-posted event(s)
[27-JAN-2009 22:52:33:776] nsevwait: 0 posted event(s)
[27-JAN-2009 22:52:33:776] nsevwait: exit (0)
[27-JAN-2009 22:52:33:776] nstoToqCheckSingle: entry
[27-JAN-2009 22:52:33:776] nstoToqCheckSingle: normal exit

and the timeout occurs in 3 seconds

[27-JAN-2009 22:52:36:771] nstoHandleEventTO: ATO occurred for ctx=0x1001c9280

Ergo, OCT seems to be working perfectly.

The client then tries to establish a connection with sdby2-vip and experiences the same timeout, and likewise for sdby3-vip and sdby4-vip.

The connection is finally established at

[27-JAN-2009 22:52:45:915] nscon: no connect data
[27-JAN-2009 22:52:45:915] nscon: connect handshake is complete
[27-JAN-2009 22:52:45:915] nscon: nsctxinf[0]=0x41, [1]=0x41
[27-JAN-2009 22:52:45:915] nscon: normal exit

which translates to around 12 seconds: four dead addresses, each abandoned after the 3-second OCT, account for the 22:52:33-to-22:52:45 connect time.

From my testing with clients on Windows:

- OUTBOUND_CONNECT_TIMEOUT does not work with the 10.2.0.3 base Windows client (clients may get errors).
- It works with patch 21 on top of 10.2.0.3.
- OUTBOUND_CONNECT_TIMEOUT does work with the 11g client.

Without OCT, if we trace the connection again, we see that a single dead address keeps us waiting for more than 3 minutes (14:18:35 to 14:22:19, the tcp_ip_abort_interval at work) before failing with TNS-12535, operation timed out

[28-JAN-2009 14:18:35:299] nttcni: entry
[28-JAN-2009 14:18:35:299] nttcni: trying to connect to socket 10.
[28-JAN-2009 14:22:19:915] ntt2err: entry
[28-JAN-2009 14:22:19:915] ntt2err: soc 10 error - operation=1, ntresnt[0]=505, ntresnt[1]=145, ntresnt[2]=0
[28-JAN-2009 14:22:19:915] ntt2err: exit
[28-JAN-2009 14:22:19:915] nttcni: exit
[28-JAN-2009 14:22:19:915] nttcon: exit
[28-JAN-2009 14:22:19:921] nserror: entry
[28-JAN-2009 14:22:19:921] nserror: nsres: id=0, op=65, ns=12535, ns2=12560; nt[0]=505, nt[1]=145, nt[2]=0; ora[0]=0, ora[1]=0, ora[2]=0
[28-JAN-2009 14:22:19:921] nsopen: unable to open transport
[28-JAN-2009 14:22:19:921] nsiocancel: entry

Data Guard Lag Time

We use the Broker to administer our Data Guard configurations, and we use OEM to monitor our databases. We have found that every time we change the state of the Physical Standby between ONLINE and READ-ONLY (and back to ONLINE again), we start getting OEM alarms for reaching our set limits for Lag Time (the time in seconds that the Physical Standby is behind the Primary database). This Lag Time can also be found in v$dataguard_stats (in fact, that is where OEM gets it from).

What we found was that we need to bounce the Physical Standby to force v$dataguard_stats to be updated with the correct values. The good news is that the Physical Standby does not really have a lag; it is just that v$dataguard_stats does not update itself when the Data Guard state changes. The bad news is that OEM shows the faulty lag time and generates an alarm, and since our alarms are visible to the whole organization, we have to explain that it is not really a problem.

To confirm that the problem is in v$dataguard_stats and not a reflection of a real lag, do the following:

On Standby:

SQL>select TIME_COMPUTED from v$dataguard_stats;

TIME_COMPUTED
------------------------------
09-JAN-2009 09:45:13
09-JAN-2009 09:45:13
09-JAN-2009 09:45:13
09-JAN-2009 09:45:13
09-JAN-2009 09:45:13

SQL>select current_scn from v$database;

CURRENT_SCN
--------------------------
9661858803219

On Primary:

SQL>select scn_to_timestamp(9661857384219) from dual;

SCN_TO_TIMESTAMP(9661857384219)
---------------------------------------------------------
12-JAN-09 12.26.57.000000000 PM

SQL>select SCN_TO_TIMESTAMP(current_scn) from v$database;

SCN_TO_TIMESTAMP(CURRENT_SCN)
---------------------------------------------------------
12-JAN-09 12.30.00.000000000 PM

So the standby is really only a few minutes behind (12.26.57 vs 12.30.00), but v$dataguard_stats and OEM show otherwise. Bouncing the Physical Standby database fixed this.
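
If you want to look at the reported lag figures themselves rather than just TIME_COMPUTED, a query along these lines against the same view shows them (a sketch; the NAME, VALUE, and UNIT columns and statistic names such as 'apply lag' and 'transport lag' are per the v$dataguard_stats documentation):

On Standby:

SQL>select name, value, unit, time_computed from v$dataguard_stats;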

Cleaning up with ALL_TAB_COLS

The way he barely lifted his feet off the ground when he walked, I could hear him approaching from the other end of the hall. I knew he was coming, and I dreaded it. I had just gotten off the phone with a former teammate who had been moved to another department. They had a new database analyst who was struggling with his new "database independent" application, and he was sent my way. It was just […]

KeePass and Putty

KeePass and Putty are software programs that work great together, especially in an enterprise where you have a lot of passwords to remember. Automatic login to a Unix server is simply a “Ctrl+U” away.

First you need to setup your Putty Configuration, for example:

(Screenshot: servers are saved as sessions, with the auto-login username specified.)

In KeePass you need to configure one entry per server. The URL field is what makes KeePass more than just a list of passwords.

If putty is in your $PATH, then the command you typed in the URL field above (putty -load <session> -pw <password>) will also work from the command line. In the main KeePass window you simply highlight the entry and press “Ctrl+U”.
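
For illustration, the URL field can use KeePass's cmd:// scheme together with its placeholders, something like the sketch below (this assumes the KeePass entry title matches the saved Putty session name; {TITLE} and {PASSWORD} are standard KeePass placeholders):

cmd://putty -load "{TITLE}" -pw {PASSWORD}

With this in place, Ctrl+U launches Putty against the saved session and supplies the entry's password. Note that -pw does expose the password on the local command line, so this is a convenience trade-off rather than a hardened setup.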
