DR site CRS failure

Hi Everyone!!

Today our system administrators (SAs) applied OS patches on our 11i DR site. The DR site runs on Sun OS as a 4-node RAC, with managed recovery running only on node 1. As the DBA, I cancelled the recovery, shut down the database on node 1, and stopped CRS on all the nodes.

On node 1, I cancelled the recovery:

bash-3.00$ sqlplus '/as sysdba'

SQL*Plus: Release 10.2.0.3.0 - Production on Fri May 22 15:06:31 2009

Copyright (c) 1982, 2006, Oracle. All Rights Reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Oracle Label Security, OLAP
and Data Mining options

DRPRD> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

Database altered.

DRPRD>

DRPRD> show parameters uniqu

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_unique_name                       string      DRPRD


DRPRD> shut immediate
ORA-01109: database not open

Database dismounted.

ORACLE instance shut down.

DRPRD>


Stopped CRS on all four nodes:

bash-3.00# crsctl stop crs
Stopping resources.
This could take several minutes.
Successfully stopped CRS resources.
Stopping CSSD.

Shutting down CSS daemon.

Shutdown request successfully issued.

bash-3.00#


Disabled automatic CRS startup on reboot on all four nodes:

bash-3.00# crsctl disable crs
bash-3.00#


I got a call from Peter, the SA, to bring the services back up. When I started CRS on all the nodes, I found something wrong with the 4th node: only one CRS process was up after the crsctl start crs command.

bash-3.00# crsctl start crs
Attempting to start CRS stack

The CRS stack will be started shortly

bash-3.00#

$ ps -ef|grep crs
root 3590 1 0 13:47:18 console 0:00 /bin/sh /etc/init.d/init.crsd run
oracle 6517 5907 0 17:10:22 pts/5 0:00 grep crs
$



whereas nodes 1, 2 and 3 each showed 12 lines:

bash-3.00$ ps -ef|grep crs|wc
12 110 1267

bash-3.00$


That told me something was wrong with node 4.
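
The process-count comparison above can be wrapped in a small helper. This is only a sketch under assumptions: the function names and the threshold argument are my own, not part of any Oracle tooling. It uses the pattern [c]rs so that grep does not match its own command line, which the plain grep crs does (see the "grep crs" line in the node 4 output).

```shell
# Hypothetical helpers (names are illustrative, not Oracle-supplied).

# Count lines of ps-style input on stdin that mention "crs".
# The [c]rs pattern does not match the literal text "[c]rs", so the
# grep process itself is not counted when the input comes from ps -ef.
crs_proc_count() {
    grep -c '[c]rs' || true   # grep exits non-zero on zero matches; the count is still printed
}

# Compare the count with the number seen on a healthy node.
# $1 = expected minimum (assumption: taken from a node known to be healthy)
check_crs_stack() {
    expected=$1
    actual=$(crs_proc_count)
    if [ "$actual" -lt "$expected" ]; then
        echo "WARN: $actual CRS processes, expected at least $expected"
        return 1
    fi
    echo "OK: $actual CRS processes"
}
```

It would be run as ps -ef | check_crs_stack N, where N is whatever count a healthy node reports with the same pattern.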
I checked the /var/adm/messages file and found the following:

May 22 14:03:06 prddrnoda SC[SUNW.qfs:4.6,qfsvoting-rg,qfsvoting-rs,scqfs_boot]: [ID 990461 daemon.notice] FS votingdisk: devices not available; waiting.
May 22 14:03:06 prddrnoda SC[SUNW.qfs:4.6,qfsvoting-rg,qfsvoting-rs,scqfs_boot]: [ID 990461 daemon.notice] FS votingdisk2: devices not available; waiting.
May 22 14:03:21 prddrnoda SC[SUNW.qfs:4.6,qfsvoting-rg,qfsvoting-rs,scqfs_boot]: [ID 990461 daemon.notice] FS votingdisk1: devices not available; waiting.
May 22 14:03:21 prddrnoda SC[SUNW.qfs:4.6,qfsvoting-rg,qfsvoting-rs,scqfs_boot]: [ID 990461 daemon.notice] FS votingdisk: devices not available; waiting.
May 22 14:03:21 prddrnoda SC[SUNW.qfs:4.6,qfsvoting-rg,qfsvoting-rs,scqfs_boot]: [ID 990461 daemon.notice] FS votingdisk2: devices not available; waiting.
May 22 14:03:36 prddrnoda SC[SUNW.qfs:4.6,qfsvoting-rg,qfsvoting-rs,scqfs_boot]: [ID 990461 daemon.notice] FS votingdisk1: devices not available; waiting.
May 22 14:03:36 prddrnoda SC[SUNW.qfs:4.6,qfsvoting-rg,qfsvoting-rs,scqfs_boot]: [ID 990461 daemon.notice] FS votingdisk: devices not available; waiting.
May 22 14:03:36 prddrnoda SC[SUNW.qfs:4.6,qfsvoting-rg,qfsvoting-rs,scqfs_boot]: [ID 990461 daemon.notice] FS votingdisk2: devices not available; waiting.
May 22 14:03:51 prddrnoda SC[SUNW.qfs:4.6,qfsvoting-rg,qfsvoting-rs,scqfs_boot]: [ID 990461 daemon.notice] FS votingdisk1: devices not available; waiting.
May 22 14:03:51 prddrnoda SC[SUNW.qfs:4.6,qfsvoting-rg,qfsvoting-rs,scqfs_boot]: [ID 990461 daemon.notice] FS votingdisk: devices not available; waiting.
May 22 14:03:51 prddrnoda SC[SUNW.qfs:4.6,qfsvoting-rg,qfsvoting-rs,scqfs_boot]: [ID 990461 daemon.notice] FS votingdisk2: devices not available; waiting.
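
Messages like these repeat every few seconds, so it can help to collapse them into one line per file system. A minimal sketch, assuming the message format shown above ("FS <name>: devices not available"); the function name is hypothetical.

```shell
# Hypothetical helper: list each file system named in
# "devices not available" notices, with the number of notices seen.
# Reads the files given as arguments, or stdin if there are none.
summarize_unavailable_fs() {
    grep 'devices not available' "$@" |
        sed 's/.* FS \([^:]*\): devices not available.*/\1/' |
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

For example, summarize_unavailable_fs /var/adm/messages prints one "name count" pair per affected file system.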

The df command showed that none of the file systems were mounted.
I called Peter back to remount the file systems. Once he had mounted them, I was able to start CRS:
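
A check like the following could be run before crsctl start crs to catch this earlier. It is a sketch under assumptions: it reads a mount table whose second field is the mount point (as in /etc/mnttab), the function names are hypothetical, and the mount point paths in the usage line are placeholders, since only the file system names appear in the log.

```shell
# Hypothetical pre-start check (names and paths are illustrative).

# Succeeds if mount point $1 is listed in the mount table $2.
# Assumption: the table's second field is the mount point,
# and the table defaults to /etc/mnttab.
is_mounted() {
    while read -r _dev _mp _rest; do
        [ "$_mp" = "$1" ] && return 0
    done < "${2:-/etc/mnttab}"
    return 1
}

# Reports every required mount point that is missing.
# $1 = mount table, remaining arguments = required mount points.
missing_mounts() {
    table=$1
    shift
    rc=0
    for want in "$@"; do
        if ! is_mounted "$want" "$table"; then
            echo "not mounted: $want"
            rc=1
        fi
    done
    return $rc
}
```

Usage would look like missing_mounts /etc/mnttab /path/to/votingdisk /path/to/votingdisk1, starting CRS only when it returns 0.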

bash-3.00# crsctl start crs
Attempting to start CRS stack

The CRS stack will be started shortly

bash-3.00#

bash-3.00$ ps -ef|grep crs|wc     # looks good now
12 110 1267

bash-3.00$

Started the database:

bash-3.00$ sqlplus '/as sysdba'

SQL*Plus: Release 10.2.0.3.0 - Production on Fri May 22 15:06:31 2009

Copyright (c) 1982, 2006, Oracle. All Rights Reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Oracle Label Security, OLAP
and Data Mining options

DRPRD>
DRPRD>startup mount
ORACLE instance started.

Total System Global Area 2.1475E+10 bytes
Fixed Size 2215984 bytes
Variable Size 2117677008 bytes
Database Buffers 1.9344E+10 bytes
Redo Buffers 10813440 bytes
Database mounted.
DRPRD> exit

Started the recovery:

DRPRD> alter database recover managed standby database using current logfile disconnect;

Database altered.

DRPRD>

I will discuss more on DR later.

3 comments:

Suresh Lakshmanan said...

You can use
alter database register logfile '/u01/Logxxxx.arc';
to register the log file in case it was not recorded in the data dictionary.

Anonymous said...

Hi Suresh,

Could you post the steps for installing Oracle 10g RAC on Windows, installing Data Guard on Windows, and connecting/configuring RAC and DG? Also, how do you install 11i with RAC and DG?

Thanks,

Jo

Suresh Lakshmanan said...

Hi Jo,

I do not have one for Windows. I do have one for Unix flavors, and I will post that.

Thanks,
Suresh