Here is an interesting one on RAC.
We were constantly getting "GC Blocks Lost and Corrupt" alerts from Enterprise Manager Grid Control. The following query against GV$SYSSTAT confirmed the counts:
SQL> SELECT
GVS1.INST_ID "instance",
GVS1.VALUE "blocks lost",
GVS2.VALUE "blocks corrupt"
FROM GV$SYSSTAT GVS1,
GV$SYSSTAT GVS2
WHERE GVS1.NAME = 'gc blocks lost'
AND GVS2.NAME = 'gc blocks corrupt'
AND GVS1.INST_ID = GVS2.INST_ID;
  instance blocks lost blocks corrupt
---------- ----------- --------------
         2         574              2
         1         132              0
         4         558              2
         3        1388              0
SQL>
After considerable research we decided to tune the under-configured UDP network settings; UDP is the protocol our interconnect uses. How do you find which protocol is being used? Check the alert log.
Thu Jun 26 20:01:43 2008
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
PMON started with pid=2, OS id=8865
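A quick way to pull that line out of the alert log is a simple grep. The path below is only an illustration for a 10g-style background_dump_dest layout; adjust it to wherever your alert log lives:
bash-3.00$ grep -i "cluster interconnect IPC" /u01/app/oracle/admin/ORCL/bdump/alert_ORCL1.log
cluster interconnect IPC version:Oracle UDP/IP (generic)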
How do Oracle instances communicate?
Under poorly tuned network settings, Oracle has to break data blocks into smaller pieces to send them across the interconnect. Oracle uses UDP for this, and a problem with UDP is that it carries no ordering information describing how to reassemble the pieces back into an Oracle data block. Oracle therefore sends what are called "side band" messages containing the information needed to reconstruct them. All of these pieces need to arrive in a timely fashion; if there is a problem, all of the pieces and the "side band" messages are sent once again.
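As a side note (not part of the original diagnosis), the OS itself keeps counters showing whether UDP datagrams or IP fragments are being dropped before Oracle ever sees them. On Solaris they can be checked with netstat, for example:
bash-3.00$ netstat -s -P udp    # look for overflow/drop counters such as udpInOverflows
bash-3.00$ netstat -s -P ip     # look for fragment/reassembly failure counters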
"gc cr block congested" and other "gc" waits on our environment suggested that the LMS and/or other processes that send the blocks across the interconnect are too busy or do not get the CPU in a timely fashion to perform the necessary work. We thought, making adjustments to the parameters relating to send buffer space, receive buffer space, send highwater, and receive highwater (UDP_XMIT_HIWAT, UDP_RECV_HIWAT and UDP_MAX_BUF) may help resolving the problem as we saw high volumes of traffic across the interconnect.
We had UDP_XMIT_HIWAT and UDP_RECV_HIWAT were set at very low value than 65536 which is oracle recommended default value. We have made the changes to 65536. Issue is not seen at this point!!!
UDP settings:
To set the UDP high-water mark and max buffer settings we used the following commands (all pertaining to Sun Solaris):
ndd -set /dev/udp udp_xmit_hiwat 65536
ndd -set /dev/udp udp_recv_hiwat 65536
ndd -set /dev/udp udp_max_buf 2097152
To look at the current values of the UDP parameters we used:
bash-3.00$ ndd /dev/udp udp_xmit_hiwat
65536
bash-3.00$ ndd /dev/udp udp_recv_hiwat
65536
bash-3.00$ ndd /dev/udp udp_max_buf
2097152
bash-3.00$
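One general Solaris point worth noting (not from the MetaLink note): ndd changes do not survive a reboot, so the same commands are typically also placed in a startup script. A minimal sketch, with a hypothetical script name:
#!/bin/sh
# /etc/rc2.d/S99udp_tuning -- hypothetical name; reapplies the UDP
# tunables at boot because ndd settings are lost on restart.
ndd -set /dev/udp udp_xmit_hiwat 65536
ndd -set /dev/udp udp_recv_hiwat 65536
ndd -set /dev/udp udp_max_buf 2097152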
Reference: MetaLink note 181489.1.
This issue was pretty interesting and we have the environment under control now.
Happy Learning!!
Comments:
What was the OS and the version of OS?
It's Sun Solaris 10.
Some more interesting details are in note 563566.1.