Hi,
Today I will discuss a network-related scenario we ran into during a cloud migration. It involved migrating an Oracle database of about 12.5 TB. Our strategy was to build a DR environment on AWS using the native Oracle RMAN tool, and to use that standby as a read-only database for Power BI reporting and for users to run untested queries. The idea was to set up Oracle Data Guard in Maximum Performance mode with LGWR asynchronous redo transport, which gives us data protection along with a read replica for reporting and ad hoc untested queries. Our level 2 DBA kicked off the DR build on the target EC2 instance by following the KBA. After some time, incidents started flowing in. We saw over 50% packet loss.
The RMAN copy flooded one of the network devices on the path, causing issues for all the servers that were communicating through that router. The solution we used was to configure a secondary NIC so that the traffic takes an entirely different route to the target host (MOS note 960510.1).
You can see in the picture that the mtr command shows packet loss happening live; the copy was not progressing well either. Here are some ways, in theory, to avoid this situation.
1. Use Snow family devices, chosen based on the database size, to avoid the initial bulk copy of the Level 0 backup over the network. The Level 1 RMAN incrementals can then go over the DX connection.
2. Reduce the number of RMAN backup channels.
3. Use the RMAN HIGH, MEDIUM, or LOW compression options to reduce the number of TB transferred over the network. These need the Advanced Compression license on top of the Oracle Enterprise Edition license. You have to account for the CPU cycles that compression takes; if you have plenty of free CPU cycles, you can go for HIGH compression (https://docs.oracle.com/en/database/oracle/oracle-database/19/bradv/configuring-rman-client-advanced.html#GUID-3117DA93-EC34-488D-A4FB-29E6CD4D168A).
4. Move the Data Guard network traffic to a separate network interface (Data Guard Transport Considerations on Oracle Database Machine (Exadata) (Doc ID 960510.1)).
5. Limit I/O bandwidth consumption with the RMAN RATE channel parameter, which acts as a throttling mechanism for backups.
6. Work with the network team to sort out what can be done on the network devices, or use tools such as wondershaper, trickle, or tc (https://www.baeldung.com/linux/throttle-bandwidth).
7. Any other good solution? Write it in the comments.
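To put rough numbers on options 2 and 5, here is a minimal Python sketch of the arithmetic. It is not part of our actual runbook. The 1 Gbit/s link speed, the 50% bandwidth budget, and the 4 channels are made-up values for illustration, and the function names are hypothetical. It uses decimal units (1 TB = 1,000,000 MB) and ignores compression and protocol overhead, so treat the results as ballpark figures only.

```python
def transfer_hours(size_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to move size_tb terabytes at throughput_mb_s megabytes/second."""
    if throughput_mb_s <= 0:
        raise ValueError("throughput_mb_s must be positive")
    size_mb = size_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return size_mb / throughput_mb_s / 3600


def per_channel_rate_mb_s(link_mbit_s: float, budget_fraction: float, channels: int) -> float:
    """Per-channel throughput cap (MB/s) so all channels together stay within
    budget_fraction of the link. The result is the kind of value you might
    feed into a per-channel throttle such as RMAN's RATE parameter."""
    if channels < 1:
        raise ValueError("channels must be at least 1")
    if not 0 < budget_fraction <= 1:
        raise ValueError("budget_fraction must be in (0, 1]")
    link_mb_s = link_mbit_s / 8  # megabits/second -> megabytes/second
    return link_mb_s * budget_fraction / channels


if __name__ == "__main__":
    # Assumed values for illustration only; substitute your own link and budget.
    size_tb = 12.5        # database size from this post
    link_mbit_s = 1000    # hypothetical 1 Gbit/s path
    budget = 0.5          # hypothetical: leave half the link for other servers
    channels = 4          # hypothetical RMAN channel count

    rate = per_channel_rate_mb_s(link_mbit_s, budget, channels)
    hours = transfer_hours(size_tb, rate * channels)
    print(f"Per-channel cap: {rate:.3f} MB/s")
    print(f"Estimated Level 0 copy time at that cap: {hours:.1f} hours")
```

If the estimated copy time at a safe budget runs into days, that is a hint that option 1 (shipping the Level 0 backup on a Snow family device) may be the more practical route.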