Exodus: Exadata to cloud

We were pulling one database after another off the Exadata platform and converting them into VM-based single-instance databases, both in private clouds and in public cloud environments. One of the challenges is whether the target platform will perform as well as Exadata. Of course, this is a bit tough: Exadata is a purpose-built machine, designed to run only one thing, the database. With all the redundancy built in internally and the storage cell offloading capabilities, it is a little tough to match; however, for our workload we were able to deliver comparably performing migrations, with some challenges, on our mission-critical 24/7 production databases and a handful of medium-workload databases.

The reason we were looking to move off Exadata is that it is physical: hardware refreshes, server maintenance, while our roadmap was to move to cloud platforms where we get the agility of spinning up and scaling on the fly. Working through procurement, sourcing, and licensing all hindered the velocity of our business discoveries. A lot of manual DBA hours went into building Data Guard environments, backups, encryption setup, etc. on premises; with the cloud version we could achieve all of that with operationally repeatable, consistent builds and monitoring driven by CI/CD pipelines and Terraform code, and at reduced cost, since we had over-provisioned initially and there was no way to scale down after the physical deployment. The Exadata platform had given us the best performance for the past decade, with auto SR creation, Platinum patching, etc. But there is no RAC option on any public cloud other than OCI. What?? Oracle doesn't support RAC on third-party public clouds? Hmm. (Oracle Database Support for Non-Oracle Public Cloud Environments, Doc ID 2688277.1)

We had resource segregation on Exadata via the cpu_count parameter, which gave us an easy way to pick target m-class instances with matching memory, comparable to our memory_target / SGA / PGA sizes. The only place we had to do some in-depth analysis was the IO characteristics. If you are licensed for the AWR Diagnostics Pack, you can calculate it from the history stats using the query listed below. A bunch of RMAN backup IOs can be eliminated with snapshot backups. Provisioned IOPS (PIOPS) is costly, so carefully choosing the numbers is key, though you can adjust them dynamically on the fly. Since we were collapsing RAC into a single instance, we had to multiply the per-node memory parameter settings by the number of nodes, and similarly the processes and sessions parameters; we also had to carefully reduce the parallel query parameters. A quick way to pull those per-node totals is sketched right after this paragraph.
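
As a rough sizing sketch (assuming a RAC source being consolidated into one instance; adjust the parameter list to whatever you actually tune), summing the per-instance values from gv$parameter gives the starting totals for the single-instance target. Parallel query parameters should then be reduced rather than summed.

-- sum the per-node RAC settings to derive starting values for the single-instance target
select name,
       sum(to_number(value)) total_across_instances
  from gv$parameter
 where name in ('sga_target', 'pga_aggregate_target',
                'processes', 'sessions', 'cpu_count')
 group by name
 order by name;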

IO subsystem testing tools (Orion, winsat, Iometer), RDBMS-level testing tools (SLOB, DBMS_RESOURCE_MANAGER.CALIBRATE_IO, DBMS_STATS.GATHER_SYSTEM_STATS), application-level testing tools on the DB side (Hammerora, Swingbench, Simora, Oracle RAT), and application-level testing tools on the apps side (LoadRunner, LoadNinja, Apache JMeter) can give you confidence in the IOPS numbers; a minimal CALIBRATE_IO run is sketched below. AWS does what it advertises in terms of IO.
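
For example, a minimal CALIBRATE_IO run on the target (it requires timed_statistics and asynchronous IO to be enabled; the num_physical_disks value here is just a placeholder for the number of disks/volumes backing your storage) looks like this:

set serveroutput on
declare
  l_max_iops       pls_integer;
  l_max_mbps       pls_integer;
  l_actual_latency pls_integer;
begin
  dbms_resource_manager.calibrate_io(
    num_physical_disks => 10,   -- placeholder: disks/volumes behind the database
    max_latency        => 20,   -- acceptable latency in milliseconds
    max_iops           => l_max_iops,
    max_mbps           => l_max_mbps,
    actual_latency     => l_actual_latency);
  dbms_output.put_line('max_iops       = ' || l_max_iops);
  dbms_output.put_line('max_mbps       = ' || l_max_mbps);
  dbms_output.put_line('actual_latency = ' || l_actual_latency);
end;
/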

If you have provisioned x PIOPS, these IO testing tools will report very close to that same number. We went with something close to the maximum IOPS required, then reduced it after tuning the high-IO-consuming queries. Keep a close eye on AWR reports for the top IO consumers (a sample AWR query follows below). If you are hosting on EC2, keep the online redo logs on separate volumes to avoid IO queuing, which has a freeze effect on queries if you ever max out the volume's IOPS quota.
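
As a starting point for finding those top IO consumers (this needs the Diagnostics Pack license; column names come from DBA_HIST_SQLSTAT and the 7-day window is arbitrary), a sketch like this lists the heaviest physical-IO statements from AWR:

-- top 20 statements by physical IO requests over the last 7 days
select *
  from (select st.sql_id,
               sum(st.physical_read_requests_delta
                 + st.physical_write_requests_delta)             total_io_requests,
               round(sum(st.physical_read_bytes_delta
                 + st.physical_write_bytes_delta) / 1024 / 1024) total_io_mb,
               sum(st.executions_delta)                          executions
          from dba_hist_sqlstat  st,
               dba_hist_snapshot sn
         where sn.snap_id         = st.snap_id
           and sn.dbid            = st.dbid
           and sn.instance_number = st.instance_number
           and sn.begin_interval_time > sysdate - 7
         group by st.sql_id
         order by total_io_requests desc)
 where rownum <= 20;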

Also note that an Oracle IO is not equivalent to an AWS IO, because AWS counts reads and writes differently from how Oracle reports them: EBS counts IOs in chunks of up to 256 KB, so a single 1 MB multiblock IO on Exadata turns into four IOs on AWS. The query below accounts for exactly that.


Read the article by Yavor Ivanov on how to interpret the query output.
select snap_id, snap_time, 
       round(EXA_physical_IOPS_total / snap_seconds) Exadata_IOPS,
       round(AWS_physical_IOPS_total / snap_seconds) AWS_IOPS,
       round(MB_read_write / snap_seconds) MB_per_second,
       round(EXA_Mbytes_saved / snap_seconds) MB_saved_per_second,
       round(EXA_Mbytes_eligible / snap_seconds) MB_eligible_per_second
 from (
      SELECT snap_id,
             min(snap_time) snap_time,
             AVG(ROUND(snap_interval * 1440)) snap_minutes, 
             AVG(ROUND(snap_interval * 1440 * 60)) snap_seconds, 
             -- total single block read/write requests - storage independent
             sum(delta_read_singlebl_requests+delta_write_singlebl_requests) single_block_IOPS_total,
             -- multiblock read/write requests as reported by Exadata
             sum(delta_read_multibl_requests+delta_write_multibl_requests) EXA_multiblock_IOPS_total,
             -- multiblock read/write requests as expected on AWS (256K IOs instead of 1M)
             sum(delta_read_multibl_requests+delta_write_multibl_requests)*4 AWS_multiblock_IOPS_total,
             -- total IOPS as reported by Exadata 
             sum(delta_read_io_requests+delta_write_io_requests) EXA_physical_IOPS_total,       
             -- Total IOPS expected on AWS - exadata single block IOPS plus (4 * exadata multiblock IOPS), as each 1 MB exadata IO will turn into 4 x 256K IOs on AWS
             sum(delta_read_multibl_requests+delta_write_multibl_requests)*4+sum(delta_read_singlebl_requests+delta_write_singlebl_requests) AWS_physical_IOPS_total,
             --  raw bandwidth in MB
             round(sum(delta_read_bytes+delta_write_bytes)/1024/1024) MB_read_write,
             -- bytes saved by storage index usage on exadata
             round(sum(delta_eligible_bytes)/1024/1024) EXA_Mbytes_eligible,
             round(sum(delta_bytes_saved)/1024/1024) EXA_Mbytes_saved             
        FROM (select s.snap_id, tm.INSTANCE_NUMBER,
                     trunc(begin_interval_time, 'MI') snap_time,
                     cast(end_interval_time AS date) - cast(begin_interval_time AS DATE) snap_interval,
                     -- deltas between consecutive snapshots, computed per instance
                     total_read_io_requests - lag(total_read_io_requests, 1) over(partition by tm.instance_number order by tm.snap_id) delta_read_io_requests,
                      total_write_io_requests - lag(total_write_io_requests, 1) over(partition by tm.instance_number order by tm.snap_id) delta_write_io_requests,
                      total_read_multibl_requests - lag(total_read_multibl_requests, 1) over(partition by tm.instance_number order by tm.snap_id) delta_read_multibl_requests,
                      total_write_multibl_requests - lag(total_write_multibl_requests, 1) over(partition by tm.instance_number order by tm.snap_id) delta_write_multibl_requests,
                      total_read_singlebl_requests - lag(total_read_singlebl_requests, 1) over(partition by tm.instance_number order by tm.snap_id) delta_read_singlebl_requests,
                      total_write_singlebl_requests - lag(total_write_singlebl_requests, 1) over(partition by tm.instance_number order by tm.snap_id) delta_write_singlebl_requests,
                      total_read_bytes - lag(total_read_bytes, 1) over(partition by tm.instance_number order by tm.snap_id) delta_read_bytes,
                      total_write_bytes - lag(total_write_bytes, 1) over(partition by tm.instance_number order by tm.snap_id) delta_write_bytes,
                      total_eligible_bytes - lag(total_eligible_bytes, 1) over(partition by tm.instance_number order by tm.snap_id) delta_eligible_bytes,
                      total_IO_saved - lag(total_IO_saved, 1) over(partition by tm.instance_number order by tm.snap_id) delta_bytes_saved
                from (select snap_id,t.INSTANCE_NUMBER,
                             sum(decode(stat_name,'physical read total IO requests',value,0)) total_read_io_requests,
                             sum(decode(stat_name,'physical write total IO requests',value,0)) total_write_io_requests,
                             sum(decode(stat_name,'physical read total multi block requests',value,0)) total_read_multibl_requests,
                             sum(decode(stat_name,'physical write total multi block requests',value,0)) total_write_multibl_requests,
                             sum(decode(stat_name,'physical read total IO requests',value,'physical read total multi block requests',-1 * value,0)) total_read_singlebl_requests,
                             sum(decode(stat_name,'physical write total IO requests',value,'physical write total multi block requests',-1 * value,0)) total_write_singlebl_requests,
                             sum(decode(stat_name,'cell physical IO bytes saved by storage index',value,0)) total_IO_saved,                             
                             sum(decode(stat_name,'cell physical IO bytes eligible for predicate offload',value,0)) total_eligible_bytes,
                             sum(decode(stat_name,'physical read total bytes',value,0)) total_read_bytes,
                             sum(decode(stat_name,'physical write total bytes',value,0)) total_write_bytes
                        from DBA_HIST_SYSSTAT t
                       group by snap_id, t.INSTANCE_NUMBER) tm,
                     dba_hist_snapshot s
               where s.snap_id = tm.snap_id
                 and s.instance_number = tm.instance_number
                 and s.begin_interval_time > trunc(sysdate) - 31
                 and s.begin_interval_time < trunc(sysdate)
                 )
       GROUP BY snap_id
  )
       order by 2;
