Production Performance Troubleshooting - log file sync wait

 Hi Everyone!

We received complaints about production database performance issues after we moved off Exadata. Users reported the client-facing chat application freezing. The application has been running with us for years, and there had been occasional noise about other issues before; I don't know its full history, as I had never worked in this application space. At 9:17 AM the support channel was flooded with messages from users saying the application had frozen. The database is on an old version, 11.2.0.4.

Correspondingly, the OEM performance graph was showing a high wait on commit, 'log file sync', and the same was observed in the AWR reports as well.
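The wait picture can also be cross-checked straight from the wait interface, outside OEM/AWR. A minimal sketch against v$system_event (instance-wide cumulative counters, so only good for a rough confirmation; 'log file parallel write' is included to compare LGWR's write time against the foreground sync time):

SELECT event,
       total_waits,
       ROUND(time_waited_micro / 1e6)                              AS time_waited_s,
       ROUND(time_waited_micro / NULLIF(total_waits, 0) / 1000, 2) AS avg_wait_ms
FROM   v$system_event
WHERE  event IN ('log file sync', 'log file parallel write');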


I started digging into it and observed that log file switches were happening very frequently, which is unusual for this database. So the question was: what was generating the extra redo? The archived log timestamps made the switch rate obvious:


-rw-r-----   1 oracle oinstall 2564783616 Oct 17 09:06 1_41609_1038620004.dbf
-rw-r-----   1 oracle oinstall 2582293504 Oct 17 09:11 1_41610_1038620004.dbf
-rw-r-----   1 oracle oinstall 2588420096 Oct 17 09:12 1_41611_1038620004.dbf
-rw-r-----   1 oracle oinstall 2555900416 Oct 17 09:13 1_41612_1038620004.dbf
-rw-r-----   1 oracle oinstall 2617330688 Oct 17 09:15 1_41613_1038620004.dbf
-rw-r-----   1 oracle oinstall 2624246784 Oct 17 09:16 1_41614_1038620004.dbf
-rw-r-----   1 oracle oinstall 2624150016 Oct 17 09:17 1_41615_1038620004.dbf
-rw-r-----   1 oracle oinstall 2628691456 Oct 17 09:19 1_41616_1038620004.dbf
-rw-r-----   1 oracle oinstall 2601312256 Oct 17 09:20 1_41617_1038620004.dbf
-rw-r-----   1 oracle oinstall 2563410944 Oct 17 09:21 1_41618_1038620004.dbf
-rw-r-----   1 oracle oinstall 2593818624 Oct 17 09:24 1_41619_1038620004.dbf
-rw-r-----   1 oracle oinstall 2605819904 Oct 17 09:27 1_41620_1038620004.dbf
-rw-r-----   1 oracle oinstall 2598959616 Oct 17 09:28 1_41621_1038620004.dbf
-rw-r-----   1 oracle oinstall 2559222272 Oct 17 09:38 1_41622_1038620004.dbf
-rw-r-----   1 oracle oinstall 2557649920 Oct 17 09:48 1_41623_1038620004.dbf
-rw-r-----   1 oracle oinstall 2562846720 Oct 17 10:00 1_41624_1038620004.dbf
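
The same switch rate can be confirmed from inside the database instead of from file timestamps. A minimal sketch against v$log_history (a standard view; the one-day window and hourly bucketing are arbitrary choices for illustration):

SELECT TO_CHAR(first_time, 'DD-MON-YYYY HH24') AS switch_hour,
       COUNT(*)                                AS log_switches
FROM   v$log_history
WHERE  first_time > SYSDATE - 1
GROUP  BY TO_CHAR(first_time, 'DD-MON-YYYY HH24')
ORDER  BY 1;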


Then I decided to do log mining with LogMiner.

BEGIN
  DBMS_LOGMNR.ADD_LOGFILE (
    LogFileName => '/flash/1_41609_1038620004.dbf',
    options     => DBMS_LOGMNR.NEW);
END;
/

BEGIN
  DBMS_LOGMNR.ADD_LOGFILE (
    LogFileName => '/flash/1_41610_1038620004.dbf',
    options     => DBMS_LOGMNR.ADDFILE);
END;
/

BEGIN
  DBMS_LOGMNR.ADD_LOGFILE (
    LogFileName => '/flash/1_41611_1038620004.dbf',
    options     => DBMS_LOGMNR.ADDFILE);
END;
/

I added the remaining files the same way; the calls are truncated here for the article.
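
Adding each file by hand gets tedious for a wider window. A sketch of a loop-driven alternative, assuming the archived logs for the window are still registered in v$archived_log and the NAME column points at a readable location (in this case the files had been copied to /flash, so the paths would differ):

BEGIN
  FOR r IN (SELECT name
            FROM   v$archived_log
            WHERE  first_time >= TO_DATE('17-OCT-2022 09:00:00', 'DD-MON-YYYY HH24:MI:SS')
            AND    first_time <  TO_DATE('17-OCT-2022 10:05:00', 'DD-MON-YYYY HH24:MI:SS')
            AND    name IS NOT NULL
            -- if multiple archive destinations exist, restrict to one (e.g. dest_id = 1)
            ORDER  BY sequence#)
  LOOP
    DBMS_LOGMNR.ADD_LOGFILE (
      LogFileName => r.name,
      options     => DBMS_LOGMNR.ADDFILE);
  END LOOP;
END;
/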


alter session set nls_date_format= 'DD-MON-YYYY HH24:MI:SS';

BEGIN
  -- Specify the time range to mine
  DBMS_LOGMNR.start_logmnr (
    starttime => TO_DATE('17-OCT-2022 09:17:00', 'DD-MON-YYYY HH24:MI:SS'),
    endtime   => TO_DATE('17-OCT-2022 09:18:00', 'DD-MON-YYYY HH24:MI:SS'));
END;
/
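
One note on the dictionary: the START_LOGMNR call above does not supply one, so seg_name comes back as internal object numbers rather than names, which is why the dba_objects lookup further down is needed. When the source database is open, the online catalog can be passed as the dictionary instead; a sketch with the same time range (the option constant is from the standard DBMS_LOGMNR package):

BEGIN
  DBMS_LOGMNR.start_logmnr (
    starttime => TO_DATE('17-OCT-2022 09:17:00', 'DD-MON-YYYY HH24:MI:SS'),
    endtime   => TO_DATE('17-OCT-2022 09:18:00', 'DD-MON-YYYY HH24:MI:SS'),
    options   => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG);
END;
/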

SELECT operation, seg_name, COUNT(1)
FROM   v$logmnr_contents
GROUP  BY operation, seg_name
ORDER  BY 3 DESC;



The top entry pointed to object number 618131806, so I resolved it against dba_objects:

select object_name from dba_objects where object_id = 618131806;

H_TASK_AGE_FACT_15MIN -- the culprit object: a bulk DELETE had removed roughly 10 million rows in the 2-minute interval I mined, and on the order of 56 million over the ~20-minute period.

I asked the app team to reschedule the job that was running this bulk operation during active business hours. Finally, I ended the mining session with the SQL below.

BEGIN
  DBMS_LOGMNR.end_logmnr;
END;
/

There are several ways to solve a problem like this. One is to remove the causing factor. Faster redo I/O would have resolved it as well, but since the storage was out of our control we couldn't go down that route. Once the cause was identified, I was able to move the analytical workload outside the critical business hours. As a long-term plan we will be moving that schema out of the OLTP database; keeping analytical and OLTP workloads together is a design flaw.
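
For the next time a redo storm hits, the heavy generator can also be caught live without waiting for LogMiner. A sketch joining v$sesstat to v$statname on the 'redo size' statistic (standard views; the values are cumulative per session since logon, and the top-10 cut-off is arbitrary):

SELECT *
FROM  (SELECT s.sid,
              s.serial#,
              s.username,
              s.program,
              st.value AS redo_bytes
       FROM   v$sesstat  st
       JOIN   v$statname sn ON sn.statistic# = st.statistic#
       JOIN   v$session  s  ON s.sid = st.sid
       WHERE  sn.name = 'redo size'
       ORDER  BY st.value DESC)
WHERE  ROWNUM <= 10;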

