Managed Redo Apply using all CPU

Again and again I’m forced to setup Data Guard on Windows platforms. Windows is not my favorite platform but that does not matter sometimes. On the other hand, the Oracle command line tools are identical on every platform.
So this is the inital setup that we have:

  • 2 Servers, 2x 10 Core CPU and sufficient RAM with some TB local disks
  • servers in separate compute centers
  • Windows 2012 R2
  • Oracle Database Enterprise Edition 12.1.0.2.160119 non-CDB

Pretty obvious, that Data Guard would be a good option to decrease the unplanned outages for databases on those servers. Everything from installation to creating databases and setting up Data Guard went fine. Until we first tested a switchover. Sounds simple and in fact it is. At least in terms of syntax. But after the switchover the new standby database server was heavily loaded and did hardly respond to anything we were trying. After a while, 30 minutes or so, it was all over and the standby server was working fine again with no load at all. At this point in time there was no data nor any application load at all in the database. Just an freshly created empty database.
We then did another switchover to make the standby database primary again. And again, the new standby server was heavily loaded for quite some time. When investigating this behaviour we found the following in the alert.log:

2016-04-18 13:29:47.131000 +02:00
Started logmerger process
Managed Standby Recovery starting Real Time Apply
Parallel Media Recovery started with 40 slaves

The automatic parallel recovery which is enabled by default decided to use 40 processes. That makes sense when we look at the hardware specification. But my assumption was, that there must be something wrong with that. I had no proof that the recovery slaves where causing the high CPU load since the system was so unresponsive. We simply modified the Data Guard configuration to reduce the parallel degree.

DGMGRL> edit database testdb_a set property ApplyParallel = 8;
DGMGRL> edit database testdb_b set property ApplyParallel = 8;

Again we tried another switchover and monitored the new standby server. This time it was not loaded at all. I still don’t know why the 40 processes where causing such a heavy load, but for the time being I am happy with what we have now.

So be careful when setting up Data Guard. Do tests and monitor the systems. And if you ever experience heavy load when there should be no load at all, remember my posing.

Advertisements