Grid Infrastructure Upgrade to 12.2 requires reboot

Last week I had to upgrade a two-node Oracle cluster running Grid Infrastructure 12.1.0.2 with the April 2017 bundle patch on Oracle Linux 7. The interesting thing is that the cluster uses the ASM Filter Driver (AFD) to present the disks to the GI. Since there were some caveats, I will walk you through the steps that led to a running 12.2 cluster. Unfortunately, I have no terminal output or screenshots, but I am sure you will get the idea.

First, we updated the nodes OS-wise, so at the end we had OL 7.4 with the latest UEK kernel available at the time of patching. That went smoothly.
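The OS update itself is nothing special; a minimal sketch of what this looks like per node, assuming the packages come from the usual Oracle Linux yum repositories:

# run as root, one node at a time, after stopping the clusterware stack on that node
[root ~]# yum update -y
[root ~]# reboot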

Second, we installed the new Grid Infrastructure 12.2 software. To do that, we extracted the ZIP to its final location as described in the documentation. Then we ran "gridSetup.sh" from this location, chose "Software only" and selected both nodes. This prepares the Oracle Homes on both nodes but does nothing beyond that.
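In short, the software-only part boils down to something like this, assuming the new Grid home is /u01/app/grid/12.2.0.1 (as used below) and the downloaded image is named linuxx64_12201_grid_home.zip:

# run as the future Grid home owner on the first node
oracle$ mkdir -p /u01/app/grid/12.2.0.1
oracle$ unzip -q linuxx64_12201_grid_home.zip -d /u01/app/grid/12.2.0.1
oracle$ cd /u01/app/grid/12.2.0.1
oracle$ ./gridSetup.sh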

The next step was to patch the GI software to the latest (170118) bundle patch. This is generally a good idea to fix as many issues as possible before setting up the cluster, and it provides newer versions of the kernel modules, which is important in our case since we updated the kernel in the first step. But since we do not have a running 12.2 cluster at the time of patching, we cannot use the "opatchauto" functionality to apply the patch. Instead, we needed to update OPatch to the latest version on both nodes (a sketch of that follows below the patch list) and then apply all the patches that come with the GI bundle patch one by one like this:

oracle$ export ORACLE_HOME=/u01/app/grid/12.2.0.1
oracle$ export PATH=$ORACLE_HOME/OPatch:$PATH
oracle$ cd /tmp/27100009
oracle$ cd 26839277
oracle$ opatch apply .
oracle$ cd ../27105253
oracle$ opatch apply .
oracle$ cd ../27128906
oracle$ opatch apply .
oracle$ cd ../27144050
oracle$ opatch apply .
oracle$ cd ../27335416
oracle$ opatch apply .

Note that this was run as the owner of the GI home, "oracle" in our case.
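For reference, updating OPatch beforehand is just a matter of replacing the OPatch directory inside the new home on both nodes; a sketch, assuming the OPatch download (patch 6880880) was staged as /tmp/p6880880_122010_Linux-x86-64.zip:

# as the GI home owner, on both nodes
oracle$ export ORACLE_HOME=/u01/app/grid/12.2.0.1
oracle$ mv $ORACLE_HOME/OPatch $ORACLE_HOME/OPatch.orig
oracle$ unzip -q /tmp/p6880880_122010_Linux-x86-64.zip -d $ORACLE_HOME
oracle$ $ORACLE_HOME/OPatch/opatch version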

Before running the upgrade, we need to check whether there is sufficient space available for the GIMR. Unfortunately, the upgrade process creates the new GIMR in the same diskgroup that is used for storing the OCR and voting disks, even if the GIMR is currently stored in another diskgroup. In contrast, a fresh installation can use a separate diskgroup for the GIMR. So be aware of that.
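A quick way to check this is to look at the free space of the OCR/voting diskgroup from the still-running 12.1 environment; a minimal sketch:

# as the GI home owner, with the environment pointing to the running 12.1 ASM instance
oracle$ asmcmd lsdg
# compare the Free_MB of the OCR/voting diskgroup with the GIMR space requirement in the 12.2 documentation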

At this point we can start the upgrade process by running "gridSetup.sh" again and selecting the "Upgrade" option. Quickly we come to the point where the root scripts need to run. That is where the fun starts. In our case the "rootupgrade.sh" failed at the step where the AFD driver is being updated:

CLSRSC-400: A system reboot is required to continue installing.

The reason is that the "oracleafd" kernel module is in use and thus cannot be unloaded:

[root ~]# lsmod |grep afd
oracleafd             204227  1
[root ~]# modprobe -r oracleafd
modprobe: FATAL: Module oracleafd is in use.

There are similar issues documented in MOS, but none of them matched our scenario and/or patch level.

So a reboot is required, nice. That means our "gridSetup.sh" GUI, which still has some work to do, will go away. Fortunately, the documentation has a solution for that: reboot the node and then run "gridSetup.sh" again, providing a response file. What the documentation does not tell you is that this response file was already created in $ORACLE_HOME/install/response. We can identify the file by its name and timestamp.
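To locate it, a simple directory listing of that location is enough; a minimal sketch, using the Grid home path from above:

# the automatically saved response file lives here, identifiable by its name and timestamp
oracle$ ls -ltr /u01/app/grid/12.2.0.1/install/response/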

So we went ahead and rebooted the first node. After it was up again, we checked the kernel modules and found "oracleafd" loaded once more, but this time we were able to unload it:

[root ~]# lsmod |grep afd
oracleafd             204227  1
[root ~]# modprobe -r oracleafd
[root ~]# lsmod |grep afd
[root ~]# 

Maybe this step is not necessary, but it helped us to stay calm at this point. We started "rootupgrade.sh" again, and this time it ran fine without any errors.
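For completeness, the root script is simply re-run from the new Grid home; a sketch, assuming the home path from above:

# as root on the first node
[root ~]# /u01/app/grid/12.2.0.1/rootupgrade.sh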

The next step was to run "rootupgrade.sh" on the remaining node. It ran into the same issue, so we rebooted that node, unloaded the "oracleafd" kernel module and ran "rootupgrade.sh" again, which then completed without errors.
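Once the root script has finished on the last node, it does not hurt to verify that the whole cluster reports the new version; a sketch of such a sanity check (not part of the documented procedure):

# on any node
[root ~]# /u01/app/grid/12.2.0.1/bin/crsctl check cluster -all
[root ~]# /u01/app/grid/12.2.0.1/bin/crsctl query crs activeversion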

We were now up and running with GI 12.2. The final step is to run "gridSetup.sh" again as described in the documentation to finalize the upgrade:

oracle$ $ORACLE_HOME/gridSetup.sh -executeConfigTools -responseFile $ORACLE_HOME/install/response/gridsetup.rsp

That went smoothly, and the cluster is finally upgraded to 12.2. As a last step we reconfigured the GIMR to use its dedicated diskgroup again. This is described in MOS Note 2065175.1 and is quite straightforward.

That’s it for today, I hope it will help you to stay calm during your cluster upgrades.


Data Guard warning: This file is unencrypted

Today I did some functional tests with a newly created Data Guard setup. The database is 12.1.0.2 with the latest (January) bundle patch installed, and it does not use Oracle Managed Files (OMF).
During these tests I came across a funny alert.log message at the standby site. I created a tablespace at the primary just to see that it is automatically created at the standby too. The system is German, so please excuse the German messages ("Tablespace wurde angelegt" simply means "Tablespace created"), but I think you can get the idea.

SQL> create tablespace test datafile 'E:\DATABASE\P4\TABLESPACES\TEST01.dbf' size 1g;

Tablespace wurde angelegt.

The alert.log at the primary looks quite normal:

2018-02-01 09:55:55.679000 +01:00
create tablespace test datafile 'E:\DATABASE\P4\TABLESPACES\TEST01.dbf' size 1g
2018-02-01 09:56:02.806000 +01:00
Completed: create tablespace test datafile 'E:\DATABASE\P4\TABLESPACES\TEST01.dbf' size 1g

But at the standby site I read the following:

2018-02-01 09:56:00.856000 +01:00
WARNING: File being created with same name as in Primary
Existing file may be overwritten
2018-02-01 09:56:05.481000 +01:00
Recovery created file E:\DATABASE\P4\TABLESPACES\TEST01.DBF
WARNING: This file E:\DATABASE\P4\TABLESPACES\TEST01.DBF is created as unencrypted.Please consider encrypting this file!
Datafile 13 added to flashback set
Successfully added datafile 13 to media recovery
Datafile #13: 'E:\DATABASE\P4\TABLESPACES\TEST01.DBF'

None of my tablespaces at the primary site are encrypted or were ever encrypted. A quick search on MOS pointed me to a discussion where a similar situation was solved by simply setting the "db_create_file_dest" parameter. So I tried that at the standby:

SQL> alter system set db_create_file_dest='E:\DATABASE\P4\TABLESPACES';

System wurde geändert.

Then I dropped the tablespace and created it again:

SQL> drop tablespace test including contents and datafiles;

Tablespace wurde gelöscht.

SQL> create tablespace test datafile 'E:\DATABASE\P4\TABLESPACES\TEST01.dbf' size 1g;

Tablespace wurde angelegt.

Obviously there is no change at the primary, so I omit that. But at the standby it now looks the way I would have expected it in the first place:

2018-02-01 10:11:23.423000 +01:00
Datafile 13 added to flashback set
Successfully added datafile 13 to media recovery
Datafile #13: 'E:\DATABASE\P4\TABLESPACES\P4B\DATAFILE\O1_MF_TEST_F75PFP2Y_.DBF'

So even though I am not using OMF, I need to set the "db_create_file_dest" parameter to prevent these strange warnings, which would otherwise confuse my monitoring.

I did another test by resetting that parameter and setting "db_file_name_convert" instead, but this did not help to prevent the warning.
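For reference, that second test looked roughly like this; the convert pair below is just a hypothetical placeholder for my paths, and since "db_file_name_convert" is not a dynamic parameter, it needs scope=spfile and a restart of the standby instance:

-- hypothetical sketch: revert db_create_file_dest and try db_file_name_convert instead
SQL> alter system set db_create_file_dest='';
SQL> alter system set db_file_name_convert='E:\DATABASE\P4','E:\DATABASE\P4' scope=spfile;
-- restart the standby instance afterwards for the static parameter to take effect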