Some time ago a customer asked how to easily move his two-node failover cluster to new servers. I suggested setting up the new nodes and adding them to the cluster, then switching all the resources from the old servers to the new ones, and voilà, we’re finished. The old nodes were running OEL5.6, so we decided to go to OEL7 with the new nodes. My Oracle Support listed this as certified, so nothing to complain about. That was the plan. Reality was a little different.
The preparation of the two new nodes went fine, but CVU had some complaints here and there. We went ahead and tried to add the first new node to the cluster. However, “addNode.sh” runs the same checks as CVU, and since there were some issues, it aborted. So we needed a way to skip or ignore these checks. The solution was to set an environment variable
which is checked inside the “addNode.sh” script; the checks are skipped if this variable is set to “Y”. Now the installation went on and copied the Grid Infrastructure to the new node.

The next step was running the “root.sh” script. It started, but soon got stuck at “Adding Clusterware entries to inittab”. A couple of minutes later the script reported a timeout waiting for OHASD to come up. That makes sense: “inittab” is no longer used in OEL7. Whatever is in there does not matter, because OEL now uses systemd to control the services. Obviously the “root.sh” script being used originated on the old OEL5 nodes and therefore still relied on “inittab”. So what did we do? We created the systemd services on our own by copying the configuration from an existing OEL7 cluster:
    # cd /etc/systemd/system
    # ls -l orac*
    -rw-r--r-- 1 root root 362 11. Mar 15:05 oracle-ohasd.service
    -rw-r--r-- 1 root root 319 11. Mar 15:06 oracle-tfa.service
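The contents of the copied unit files are not shown here. For readers without a second OEL7 cluster to copy from, the following is only a rough sketch of what a hand-written OHASD unit might look like. It assumes that the old “inittab” entry respawned `/etc/init.d/init.ohasd run` and that this script exists on the new node; the helper name `write_ohasd_unit` and the unit description are made up for illustration, and this is not the file Oracle ships.

```shell
#!/bin/sh
# Hypothetical helper: writes a minimal oracle-ohasd.service into the
# directory given as first argument. Not Oracle's shipped unit file.
write_ohasd_unit() {
    target_dir=$1
    cat > "$target_dir/oracle-ohasd.service" <<'EOF'
[Unit]
Description=Oracle High Availability Services (hand-written sketch)
After=network.target remote-fs.target

[Service]
Type=simple
# Assumption: same command the old inittab entry respawned; verify locally.
ExecStart=/etc/init.d/init.ohasd run
# Restart=always mimics the "respawn" action of inittab.
Restart=always

[Install]
WantedBy=multi-user.target
EOF
}
```

Calling `write_ohasd_unit /etc/systemd/system` as root, followed by `systemctl daemon-reload`, would make the unit known to systemd. Where possible, copying the files from a working OEL7 installation, as we did, is the safer choice.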
After adding these files to “/etc/systemd/system” we re-ran the “root.sh” script. At the stage “Installing Trace File Analyzer” we enabled and started the TFA service:
    systemctl enable oracle-tfa.service
    systemctl start oracle-tfa.service
When we came to the point where OHASD is being set up (“Adding Clusterware entries to inittab”), we watched the corresponding logfile “GRID_HOME/cfgtoollogs/crsconfig/rootcrs*.log”. Once the line with “crsctl start has” shows up, it is time to enable and start the OHASD service too:
    systemctl enable oracle-ohasd.service
    systemctl start oracle-ohasd.service
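We watched the logfile by hand, but the waiting could also be scripted. The following is a minimal sketch, not something we actually used: `wait_for_pattern` is a hypothetical helper that polls one or more existing files once per second until a fixed string appears or a timeout (in seconds) expires.

```shell
#!/bin/sh
# wait_for_pattern PATTERN TIMEOUT FILE...
# Hypothetical helper: returns 0 as soon as PATTERN (a fixed string) is
# found in one of the files, 1 if TIMEOUT seconds pass without a match.
# The files must already exist when the function is called.
wait_for_pattern() {
    pattern=$1
    timeout=$2
    shift 2
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if grep -qF -- "$pattern" "$@" 2>/dev/null; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# Example usage (commented out, run as root while root.sh is running;
# GRID_HOME is assumed to be set to the Grid Infrastructure home):
# wait_for_pattern "crsctl start has" 900 "$GRID_HOME"/cfgtoollogs/crsconfig/rootcrs*.log \
#     && systemctl enable oracle-ohasd.service \
#     && systemctl start oracle-ohasd.service
```

The timeout of 900 seconds in the usage example is an arbitrary value; it only has to be shorter than the time “root.sh” itself waits for OHASD.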
Now that the OHASD service is running, the “root.sh” script is happy: it continues and completes successfully without any further complaints.
I think this issue can occur when moving from any Linux release that uses the old “inittab” to a newer Linux release that uses “systemd”. Maybe this can help some of you and make life a little easier.
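To tell quickly which of the two cases applies to a node, one can test for the directory “/run/systemd/system”, which exists when systemd is the running init system. A minimal sketch, with `detect_init` being a hypothetical helper name; the optional argument is an alternative root directory and is only there to make the function easy to try out:

```shell
#!/bin/sh
# detect_init [ROOT]
# Hypothetical helper: prints "systemd", "inittab" or "unknown".
# ROOT defaults to the empty string, i.e. the real root file system.
detect_init() {
    root=${1:-}
    if [ -d "$root/run/systemd/system" ]; then
        # systemd is the running init system
        echo systemd
    elif [ -f "$root/etc/inittab" ]; then
        # no systemd, but an inittab file is present
        echo inittab
    else
        echo unknown
    fi
}
```

The systemd test comes first on purpose, since a systemd-based release may still carry an “/etc/inittab” file that is no longer evaluated.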
Oracle has since acknowledged this behaviour as a bug: MOS Note 1959008.1