Oracle Clusterware (19c) and SLES 12 certification

Recently we stumbled upon an issue with Oracle Clusterware on SLES 12 SP5. We checked the certification on MOS and found this statement: “SUSE Linux Enterprise Server 12 SP3: 4.4.162-94.72-default or later”. That “or later” lead us to the assumption, that it is certified. But it is not, I’ll explain why.
If you search MOS for Clusterware certifications, you walk through these steps:



Here is a direct link to that search.
And that is the point, where you have to read very carefully to interpret the result. The title saying “SP3+” might lead to the assumption that also later SPs are supported. But in fact they are not. The note states clearly, that only “SP3” is supported with some minimum kernel version to be used. I highlighted that in the picture above. So that means in turn, only SLES 12 SP3 is certified, nothing else.
If you are using or planning to use ACFS, you should also check the certification/compatibility for that. But there is no product for ACFS, there is just a note in MOS: 1369107.1.

In this note, the certification is stated a bit more cleary. It says “SP3” followed by “comma” and then the minimum required kernel version. So the same here, just SP3 of SLES12 is certified, nothing else. As far as I know, that SP3 is not under SuSE’s support anymore at the time of writing. Only SLES 15 is, but there is no upgrade path for this. If you choose to use SLES 15, you need to install it from scratch. And there are a couple of patches required to make the Clusterware run on SLES 15.
I hope this helps you understand and interpret the certifications a bit better next time and choose the propper OD version.

Oracle Grid Infrastructure vs. SLES 12

During a new installation of an Oracle Grid Infrastructure 19.4 at a customers site, I experienced a strange behaviour. The databases could be started and stopped via SQL*Plus as usual. Also stopping of databases using “srvctl” was fine. But when it came to starting a database via “srvctl” or rebooting the nodes, I experienced trouble. This looked like this:

oracle@proddb20:~> srvctl start database -db db3p
PRCR-1079 : Failed to start resource ora.db3p.db
CRS-5017: The resource action "ora.db3p.db start" encountered the following error:
ORA-00444: background process "PXMN" failed while starting
ORA-27300: OS system dependent operation:fork failed with status: 11
. For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/proddb20/crs/trace/crsd_oraagent_oracle.trc".

So I started to investigate and started with the alert.log of the database. It had some more details for me.

oracle@proddb20:~> tail /u01/app/oracle/diag/rdbms/db3p/db3p/trace/alert_db3p.log
2019-10-08T13:20:56.263194+02:00
Errors in file /u01/app/oracle/diag/rdbms/db3p/db3p/trace/db3p_psp0_187365.trc:
ORA-27300: OS system dependent operation:fork failed with status: 11
ORA-27301: OS failure message: Resource temporarily unavailable
ORA-27302: failure occurred at: skgpspawn3
2019-10-08T13:20:57.237250+02:00
Process LREG died, see its trace file
USER (ospid: ): terminating the instance due to ORA error
2019-10-08T13:20:58.269727+02:00
Instance terminated by USER, pid = 187344

The messages vary, the failing function mentioned in the ORA-27302 is not always the same. A look in the mentioned tracefile revealed the following information.

oracle@proddb20:~> cat /u01/app/oracle/diag/rdbms/db3p/db3p/trace/db3p_psp0_187365.trc
Trace file /u01/app/oracle/diag/rdbms/db3p/db3p/trace/db3p_psp0_187365.trc
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.4.0.0.0
Build label:    RDBMS_19.3.0.0.0DBRU_LINUX.X64_190417
ORACLE_HOME:    /u01/app/oracle/product/19.3.0/db_ee_1
System name:    Linux
Node name:      proddb20
Release:        4.4.166-3.g849dcaf-default
Version:        #1 SMP Fri Dec 7 15:18:32 UTC 2018 (849dcaf)
Machine:        x86_64
Instance name: db3p
Redo thread mounted by this instance: 0 

Oracle process number: 4
Unix process pid: 187365, image: oracle@proddb20 (PSP0)
  
*** 2019-10-08T13:20:56.240852+02:00
*** SESSION ID:(61.55718) 2019-10-08T13:20:56.240872+02:00
*** CLIENT ID:() 2019-10-08T13:20:56.240876+02:00
*** SERVICE NAME:() 2019-10-08T13:20:56.240879+02:00
*** MODULE NAME:() 2019-10-08T13:20:56.240882+02:00
*** ACTION NAME:() 2019-10-08T13:20:56.240885+02:00
*** CLIENT DRIVER:() 2019-10-08T13:20:56.240887+02:00
 
Process startup failed, error stack:
ORA-27300: OS system dependent operation:fork failed with status: 11
ORA-27301: OS failure message: Resource temporarily unavailable
ORA-27302: failure occurred at: skgpspawn3
OS - DIAGNOSTICS
----------------
loadavg : 0.27 0.12 0.11
Memory (Avail / Total) = 118346.89M / 128328.88M
Swap (Avail / Total) = 16386.00M /  16386.00M
Max user processes limits(s / h) =  65536 / 65536
----------------

My first guess was, that some kernel parameters were not set properly. But a quick check showed that everything was fine at that point.
That’s why I went to My Oracle Support were I quickly found this note: SLES 12: Database Startup Error with ORA-27300 ORA-27301 ORA-27303 While Starting using Srvctl (Doc ID 2340986.1). The note talks about a new functionality named “cgroup controller” introduced in SLES 12. This new functionality limits the maximum number of so-called tasks that a single process may start. The default limit for this is 512 tasks per process. For the Grid Infrastructure in it’s very basic setup right after the installation it looks like this.

proddb20:~ # systemctl status ohasd
ohasd.service - LSB: Start and Stop Oracle High Availability Service
   Loaded: loaded (/etc/init.d/ohasd; bad; vendor preset: disabled)
   Active: active (exited) since Fri 2019-10-04 14:30:19 CEST
     Docs: man:systemd-sysv-generator(8)
  Process: 4024 ExecStart=/etc/init.d/ohasd start (code=exited, tatus=0/SUCCESS)
    Tasks: 471 (limit: 512) 

As you can see, it is already near the limit. So if I now start a database instance, I will definitely reach that limit causing the instance startup to fail.
The limit can be increased by modifying/specifying the value for “DefaultTasksMax” in “/etc/systemd/system.conf”.

proddb20:~ # grep DefaultTasksMax /etc/systemd/system.conf 
#DefaultTasksMax=512
proddb20:~ # vi DefaultTasksMax /etc/systemd/system.conf 

proddb20:~ # grep DefaultTasksMax /etc/systemd/system.conf 
DefaultTasksMax=65535

After a reboot the new value is being picked up and the Grid Infrastructure can now start much more tasks. That means, the databases come up right away during the node startup and I am finally able to start the databases using “srvctl”.

proddb20:~ # systemctl status ohasd
ohasd.service - LSB: Start and Stop Oracle High Availability Service
   Loaded: loaded (/etc/init.d/ohasd; bad; vendor preset: disabled)
   Active: active (exited) since Tue 2019-10-08 14:30:19 CEST; 1min 12s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 4024 ExecStart=/etc/init.d/ohasd start (code=exited, tatus=0/SUCCESS)
    Tasks: 463 (limit: 65535)

So it is defnitely a good idea to set/increase that limit before you even start installing a Grid Infrastructure.

Oracle Grid Infrastructure 19c Installation

Lately I tried to install Oracle Grid Infrastructure on a SUSE SLES12 SP3 server. Don’t blame me for this OS, it was not my choice. The setup of the cluster went quite smooth, just some components did not start properly after the setup. We needed to re-issue the start or stop commands in order to make things happen. But this effect was gone one we applied the current (July 2019) Release Update. Unfortunately we saw some other nasty error messages in the crs alert.log:

CRS-8503 [__lll_unlock_elision()+48] [Signal/Exception: 11] [Instruction Addr: 0x7f1ec2deb4a0] [Memory Addr: (nil)] [] [] [] [] [] [] [] []

A quick reseach on MOS revealed note Installation of Oracle Clusterware 12c(12cR1 & 12cR2) on SLES 12 fails with CRS-8503 [Signal / Exception: 11] (Doc ID 2270947.1) with a solution, deconfigure the cluster, change some linking configuration and re-configure the cluster. I did not like the idea of re-creating the cluster, so I simply followed the steps to relink the binaries and in between, did the modifications mentioned in the MOS note.

root # export ORACLE_HOME=/u01/app/grid/19
root # $ORACLE_HOME/crs/install/rootcrs.sh -unlock
root # vi  /etc/ld.so.conf

  ### Add the this at the very top
  /lib64/noelision

root # ln -s /lib64/noelision/libpthread-2.19.so $ORACLE_HOEM/lib/libpthread.so.0

root # su - oracle
oracle $> export ORACLE_HOME=/u01/app/grid/19
oracle $> $ORACLE_HOME/bin/relink
oracle $> exit
root # $ORACLE_HOME/crs/install/rootcrs.sh -lock

Having done this, the error messages were gone.

Purging ADR home of upgraded Grid Infrastructure might fail

The second blog post today is about housekeeping. I typically use the “purgeLogs” script from MOS Doc-ID 2081655.1. It was intially designed for engineered systems but it works well on classic systems too. But today I found an error in the output:

2019-06-21 00:00:06: I adrci GI purging diagnostic destination diag/crs/odax6m/crs
2019-06-21 00:00:06: I ... purging ALERT older than 8 days
2019-06-21 00:00:06: W Not able to purge ALERT, due to error: DIA-49803: Purge not possible due to incompatible schema version.

Typically this happens, when you use an “adrci” that does not match the ADR home you are trying to purge. That’s maybe the reason, why “purgeLogs” tries different adrci’s to purge the RDBMS homes.
But what is wrong here? Let’s try the manual way.

oracle@odax6m ~> adrci

ADRCI: Release 18.0.0.0.0 - Production on Fri Jun 21 13:53:17 2019

Copyright (c) 1982, 2018, Oracle and/or its affiliates.  All rights reserved.

ADR base = "/u01/app/oracle"
adrci> set home crs
adrci> purge
DIA-49803: Purge not possible due to incompatible schema version.

When I put that message in MOS, I quickly found this bug: Bug 28375106 – ADRCI Purge of Grid Home Fails With “DIA-49803: Purge not possible due to incompatible schema version” After Upgrade from 12.1 (Doc ID 28375106.8). So I crosschecked that.

adrci> show version schema
Schema version (on-disk): 107
Schema version (library): 110

And indeed, the versions are different. Obviously the issue is related to
upgrades. In my case this is an Oracle Database Appliance, that has been patched from 18.3 to 18.5. Unfortunately MOS has no workaround for this. But let’s check what “adrci” may can do for me.

adrci> help extended

 HELP [topic]
   Available Topics:
        BEGIN BACKUP
        CD
        CREATE STAGING XMLSCHEMA
        CREATE VIEW
        DDE
        DEFINE
        DELETE
        DESCRIBE
        DROP VIEW
        END BACKUP
        INSERT
        LIST DEFINE
        MERGE ALERT
        MERGE FILE
        MERGE LOG
        MIGRATE
        QUERY
        REPAIR
        SET COLUMN
        SHOW CATALOG
        SHOW DUMP
        SHOW SECTION
        SHOW TRACE
        SHOW TRACEMAP
        SWEEP
        UNDEFINE
        UPDATE
        VIEW

The “MIGRATE” sounds interresting.

adrci> help migrate

  Usage: MIGRATE [RELATION  | SCHEMA] [-DOWNGRADE | -RECOVER]

  Purpose: Upgrades the relation or schema to the latest version

  Options:
    [-downgrade] Downgrade schema to previous version
    [-recover]   Recover schema from failed migrate

  Examples:
    migrate schema
    migrate relation incident

Sounds like I do want to do exactly this. Give it a try.

adrci> migrate schema
Schema migrated.
adrci> show version schema
Schema version (on-disk): 110
Schema version (library): 110

And voilà, now I can purge the Grid Inftrastructure ADR home again.

adrci> purge
adrci>

Since the bug is about the behaviour after upgrading from 12.1, this might apply to other upgrade or update paths too. Be aware of that and monitor your housekeeping scripts. And check your ADR schema version after you upgraded or patched your Grid Infrastructure installation.

Auto Resize and Shrink ACFS Volumes

Today I want to write a little about ACFS. In case you are using the latest and greatest Oracle Grid Infrastructure and you still run 11.2 databases, you are required to use ACFS. This especially applies to the Database Appliances, ExaDatas and for the Cloud Services, if you use Grid Infrastructure there. Having that, you might run into situations where your ACFS filesystem grows which it does automatically. But what if you want to shrink the filesystem to it’s original or at least a smaller size? With version 18.3 this should work quite well. Version before did had some issues when there was no contigous free space at the end of the volume to do the shrinking. Read more about that in MOS Note “ODA (Oracle Database Appliance): Resizing ACFS DBStorage File System (Doc ID 1990430.1)“.
Basically, there are two major topics. First, the shrinking of ACFS volumes and second, limiting the auto resize of ACFS.

Current situation

Given a customers ODA, this is the current situation.

[root@oda-x7m ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroupSys-LogVolRoot
                       30G  3.4G   25G  13% /
tmpfs                 189G  1.3G  187G   1% /dev/shm
/dev/md0              477M   40M  412M   9% /boot
/dev/sda1             500M  320K  500M   1% /boot/efi
/dev/mapper/VolGroupSys-LogVolOpt
                       59G   24G   33G  43% /opt
/dev/mapper/VolGroupSys-LogVolU01
                       99G   78G   16G  84% /u01
/dev/asm/commonstore-241
                      5.0G  319M  4.7G   7% /opt/oracle/dcs/commonstore
/dev/asm/dattfstar01-241
                      170G  158G   13G  93% /u02/app/oracle/oradata/TFSTAR01
/dev/asm/reco-135     883G  693G  191G  79% /u03/app/oracle
/dev/asm/datxvgvg-241
                      100G  5.9G   95G   6% /u02/app/oracle/oradata/XVGVG
/dev/asm/datxdovg-241
                      410G  399G   12G  98% /u02/app/oracle/oradata/XDOVG
/dev/asm/datxvgovg-241
                      100G  5.7G   95G   6% /u02/app/oracle/oradata/XVGOVG
/dev/asm/datxdoovg-241
                      390G  376G   15G  97% /u02/app/oracle/oradata/XDOOVG
/dev/asm/datxvgfg-241
                      100G  5.1G   95G   6% /u02/app/oracle/oradata/XVGFG
/dev/asm/datxdofg-241
                      100G   46G   55G  46% /u02/app/oracle/oradata/XDOFG
/dev/asm/datsfstar01-241
                      100G  6.3G   94G   7% /u02/app/oracle/oradata/SFSTAR01
/dev/asm/dattvgfg01-241
                      100G  5.1G   95G   6% /u02/app/oracle/oradata/TVGFG01
/dev/asm/dattfstar03-241
                      100G  4.9G   96G   5% /u02/app/oracle/oradata/TFSTAR03
/dev/asm/dattfstar04-241
                      100G  7.0G   94G   7% /u02/app/oracle/oradata/TFSTAR04
/dev/asm/dattfstar02-241
                      1.9T  1.9T   76M 100% /u02/app/oracle/oradata/TFSTAR02
/dev/asm/dattdomea01-241
                      100G   20G   81G  20% /u02/app/oracle/oradata/TDOMEA01
/dev/asm/dattfstar05-241
                      100G  5.2G   95G   6% /u02/app/oracle/oradata/TFSTAR05
/dev/asm/datifstar01-241
                      100G  5.1G   95G   6% /u02/app/oracle/oradata/IFSTAR01A
/dev/asm/datrfstar01-241
                      1.1T  1.1T  9.4G 100% /u02/app/oracle/oradata/RFSTAR01A
jus-srv-rz50.justiz.sachsen.de:/u01/app/DB-dumps
                      3.0T  2.0T  924G  68% /u99/nfs/DB-dumps
/dev/asm/datsfstar02-241
                      100G  3.6G   97G   4% /u02/app/oracle/oradata/SFSTAR02

Shrinking an ACFS Filesystem

As to be seen above, there are quite a lot of databases that have a lot of free space in their filesystems. The initial size is defined by the shapes that an ODA provides. But in our case, even 100GB are too much, so we wanted to reduce the occupied space.
First, we get some basic information.

[oracle@oda-x7m ~]$ /sbin/advmutil volinfo /dev/asm/datxvgvg-241
Device: /dev/asm/datxvgvg-241
Size (MB): 102400
Resize Increment (MB): 512
Redundancy: mirror
Stripe Columns: 8
Stripe Width (KB): 1024
Disk Group: DATA
Volume: DATXVGVG
Compatible.advm: 12.2.0.1.0

The “resize” feature requires the ADVM compatibility to be at least 12.2.0.1, so this is the case. Now let’s do the shrink.

[oracle@oda-x7m ~]$ acfsutil size -h
Usage: acfsutil [-h] size [[-|+]nnn[K|M|G|T|P]] [-a nnn[K|M|G|T|P]] [-x nnn[K|M|G|T|P]] [-d ]  
                                         - Resize file system and configure auto-resize
                [-a]                     - Auto-resize increment
                [-x]                     - Auto-resize maximum
                [-d]                     - Resize only this device

[oracle@oda-x7m ~]$ acfsutil size -80G /u02/app/oracle/oradata/XVGVG
acfsutil size: new file system size: 21474836480 (20480MB)

After that, the volume is at the desired size of 20GB.

[oracle@oda-x7m ~]$ /sbin/advmutil volinfo /dev/asm/datxvgvg-241
Device: /dev/asm/datxvgvg-241
Size (MB): 20480
Resize Increment (MB): 512
Redundancy: mirror
Stripe Columns: 8
Stripe Width (KB): 1024
Disk Group: DATA
Volume: DATXVGVG
Compatible.advm: 12.2.0.1.0

Limiting auto increment of ACFS

The automatic increment of ACFS volumes has two implications. First, it makes monitoring difficult because you never know is the filesystem is really full or will it automatically resize if needed. To figure out, if you are having a critical issue, you need to perform several steps. Implementing this with a monitoring system will be cumbersome in most cases. The other thing is, managing the storage resources is not straight-forward. You might overcommit your storage. My approach and advise is, to disable automatic resizing by setting the ACFS size and maximum size to the value, that you commited for this specific database. Let’s check the setup before doing that.

[oracle@oda-x7m ~]$ acfsutil info fs /u02/app/oracle/oradata/ICM01A
/u02/app/oracle/oradata/ICM01A
    ACFS Version: 12.2.0.1.0
    on-disk version:       46.0
    compatible.advm:       12.2.0.0.0
    ACFS compatibility:    12.2.0.0.0
    flags:        MountPoint,Available,AutoResizeEnabled
    mount time:   Fri Mar 15 10:09:28 2019
    mount sequence number: 4
    allocation unit:       4096
    metadata block size:   4096
    volumes:      1
    total size:   107374182400  ( 100.00 GB )
    total free:   91318923264  (  85.05 GB )
    auto-resize increment:   10737418240  (  10.00 GB )
    auto-resize maximum:     0  ( 0.00 )
    file entry table allocation: 262144
    primary volume: /dev/asm/daticm01-411
        label:
        state:                 Available
        major, minor:          246, 210435
        logical sector size:   512
        size:                  107374182400  ( 100.00 GB )
        free:                  91318923264  (  85.05 GB )
        metadata read I/O count:         54838
        metadata write I/O count:        51022
        total metadata bytes read:       242651136  ( 231.41 MB )
        total metadata bytes written:    233385984  ( 222.57 MB )
        ADVM diskgroup:        DATA
        ADVM resize increment: 536870912
        ADVM redundancy:       mirror
        ADVM stripe columns:   8
        ADVM stripe width:     1048576
    number of snapshots:  0
    snapshot space usage: 0  ( 0.00 )
    replication status: DISABLED
    compression status: DISABLED

So I set the maximum size to the current size.

[oracle@oda-x7m ~]$ acfsutil size -x 100G /u02/app/oracle/oradata/ICM01A
acfsutil size: ACFS-03642: successfully updated auto-resize settings

Just crosscheck that.

[oracle@oda-x7m ~]$ acfsutil info fs /u02/app/oracle/oradata/ICM01A
/u02/app/oracle/oradata/ICM01A
    ACFS Version: 12.2.0.1.0
    on-disk version:       46.0
    compatible.advm:       12.2.0.0.0
    ACFS compatibility:    12.2.0.0.0
    flags:        MountPoint,Available,AutoResizeEnabled
    mount time:   Fri Mar 15 10:09:28 2019
    mount sequence number: 4
    allocation unit:       4096
    metadata block size:   4096
    volumes:      1
    total size:   107374182400  ( 100.00 GB )
    total free:   91318923264  (  85.05 GB )
    auto-resize increment:   10737418240  (  10.00 GB )
    auto-resize maximum:     107374182400  ( 100.00 GB )
    file entry table allocation: 262144
    primary volume: /dev/asm/daticm01-411
        label:
        state:                 Available
        major, minor:          246, 210435
        logical sector size:   512
        size:                  107374182400  ( 100.00 GB )
        free:                  91318923264  (  85.05 GB )
        metadata read I/O count:         54840
        metadata write I/O count:        51022
        total metadata bytes read:       242659328  ( 231.42 MB )
        total metadata bytes written:    233385984  ( 222.57 MB )
        ADVM diskgroup:        DATA
        ADVM resize increment: 536870912
        ADVM redundancy:       mirror
        ADVM stripe columns:   8
        ADVM stripe width:     1048576
    number of snapshots:  0
    snapshot space usage: 0  ( 0.00 )
    replication status: DISABLED
    compression status: DISABLED

You could also combine the resize operation which returns the final ACFS size which in turn can be used for setting the maximum size.
And that’s basically it, there is nothing more to do.

Adding VLANs on an ODA

The Oracle Database Appliances X7-2 come only with one redundant network interface, so it is not possible to separate network traffic between distinct fault-tolerant bonded interfaces. But at least we can use VLANs on the one and only bonded interface that all the ODA X7-2 provide. If you choose to use VLANs, this choice must be taken at the point of installing the ODA. There is no easy way to change that afterwards. So if you use VLANs, you do it right from the beginning when you do the “configure-firstnet”. Once the setup is finished, it will end up in a configuration simiar to this.

[root@odax7-2m ~]# odacli list-networks

ID                                       Name                 NIC        InterfaceType IP Address         Subnet Mask        Gateway            VlanId
---------------------------------------- -------------------- ---------- ---------- ------------------ ------------------ ------------------ ----------
32812bc3-5bb7-4f8e-9070-b011acde665d     Private-network      priv0      DUMMY      192.168.16.24      255.255.255.240
735e2695-d68a-4696-a596-ba39a7180de6     Public-network       btbond1.26 Vlan       172.17.11.165      255.255.255.0      172.17.11.10       26

Now we can add more networks. The additional networks must be added from the operating system point of view which is done by the appliance management “odacli”, and from the Grid Infrastructure point of view which in turn is done by using “srvctl”. Finally, the databases must be configured to use the propper VLANs for their communication. Let’s walk through these steps.

1. Create a new VLAN

[root@odax7-2m ~]# odacli create-network -no-d -g 172.17.2.10 -n btbond1 -t VLAN -p 172.17.2.9 -m lsnr2 -w Database -s 255.255.255.0 -v 21
{
  "jobId" : "bd4d9ba0-7f59-491c-878b-5c03d3be51b6",
  "status" : "Created",
  "message" : null,
  "reports" : [ ],
  "createTimestamp" : "March 15, 2019 10:56:15 AM CET",
  "resourceList" : [ ],
  "description" : "Network service creation with names btbond1:lsnr2 ",
  "updatedTime" : "March 15, 2019 10:56:15 AM CET"
}

[root@odax7-2m ~]# odacli describe-job -i "bd4d9ba0-7f59-491c-878b-5c03d3be51b6"

Job details
----------------------------------------------------------------
                     ID:  bd4d9ba0-7f59-491c-878b-5c03d3be51b6
            Description:  Network service creation with names btbond1:lsnr2
                 Status:  Success
                Created:  March 15, 2019 10:56:15 AM CET
                Message:

Task Name                                Start Time                          End Time                            Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Setting network                          March 15, 2019 10:56:15 AM CET      March 15, 2019 10:56:20 AM CET      Success
Setting up Vlan                          March 15, 2019 10:56:15 AM CET      March 15, 2019 10:56:20 AM CET      Success

[root@odax7-2m ~]# odacli list-networks

ID                                       Name                 NIC        InterfaceType IP Address         Subnet Mask        Gateway            VlanId
---------------------------------------- -------------------- ---------- ---------- ------------------ ------------------ ------------------ ----------
32812bc3-5bb7-4f8e-9070-b011acde665d     Private-network      priv0      DUMMY      192.168.16.24      255.255.255.240
735e2695-d68a-4696-a596-ba39a7180de6     Public-network       btbond1.26 Vlan       172.17.11.165      255.255.255.0      172.17.11.10       26
e69efc7a-5973-4911-85c7-9d8e8b4138be     lsnr2                btbond1.21 VLAN       172.17.2.9         255.255.255.0      172.17.2.10        21

[root@odax7-2m ~]# odacli describe-network -i e69efc7a-5973-4911-85c7-9d8e8b4138be

Network details
----------------------------------------------------------------
                     ID:  e69efc7a-5973-4911-85c7-9d8e8b4138be
                   Name:  lsnr2
                    NIC:  btbond1.21
          InterfaceType:  VLAN
             IP Address:  172.17.2.9
            Subnet Mask:  255.255.255.0
                Gateway:  172.17.2.10
                 VlanId:  21
                   Type:  Database
                Default:  false
                Created:  March 15, 2019 10:56:15 AM CET

[root@odax7-2m ~]# odaadmcli show vlan
        NAME                     ID    INTERFACE   CONFIG_TYPE IP_ADDRESS      NETMASK         GATEWAY         NODENUM
        lsnr2                    21    btbond1     Database    172.17.2.9      255.255.255.0   172.17.2.10     0
        vlan26                   26    btbond1     public      172.17.11.165   255.255.255.0   172.17.11.10    0

[root@odax7-2m ~]# ifconfig btbond1.21
btbond1.21 Link encap:Ethernet  HWaddr 00:10:E0:DE:6F:CB
          inet addr:172.17.2.9  Bcast:172.17.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:501 errors:0 dropped:0 overruns:0 frame:0
          TX packets:13 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:35567 (34.7 KiB)  TX bytes:720 (720.0 b)

2. Add the VLAN to the Grid Infrastructure
Note, you must set the environment variables to point to the Grid Infrastructure. We have a toolbox for Linux systems, that makes it easy to set and maintain different environment settings. This toolbox provides a function to set the propper Oracle environment which is called “soe” –  set oracle environment. You’ll see it’s usage in the following step.

First, we add the new VLAN as a network resource.

[root@odax7-2m ~]# soe +ASM

Environment :
===================

Setting         Value
--------------- ---------------------------
NLS_DATE_FORMAT "YYYY-MM-DD HH24:MI:SS"
NLS_LANG        "AMERICAN_GERMANY.AL32UTF8"
ORACLE_BASE     /u01/app/oracle
ORACLE_HOME     /u01/app/12.2.0.1/oracle
ORACLE_SID      +ASM1
ORACLE_UNQNAME  +ASM

[root@odax7-2m ~]# srvctl add network -netnum 21 -subnet 172.17.2.0/255.255.255.0/btbond1.21 -nettype static

[root@odax7-2m ~]# srvctl config network
Network 1 exists
Subnet IPv4: 172.17.11.0/255.255.255.0/btbond1.26, static
Subnet IPv6:
Ping Targets:
Network is enabled
Network is individually enabled on nodes:
Network is individually disabled on nodes:
Network 21 exists
Subnet IPv4: 172.17.2.0/255.255.255.0/btbond1.21, static
Subnet IPv6:
Ping Targets:
Network is enabled
Network is individually enabled on nodes:
Network is individually disabled on nodes:
Network 28 exists
Subnet IPv4: 172.17.5.0/255.255.255.0/btbond1.28, static
Subnet IPv6:
Ping Targets:
Network is enabled
Network is individually enabled on nodes:
Network is individually disabled on nodes:

Second, we definde the VIP and the listener. Please be careful, which OS user you use for which command. Have a look at the command prompt.

[root@odax7-2m ~]# srvctl add vip -node odax7-2m -address odax7-2m-lsnr2/255.255.255.0 -netnum 21
[root@odax7-2m ~]# su - oracle
[oracle@odax7-2m ~]$ soe +ASM

Environment :
===================

Setting         Value
--------------- ---------------------------
NLS_DATE_FORMAT "YYYY-MM-DD HH24:MI:SS"
NLS_LANG        "AMERICAN_GERMANY.AL32UTF8"
ORACLE_BASE     /u01/app/oracle
ORACLE_HOME     /u01/app/12.2.0.1/oracle
ORACLE_SID      +ASM1
ORACLE_UNQNAME  +ASM

[oracle@odax7-2m ~]$ srvctl add listener -netnum 21 -endpoints "TCP:1523" -listener listener2
[oracle@odax7-2m ~]$ srvctl start listener -listener listener2
[oracle@odax7-2m ~]$ lsnrctl status listener2

LSNRCTL for Linux: Version 12.2.0.1.0 - Production on 15-MAR-2019 11:10:26

Copyright (c) 1991, 2016, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER2)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER2
Version                   TNSLSNR for Linux: Version 12.2.0.1.0 - Production
Start Date                15-MAR-2019 11:10:03
Uptime                    0 days 0 hr. 0 min. 22 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/12.2.0.1/oracle/network/admin/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/odax7-2m/listener2/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER2)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.17.2.9)(PORT=1523)))
The listener supports no services
The command completed successfully

3. Point a database to the new Listener
To make use of the new listener, the database(s) must register with that listener. So we simply set the “local_listener” parameter to point to the new listener and we are done.

[oracle@odax7-2m ~]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on MAR-15-2019 11:11:58 

Copyright (c) 1982, 2014, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Advanced Analytics,
Real Application Testing and Unified Auditing options

SQL> alter system set local_listener='odax7-2m-lsnr2:1523';

System altered.

SQL> exit

[oracle@odax7-2m ~]$ soe +ASM -s
[oracle@odax7-2m ~]$ lsnrctl status listener2

LSNRCTL for Linux: Version 12.2.0.1.0 - Production on 15-MAR-2019 11:15:23

Copyright (c) 1991, 2016, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER2)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER2
Version                   TNSLSNR for Linux: Version 12.2.0.1.0 - Production
Start Date                15-MAR-2019 11:10:03
Uptime                    0 days 0 hr. 5 min. 19 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/12.2.0.1/oracle/network/admin/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/odax7-2m/listener2/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER2)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.17.2.9)(PORT=1523)))
Service "MMI.support.robotron.de" has 1 instance(s).
  Instance "MMI", status READY, has 2 handler(s) for this service...
Service "MMIXDB.support.robotron.de" has 1 instance(s).
  Instance "MMI", status READY, has 2 handler(s) for this service...
The command completed successfully

That’s it, it’s as simple as that. 

Missing Disk / Dismounting Diskgroup after duplicate from ASM to ACFS

Last week I was asked to create a Data Guard environment. Quite simple task, you may think. And actually it was, but with some funny side effects. The primary database is running on an Oracle Database Appliance X6-2M using ASM. The Standby database was planned to run on another ODA, a X5-2HA. The X5 is using pure ACFS. Both are running 12.1.0.2.170418 Bundlepatch. Be aware of that, the HA ODA’s are using PSUs whilst the smaller ones are using Bundlepatches. You should not mix up these, so I created another DB Home on the HA with the propper Bundlepatch. With the January ODA Update for the HA versions, Oracle moved to Bundlepatches too, but we are not yet there. So that’s it for the sake of completeness.

So what I did obviously in the first place was duplicating the primary database to the HA ODA. Once that was finished, I wanted to clean up the controlfile and get rid of all those backup and archivelog records and keep just the ones that are really available.

oracle@odax51 ~]$ rman target /

Recovery Manager: Release 12.1.0.2.0 - Production on Fri Mar 16 09:11:42 2018

Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

connected to target database: COMA (DBID=1562414168, not open)

RMAN> catalog db_recovery_file_dest;

Starting implicit crosscheck backup at 2018-03-16 09:11:44
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
allocated channel: ORA_DISK_2
allocated channel: ORA_DISK_3
allocated channel: ORA_DISK_4
allocated channel: ORA_DISK_5
allocated channel: ORA_DISK_6
allocated channel: ORA_DISK_7
allocated channel: ORA_DISK_8

At this point the RMAN was stuck. A quick look in the alert.log revealed a whole bunch of messages like these:

2018-03-16 09:08:32.000000 +01:00
WARNING: ASMB force dismounting group 3 (RECO) due to missing disks
SUCCESS: diskgroup RECO was dismounted
NOTE: ASMB mounting group 3 (RECO)
NOTE: ASM background process initiating disk discovery for grp 3 (reqid:0)
WARNING: group 3 (RECO) has missing disks
ORA-15040: diskgroup is incomplete
WARNING: group 3 is being dismounted.

The ASM alert.log had corresponding messages:

2018-03-16 09:11:48.567000 +01:00
NOTE: client COMA1:COMA:odax5-c dismounting group 3 (RECO)
NOTE: client COMA1:COMA:odax5-c dismounting group 3 (RECO)
NOTE: client COMA1:COMA:odax5-c dismounting group 3 (RECO)
NOTE: client COMA1:COMA:odax5-c dismounting group 3 (RECO)
NOTE: client COMA1:COMA:odax5-c dismounting group 3 (RECO)
NOTE: client COMA1:COMA:odax5-c dismounting group 3 (RECO)
NOTE: client COMA1:COMA:odax5-c dismounting group 3 (RECO)
NOTE: client COMA1:COMA:odax5-c dismounting group 3 (RECO)

Oh sh… you might think, and that was exactly what I thought at that time. So I checked the ASM diskgroups, disks etc. but did not find anything that could be a problem.

So after a while of thinking, the idea came up that it might be related to the backup stuff in the controlfile. So I checked that and tried to unregister a backupiece manually. I used the undocumented DBMS_BACKUP_RESTORE package for that, so do this at your own risk.

SQL> select RECID, STAMP, SET_STAMP, SET_COUNT, HANDLE, PIECE# from v$backup_piece
2 where handle like '+%' and rownum=1;


    RECID      STAMP  SET_STAMP  SET_COUNT PIECE# HANDLE
--------- ---------- ---------- ---------- ------ ----------------------------------------------------------------------------
   129941  969656433  969656431     130820      7 +RECO/COMAX6/BACKUPSET/2018_03_01/nnndn1_tag20180301t210006_0.2815.969656433

SQL> exec dbms_backup_restore.changebackuppiece( -
2      recid => 129941, -
3      stamp => 969656433, -
4      set_stamp => 969656431, -
5      set_count => 130820, -
6      pieceno => 7, -
7      handle => '+RECO/COMAX6/BACKUPSET/2018_03_01/nnndn1_tag20180301t210006_0.2815.969656433', -
8      status => 'D' -
9	);

During the PL/SQL call I saw exact one message like the ones above in the alert.log. That explains te behaviour. During the “catalog” call from RMAN, an implicit crosscheck takes place. Since this tries to access the files in the RECO diskgroup and there is really nothing in that diskgroup except an ACFS volume, this error is being thrown.

That means, I need to get rid of all these records. A simple PL/SQL block helped me doing that.

SQL> set serveroutput on 
SQL> begin
2  for rec in (select RECID, STAMP, SET_STAMP, SET_COUNT, HANDLE, PIECE# 
3              from v$backup_piece 
4			  where HANDLE like '+%'
5  ) loop 
6    dbms_output.put_line('deleting ''' ||rec.handle);
7    dbms_backup_restore.changebackuppiece( 
8       recid => rec.recid,
9       stamp => rec.stamp, 
10      set_stamp => rec.set_stamp,
11      set_count => rec.set_count,
12      pieceno => rec.piece#,
13      handle => rec.handle,
14      status => 'D'
15	 );
16   end loop;
17 end;
18 /

It took a while and caused again a lot of messages in both, the database and the ASM alert.log, but finally I was able to run RMAN commands successfully again.

Maybe this helps you solve such issues, but be aware of the fact that using DBMS_BACKUP_RESTORE is not supported.