Duplicate Column Names?

Recently, at a customer's site, we stumbled upon an error in one of the databases' alert.log. It turned out that it was caused by the nightly statistics gathering job. There was one table for which statistics could not be gathered.

SQL> exec dbms_stats.gather_table_stats('BASIS','VG_XXXXXXXXXXX', method_opt=>'for all columns size 1');

BEGIN dbms_stats.gather_table_stats('BASIS','VG_XXXXXXXXXXX', method_opt=>'for all columns size 1'); END;
 
*
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-00001: unique constraint (SYS.I_WRI$_OPTSTAT_HH_OBJ_ICOL_ST) violated
ORA-06512: at "SYS.DBMS_STATS", line 34850
ORA-06512: at line 1

So we first tried to remove the statistics, but we were unable to.

SQL> exec dbms_stats.delete_table_stats('BASIS','VG_XXXXXXXXXXX');
BEGIN dbms_stats.delete_table_stats('BASIS','VG_XXXXXXXXXXX'); END;
 
*
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-00001: unique constraint (SYS.I_WRI$_OPTSTAT_HH_OBJ_ICOL_ST) violated
ORA-06512: at "SYS.DBMS_STATS", line 17279
ORA-06512: at line 1
 
SQL> exec dbms_stats.delete_table_stats('BASIS','VG_XXXXXXXXXXX',force=>true);
BEGIN dbms_stats.delete_table_stats('BASIS','VG_XXXXXXXXXXX',force=>true); END;
 
*
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-00001: unique constraint (SYS.I_WRI$_OPTSTAT_HH_OBJ_ICOL_ST) violated
ORA-06512: at "SYS.DBMS_STATS", line 17279
ORA-06512: at line 1

So we started to dig deeper into the issue. We found out that the Data Dictionary reports a duplicate column name, while a simple "describe" does not show it.

SQL> desc owner.vg_xxxxxxxxxx

Name                                                  Null?    Type
----------------------------------------------------- -------- -----------------
[...]
XXXXXXXXXXXXXXX_ID                                             NUMBER(19)
XXXXXXXXXXXXXXXXX_XXX                                          NUMBER(19)
XXXXXXXXXXXXXXXXX_ID                                           NUMBER(19)
NAMENSZUSATZ                                                   NVARCHAR2(4000)
ENTITY_TIMESTAMP                                               TIMESTAMP(9)
OPTLOCKVERSION                                        NOT NULL NUMBER(19)
DATENUEBERNAHMETYP                                             NVARCHAR2(4000)
ZUGANGGEBUCHT                                         NOT NULL NUMBER(1)
UEBERSTELLER                                          NOT NULL NUMBER(1)
XXXXXXXXXXXXXX                                        NOT NULL NUMBER(1)
 
 
SQL> select column_id, column_name  from dba_tab_columns where table_name='VG_XXXXXXXXXXX' and column_name like 'Z%'
 
COLUMN_ID  COLUMN_NAME
---------- --------------------
        59 ZUGANGGEBUCHT
        59 ZUGANGGEBUCHT

The next step was to find out where the dictionary view gets its data from.

SQL> select text from dba_views where view_name='DBA_TAB_COLUMNS';

TEXT
--------------------------------------------------------------------------------
select OWNER, TABLE_NAME,
       COLUMN_NAME, DATA_TYPE, DATA_TYPE_MOD, DATA_TYPE_OWNER,
       DATA_LENGTH, DATA_PRECISION, DATA_SCALE, NULLABLE, COLUMN_ID,
       DEFAULT_LENGTH, DATA_DEFAULT, NUM_DISTINCT, LOW_VALUE, HIGH_VALUE,
       DENSITY, NUM_NULLS, NUM_BUCKETS, LAST_ANALYZED, SAMPLE_SIZE,
       CHARACTER_SET_NAME, CHAR_COL_DECL_LENGTH,
       GLOBAL_STATS, USER_STATS, AVG_COL_LEN, CHAR_LENGTH, CHAR_USED,
       V80_FMT_IMAGE, DATA_UPGRADED, HISTOGRAM, DEFAULT_ON_NULL,
       IDENTITY_COLUMN, SENSITIVE_COLUMN,
       EVALUATION_EDITION, UNUSABLE_BEFORE, UNUSABLE_BEGINNING
  from DBA_TAB_COLS
where USER_GENERATED = 'YES'


SQL> select text from dba_views where view_name='DBA_TAB_COLS';

TEXT
--------------------------------------------------------------------------------
select
     OWNER, TABLE_NAME,
     COLUMN_NAME, DATA_TYPE, DATA_TYPE_MOD, DATA_TYPE_OWNER,
     DATA_LENGTH, DATA_PRECISION, DATA_SCALE, NULLABLE, COLUMN_ID,
     DEFAULT_LENGTH, DATA_DEFAULT, NUM_DISTINCT, LOW_VALUE, HIGH_VALUE,
     DENSITY, NUM_NULLS, NUM_BUCKETS, LAST_ANALYZED, SAMPLE_SIZE,
     CHARACTER_SET_NAME, CHAR_COL_DECL_LENGTH,
     GLOBAL_STATS,
     USER_STATS, AVG_COL_LEN, CHAR_LENGTH, CHAR_USED,
     V80_FMT_IMAGE, DATA_UPGRADED, HIDDEN_COLUMN, VIRTUAL_COLUMN,
     SEGMENT_COLUMN_ID, INTERNAL_COLUMN_ID, HISTOGRAM, QUALIFIED_COL_NAME,
     USER_GENERATED, DEFAULT_ON_NULL, IDENTITY_COLUMN, SENSITIVE_COLUMN,
     EVALUATION_EDITION, UNUSABLE_BEFORE, UNUSABLE_BEGINNING
from dba_tab_cols_v$


SQL> select text from dba_views where view_name='DBA_TAB_COLS_V$';

TEXT
--------------------------------------------------------------------------------
select u.name, o.name,
       c.name,

[...]

OK, DBA_TAB_COLUMNS gets its data from DBA_TAB_COLS, which in turn gets its data from DBA_TAB_COLS_V$. That view contains the actual SQL used to retrieve the data; I shortened it above, but you get the idea.
We then used the SQL from DBA_TAB_COLS_V$ to find the root cause of our duplicate column.

SQL> select c.col#, c.obj#, c.name, c.intcol#
          , h.obj#, h.intcol#
       from sys.col$ c
          , sys."_CURRENT_EDITION_OBJ" o
          , sys.hist_head$ h
          , sys.user$ u
          , sys.coltype$ ac
          , sys.obj$ ot
          , sys."_BASE_USER" ut
          , sys.tab$ t
      where o.obj# = c.obj#
        and o.owner# = u.user#
        and o.obj# = t.obj#(+)
        and c.obj# = h.obj#(+) and c.intcol# = h.intcol#(+)
        and c.obj# = ac.obj#(+) and c.intcol# = ac.intcol#(+)
        and ac.toid = ot.oid$(+)
        and ot.type#(+) = 13
        and ot.owner# = ut.user#(+)
        and (o.type# in (3, 4)                                     /* cluster, view */
             or
             (o.type# = 2     /* tables, excluding iot - overflow and nested tables */
              and
              not exists (select null
                            from sys.tab$ t
                           where t.obj# = o.obj#
                             and (bitand(t.property, 512) = 512 or
                                  bitand(t.property, 8192) = 8192))))
        and c.name='ZUGANGGEBUCHT';

      COL#       OBJ# NAME                    INTCOL#       OBJ#    INTCOL#
---------- ---------- -------------------- ---------- ---------- ----------
        60      95556 ZUGANGGEBUCHT                60
        68      95523 ZUGANGGEBUCHT                68
        59      94177 ZUGANGGEBUCHT                59      94177         59   
        59      94177 ZUGANGGEBUCHT                59      94177         59   
        74      95522 ZUGANGGEBUCHT                74
        74      95550 ZUGANGGEBUCHT                74

6 rows selected.

Have a look at the two identical rows for OBJ# 94177: there is our duplicate column. Obviously the join to HIST_HEAD$ introduced the second row, which means there must be something wrong inside that table. And that turned out to be true.

SQL> desc hist_head$
Name                        Null?    Type
--------------------------- -------- -------------
OBJ#                        NOT NULL NUMBER
COL#                        NOT NULL NUMBER
BUCKET_CNT                  NOT NULL NUMBER
ROW_CNT                     NOT NULL NUMBER
CACHE_CNT                            NUMBER
NULL_CNT                             NUMBER
TIMESTAMP#                           DATE
SAMPLE_SIZE                          NUMBER
MINIMUM                              NUMBER
MAXIMUM                              NUMBER
DISTCNT                              NUMBER
LOWVAL                               RAW(1000)
HIVAL                                RAW(1000)
DENSITY                              NUMBER
INTCOL#                     NOT NULL NUMBER
SPARE1                               NUMBER
SPARE2                               NUMBER
AVGCLN                               NUMBER
SPARE3                               NUMBER
SPARE4                               NUMBER

SQL> select  COL#, BUCKET_CNT, ROW_CNT, NULL_CNT, TIMESTAMP#, SAMPLE_SIZE from  hist_head$ where obj#=94177 and intcol#=59;

      COL# BUCKET_CNT    ROW_CNT   NULL_CNT TIMESTAMP#          SAMPLE_SIZE
---------- ---------- ---------- ---------- ------------------- -----------
         0          1          0      35656 2017-01-05 22:00:23        5502
         0          1          0      35383 2017-01-05 22:00:23        5775

A quick MOS search revealed two notes that describe the issue: Alter Table Drop Column Failing With ORA-00600[16515] (Doc ID 2375301.1) and DBMS_STATS.DELETE_TABLE_STATS Fails With ORA-600 [16515] (Doc ID 1233745.1). The latter one contains the solution for this issue.
We first identified the older of the two rows, which we then deleted.

SQL> SELECT rowid,obj#,intcol#,timestamp# FROM hist_head$ WHERE obj#=94177 and intcol#=59 order by timestamp#;

ROWID                    OBJ#    INTCOL# TIMESTAMP#
------------------ ---------- ---------- -------------------
AAAABEAABAAAWh0AAj      94177         59 2017-01-05 22:00:23
AAAABEAABAAAWh0AAi      94177         59 2017-01-05 22:00:23

SQL> DELETE FROM hist_head$ WHERE ROWID='AAAABEAABAAAWh0AAj';

1 row deleted.

SQL>  COMMIT;

Commit complete.

After this, DBA_TAB_COLUMNS is back to distinct column names.

SQL> select column_id, column_name  from dba_tab_columns where table_name='VG_XXXXXXXXXXX' and column_name like 'Z%';

COLUMN_ID COLUMN_NAME
---------- --------------------
        59 ZUGANGGEBUCHT

And of course, statistics gathering was and is running fine again.
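To check whether any other objects in the database are affected by the same kind of dictionary inconsistency, a simple duplicate check on HIST_HEAD$ can help. This is just a sketch, run as SYS, and not part of the MOS notes mentioned above:

SQL> select obj#, intcol#, count(*)
       from sys.hist_head$
      group by obj#, intcol#
     having count(*) > 1;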


Auto Resize and Shrink ACFS Volumes

Today I want to write a little about ACFS. In case you are using the latest Oracle Grid Infrastructure and still run 11.2 databases, you are required to use ACFS. This especially applies to the Database Appliances, Exadata systems and the Cloud Services, if you use Grid Infrastructure there. Having that, you might run into situations where your ACFS filesystem grows, which it does automatically. But what if you want to shrink the filesystem back to its original or at least a smaller size? With version 18.3 this should work quite well. Earlier versions had some issues when there was no contiguous free space at the end of the volume to do the shrinking. Read more about that in MOS Note "ODA (Oracle Database Appliance): Resizing ACFS DBStorage File System (Doc ID 1990430.1)".
Basically, there are two major topics. First, the shrinking of ACFS volumes and second, limiting the auto resize of ACFS.

Current situation

Given a customer's ODA, this is the current situation.

[root@oda-x7m ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroupSys-LogVolRoot
                       30G  3.4G   25G  13% /
tmpfs                 189G  1.3G  187G   1% /dev/shm
/dev/md0              477M   40M  412M   9% /boot
/dev/sda1             500M  320K  500M   1% /boot/efi
/dev/mapper/VolGroupSys-LogVolOpt
                       59G   24G   33G  43% /opt
/dev/mapper/VolGroupSys-LogVolU01
                       99G   78G   16G  84% /u01
/dev/asm/commonstore-241
                      5.0G  319M  4.7G   7% /opt/oracle/dcs/commonstore
/dev/asm/dattfstar01-241
                      170G  158G   13G  93% /u02/app/oracle/oradata/TFSTAR01
/dev/asm/reco-135     883G  693G  191G  79% /u03/app/oracle
/dev/asm/datxvgvg-241
                      100G  5.9G   95G   6% /u02/app/oracle/oradata/XVGVG
/dev/asm/datxdovg-241
                      410G  399G   12G  98% /u02/app/oracle/oradata/XDOVG
/dev/asm/datxvgovg-241
                      100G  5.7G   95G   6% /u02/app/oracle/oradata/XVGOVG
/dev/asm/datxdoovg-241
                      390G  376G   15G  97% /u02/app/oracle/oradata/XDOOVG
/dev/asm/datxvgfg-241
                      100G  5.1G   95G   6% /u02/app/oracle/oradata/XVGFG
/dev/asm/datxdofg-241
                      100G   46G   55G  46% /u02/app/oracle/oradata/XDOFG
/dev/asm/datsfstar01-241
                      100G  6.3G   94G   7% /u02/app/oracle/oradata/SFSTAR01
/dev/asm/dattvgfg01-241
                      100G  5.1G   95G   6% /u02/app/oracle/oradata/TVGFG01
/dev/asm/dattfstar03-241
                      100G  4.9G   96G   5% /u02/app/oracle/oradata/TFSTAR03
/dev/asm/dattfstar04-241
                      100G  7.0G   94G   7% /u02/app/oracle/oradata/TFSTAR04
/dev/asm/dattfstar02-241
                      1.9T  1.9T   76M 100% /u02/app/oracle/oradata/TFSTAR02
/dev/asm/dattdomea01-241
                      100G   20G   81G  20% /u02/app/oracle/oradata/TDOMEA01
/dev/asm/dattfstar05-241
                      100G  5.2G   95G   6% /u02/app/oracle/oradata/TFSTAR05
/dev/asm/datifstar01-241
                      100G  5.1G   95G   6% /u02/app/oracle/oradata/IFSTAR01A
/dev/asm/datrfstar01-241
                      1.1T  1.1T  9.4G 100% /u02/app/oracle/oradata/RFSTAR01A
jus-srv-rz50.justiz.sachsen.de:/u01/app/DB-dumps
                      3.0T  2.0T  924G  68% /u99/nfs/DB-dumps
/dev/asm/datsfstar02-241
                      100G  3.6G   97G   4% /u02/app/oracle/oradata/SFSTAR02

Shrinking an ACFS Filesystem

As can be seen above, quite a lot of databases have plenty of free space in their filesystems. The initial size is defined by the shapes that an ODA provides, but in our case even 100 GB is too much, so we wanted to reduce the occupied space.
First, we get some basic information.

[oracle@oda-x7m ~]$ /sbin/advmutil volinfo /dev/asm/datxvgvg-241
Device: /dev/asm/datxvgvg-241
Size (MB): 102400
Resize Increment (MB): 512
Redundancy: mirror
Stripe Columns: 8
Stripe Width (KB): 1024
Disk Group: DATA
Volume: DATXVGVG
Compatible.advm: 12.2.0.1.0

The resize feature requires the ADVM compatibility to be at least 12.2.0.1.0, which is the case here.
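In environments with several disk groups, the ADVM compatibility can also be checked directly in the ASM instance. This is just a sketch, assuming a SYSASM connection to the ASM instance:

SQL> select dg.name as diskgroup, a.value as compatible_advm
       from v$asm_diskgroup dg
       join v$asm_attribute a on a.group_number = dg.group_number
      where a.name = 'compatible.advm';

With that confirmed, let's do the shrink.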

[oracle@oda-x7m ~]$ acfsutil size -h
Usage: acfsutil [-h] size [[-|+]nnn[K|M|G|T|P]] [-a nnn[K|M|G|T|P]] [-x nnn[K|M|G|T|P]] [-d ]  
                                         - Resize file system and configure auto-resize
                [-a]                     - Auto-resize increment
                [-x]                     - Auto-resize maximum
                [-d]                     - Resize only this device

[oracle@oda-x7m ~]$ acfsutil size -80G /u02/app/oracle/oradata/XVGVG
acfsutil size: new file system size: 21474836480 (20480MB)

After that, the volume is at the desired size of 20GB.

[oracle@oda-x7m ~]$ /sbin/advmutil volinfo /dev/asm/datxvgvg-241
Device: /dev/asm/datxvgvg-241
Size (MB): 20480
Resize Increment (MB): 512
Redundancy: mirror
Stripe Columns: 8
Stripe Width (KB): 1024
Disk Group: DATA
Volume: DATXVGVG
Compatible.advm: 12.2.0.1.0

Limiting auto increment of ACFS

The automatic increment of ACFS volumes has two implications. First, it makes monitoring difficult because you never know whether the filesystem is really full or whether it will automatically resize when needed. To figure out whether you are facing a critical issue, you need to perform several steps, and implementing that in a monitoring system is cumbersome in most cases. Second, managing the storage resources is not straightforward, and you might overcommit your storage. My approach and advice is to disable automatic resizing by setting the ACFS size and maximum size to the value that you committed for this specific database. Let's check the setup before doing that.

[oracle@oda-x7m ~]$ acfsutil info fs /u02/app/oracle/oradata/ICM01A
/u02/app/oracle/oradata/ICM01A
    ACFS Version: 12.2.0.1.0
    on-disk version:       46.0
    compatible.advm:       12.2.0.0.0
    ACFS compatibility:    12.2.0.0.0
    flags:        MountPoint,Available,AutoResizeEnabled
    mount time:   Fri Mar 15 10:09:28 2019
    mount sequence number: 4
    allocation unit:       4096
    metadata block size:   4096
    volumes:      1
    total size:   107374182400  ( 100.00 GB )
    total free:   91318923264  (  85.05 GB )
    auto-resize increment:   10737418240  (  10.00 GB )
    auto-resize maximum:     0  ( 0.00 )
    file entry table allocation: 262144
    primary volume: /dev/asm/daticm01-411
        label:
        state:                 Available
        major, minor:          246, 210435
        logical sector size:   512
        size:                  107374182400  ( 100.00 GB )
        free:                  91318923264  (  85.05 GB )
        metadata read I/O count:         54838
        metadata write I/O count:        51022
        total metadata bytes read:       242651136  ( 231.41 MB )
        total metadata bytes written:    233385984  ( 222.57 MB )
        ADVM diskgroup:        DATA
        ADVM resize increment: 536870912
        ADVM redundancy:       mirror
        ADVM stripe columns:   8
        ADVM stripe width:     1048576
    number of snapshots:  0
    snapshot space usage: 0  ( 0.00 )
    replication status: DISABLED
    compression status: DISABLED

So I set the maximum size to the current size.

[oracle@oda-x7m ~]$ acfsutil size -x 100G /u02/app/oracle/oradata/ICM01A
acfsutil size: ACFS-03642: successfully updated auto-resize settings

Just crosscheck that.

[oracle@oda-x7m ~]$ acfsutil info fs /u02/app/oracle/oradata/ICM01A
/u02/app/oracle/oradata/ICM01A
    ACFS Version: 12.2.0.1.0
    on-disk version:       46.0
    compatible.advm:       12.2.0.0.0
    ACFS compatibility:    12.2.0.0.0
    flags:        MountPoint,Available,AutoResizeEnabled
    mount time:   Fri Mar 15 10:09:28 2019
    mount sequence number: 4
    allocation unit:       4096
    metadata block size:   4096
    volumes:      1
    total size:   107374182400  ( 100.00 GB )
    total free:   91318923264  (  85.05 GB )
    auto-resize increment:   10737418240  (  10.00 GB )
    auto-resize maximum:     107374182400  ( 100.00 GB )
    file entry table allocation: 262144
    primary volume: /dev/asm/daticm01-411
        label:
        state:                 Available
        major, minor:          246, 210435
        logical sector size:   512
        size:                  107374182400  ( 100.00 GB )
        free:                  91318923264  (  85.05 GB )
        metadata read I/O count:         54840
        metadata write I/O count:        51022
        total metadata bytes read:       242659328  ( 231.42 MB )
        total metadata bytes written:    233385984  ( 222.57 MB )
        ADVM diskgroup:        DATA
        ADVM resize increment: 536870912
        ADVM redundancy:       mirror
        ADVM stripe columns:   8
        ADVM stripe width:     1048576
    number of snapshots:  0
    snapshot space usage: 0  ( 0.00 )
    replication status: DISABLED
    compression status: DISABLED

You could also combine this with the resize operation, which returns the final ACFS size that can then be used as the maximum size.
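For example, the shrink above could have been done in one step; this is just a sketch based on the usage output shown earlier:

acfsutil size -80G -x 20G /u02/app/oracle/oradata/XVGVG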
And that’s basically it, there is nothing more to do.

ADR Home is growing permanently

Housekeeping is an often underestimated topic when running databases in production. In the blog post about the size based deletion policy I described a new way of keeping the size of an ADR home limited. But what if you are not yet on Oracle Database 12.2 or later? Then you have to rely on the existing time based deletion policies, which is what we did and still do for our customers. A couple of weeks ago one of them experienced space pressure on the /u01 filesystem of one of their ODAs. The automatic purging worked fine, but the filesystem usage kept increasing. The Enterprise Manager metric outlines this behaviour.

[Figure adr-growth-01: Enterprise Manager filesystem usage metric showing continuous growth]

So we started to investigate this and ended up in the trace directory of some of the databases. This is an example from one of them.

[root@odax7-2m trace]#  du -hs * | sort -h | tail -6

68M     DOMEADD_gen0_82205.trm
69M     DOMEADD_mmon_82354.trm
131M    DOMEADD_ipc0_82195.trm
419M    DOMEADD_gen0_82205.trc
423M    DOMEADD_mmon_82354.trc
462M    DOMEADD_ipc0_82195.trc

We started with the largest trace and had a look at its end.

[root@odax7-2m trace]# tail DOMEADD_ipc0_82195.trc
2019-03-15 09:23:07.749*:kjuinc(): no cluster database, return inc# KSIMINVL

*** 2019-03-15T09:23:10.913816+01:00
2019-03-15 09:23:10.913*:kjuinc(): no cluster database, return inc# KSIMINVL

Having this, we did some quick research on My Oracle Support, which revealed Bug 27989556 "Excessive Trace Message: no cluster database, return inc# ksiminvl". And of course there was no fix available at the time. This changed by April 16th; now there is a fix for 18c available with the 18.6.0 Release Update.

Ok, next file. Let’s have a look.

[root@odax7-2m trace]# tail DOMEADD_mmon_82354.trc
AUTO SGA: kmgs_parameter_update_timeout gen0 0 mmon alive 1

*** 2019-03-15T09:29:55.474098+01:00
AUTO SGA: kmgs_parameter_update_timeout gen0 0 mmon alive 1

*** 2019-03-15T09:29:58.474092+01:00
AUTO SGA: kmgs_parameter_update_timeout gen0 0 mmon alive 1

Again MOS has the answer: AUTO SGA: kmgs_parameter_update_timeout gen0 0 mmon alive 1 Excessive Trace Files were generated on one node after Migration to 12.2 and MMON trace file grows (Doc ID 2298766.1). At least there are patches for that this time.

Since patching would have required a downtime and would have fixed only one of the two issues at that time, we decided against it. Instead, we created a simple but effective workaround:

for i in $(find /u01/app/oracle/diag/rdbms -type f  -name "*mmon*tr*"); do echo "" > $i; done
for i in $(find /u01/app/oracle/diag/rdbms -type f  -name "*ipc0*tr*"); do echo "" > $i; done
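To run this on a schedule, a crontab entry on the ODA could look like the following sketch; the schedule and the combined find pattern are just an example:

0 2 * * * for i in $(find /u01/app/oracle/diag/rdbms -type f \( -name "*mmon*tr*" -o -name "*ipc0*tr*" \)); do echo "" > "$i"; done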

Once this was run, and we now do that regularly, the filesystem usage dropped immediately. This is what Enterprise Manager showed right after running the workaround.

[Figure adr-growth-02: Enterprise Manager filesystem usage metric after the cleanup]

Adding VLANs on an ODA

The Oracle Database Appliances X7-2 come with only one redundant network interface, so it is not possible to separate network traffic between distinct fault-tolerant bonded interfaces. But at least we can use VLANs on the one and only bonded interface that all ODA X7-2 models provide. If you choose to use VLANs, this decision must be made when installing the ODA; there is no easy way to change it afterwards. So if you use VLANs, you do it right from the beginning when you run "configure-firstnet". Once the setup is finished, it will end up in a configuration similar to this.

[root@odax7-2m ~]# odacli list-networks

ID                                       Name                 NIC        InterfaceType IP Address         Subnet Mask        Gateway            VlanId
---------------------------------------- -------------------- ---------- ---------- ------------------ ------------------ ------------------ ----------
32812bc3-5bb7-4f8e-9070-b011acde665d     Private-network      priv0      DUMMY      192.168.16.24      255.255.255.240
735e2695-d68a-4696-a596-ba39a7180de6     Public-network       btbond1.26 Vlan       172.17.11.165      255.255.255.0      172.17.11.10       26

Now we can add more networks. The additional networks must be added from the operating system point of view, which is done with the appliance management tool "odacli", and from the Grid Infrastructure point of view, which in turn is done using "srvctl". Finally, the databases must be configured to use the proper VLANs for their communication. Let's walk through these steps.

1. Create a new VLAN

[root@odax7-2m ~]# odacli create-network -no-d -g 172.17.2.10 -n btbond1 -t VLAN -p 172.17.2.9 -m lsnr2 -w Database -s 255.255.255.0 -v 21
{
  "jobId" : "bd4d9ba0-7f59-491c-878b-5c03d3be51b6",
  "status" : "Created",
  "message" : null,
  "reports" : [ ],
  "createTimestamp" : "March 15, 2019 10:56:15 AM CET",
  "resourceList" : [ ],
  "description" : "Network service creation with names btbond1:lsnr2 ",
  "updatedTime" : "March 15, 2019 10:56:15 AM CET"
}

[root@odax7-2m ~]# odacli describe-job -i "bd4d9ba0-7f59-491c-878b-5c03d3be51b6"

Job details
----------------------------------------------------------------
                     ID:  bd4d9ba0-7f59-491c-878b-5c03d3be51b6
            Description:  Network service creation with names btbond1:lsnr2
                 Status:  Success
                Created:  March 15, 2019 10:56:15 AM CET
                Message:

Task Name                                Start Time                          End Time                            Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Setting network                          March 15, 2019 10:56:15 AM CET      March 15, 2019 10:56:20 AM CET      Success
Setting up Vlan                          March 15, 2019 10:56:15 AM CET      March 15, 2019 10:56:20 AM CET      Success

[root@odax7-2m ~]# odacli list-networks

ID                                       Name                 NIC        InterfaceType IP Address         Subnet Mask        Gateway            VlanId
---------------------------------------- -------------------- ---------- ---------- ------------------ ------------------ ------------------ ----------
32812bc3-5bb7-4f8e-9070-b011acde665d     Private-network      priv0      DUMMY      192.168.16.24      255.255.255.240
735e2695-d68a-4696-a596-ba39a7180de6     Public-network       btbond1.26 Vlan       172.17.11.165      255.255.255.0      172.17.11.10       26
e69efc7a-5973-4911-85c7-9d8e8b4138be     lsnr2                btbond1.21 VLAN       172.17.2.9         255.255.255.0      172.17.2.10        21

[root@odax7-2m ~]# odacli describe-network -i e69efc7a-5973-4911-85c7-9d8e8b4138be

Network details
----------------------------------------------------------------
                     ID:  e69efc7a-5973-4911-85c7-9d8e8b4138be
                   Name:  lsnr2
                    NIC:  btbond1.21
          InterfaceType:  VLAN
             IP Address:  172.17.2.9
            Subnet Mask:  255.255.255.0
                Gateway:  172.17.2.10
                 VlanId:  21
                   Type:  Database
                Default:  false
                Created:  March 15, 2019 10:56:15 AM CET

[root@odax7-2m ~]# odaadmcli show vlan
        NAME                     ID    INTERFACE   CONFIG_TYPE IP_ADDRESS      NETMASK         GATEWAY         NODENUM
        lsnr2                    21    btbond1     Database    172.17.2.9      255.255.255.0   172.17.2.10     0
        vlan26                   26    btbond1     public      172.17.11.165   255.255.255.0   172.17.11.10    0

[root@odax7-2m ~]# ifconfig btbond1.21
btbond1.21 Link encap:Ethernet  HWaddr 00:10:E0:DE:6F:CB
          inet addr:172.17.2.9  Bcast:172.17.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:501 errors:0 dropped:0 overruns:0 frame:0
          TX packets:13 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:35567 (34.7 KiB)  TX bytes:720 (720.0 b)

2. Add the VLAN to the Grid Infrastructure
Note that you must set the environment variables to point to the Grid Infrastructure. We have a toolbox for Linux systems that makes it easy to set and maintain different environment settings. This toolbox provides a function to set the proper Oracle environment, called "soe" (set oracle environment). You will see its usage in the following steps.
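The toolbox itself is not part of this post. As a rough idea, a stripped-down stand-in for "soe" could look like the sketch below; this is purely hypothetical, the real helper for example also maps a name like +ASM to the local instance SID +ASM1.

soe () {
  export ORACLE_SID="$1"
  export ORAENV_ASK=NO        # do not prompt for the SID
  . /usr/local/bin/oraenv     # sets ORACLE_HOME, PATH etc. from /etc/oratab
  unset ORAENV_ASK
}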

First, we add the new VLAN as a network resource.

[root@odax7-2m ~]# soe +ASM

Environment :
===================

Setting         Value
--------------- ---------------------------
NLS_DATE_FORMAT "YYYY-MM-DD HH24:MI:SS"
NLS_LANG        "AMERICAN_GERMANY.AL32UTF8"
ORACLE_BASE     /u01/app/oracle
ORACLE_HOME     /u01/app/12.2.0.1/oracle
ORACLE_SID      +ASM1
ORACLE_UNQNAME  +ASM

[root@odax7-2m ~]# srvctl add network -netnum 21 -subnet 172.17.2.0/255.255.255.0/btbond1.21 -nettype static

[root@odax7-2m ~]# srvctl config network
Network 1 exists
Subnet IPv4: 172.17.11.0/255.255.255.0/btbond1.26, static
Subnet IPv6:
Ping Targets:
Network is enabled
Network is individually enabled on nodes:
Network is individually disabled on nodes:
Network 21 exists
Subnet IPv4: 172.17.2.0/255.255.255.0/btbond1.21, static
Subnet IPv6:
Ping Targets:
Network is enabled
Network is individually enabled on nodes:
Network is individually disabled on nodes:
Network 28 exists
Subnet IPv4: 172.17.5.0/255.255.255.0/btbond1.28, static
Subnet IPv6:
Ping Targets:
Network is enabled
Network is individually enabled on nodes:
Network is individually disabled on nodes:

Second, we define the VIP and the listener. Please be careful which OS user you use for which command; have a look at the command prompts.

[root@odax7-2m ~]# srvctl add vip -node odax7-2m -address odax7-2m-lsnr2/255.255.255.0 -netnum 21
[root@odax7-2m ~]# su - oracle
[oracle@odax7-2m ~]$ soe +ASM

Environment :
===================

Setting         Value
--------------- ---------------------------
NLS_DATE_FORMAT "YYYY-MM-DD HH24:MI:SS"
NLS_LANG        "AMERICAN_GERMANY.AL32UTF8"
ORACLE_BASE     /u01/app/oracle
ORACLE_HOME     /u01/app/12.2.0.1/oracle
ORACLE_SID      +ASM1
ORACLE_UNQNAME  +ASM

[oracle@odax7-2m ~]$ srvctl add listener -netnum 21 -endpoints "TCP:1523" -listener listener2
[oracle@odax7-2m ~]$ srvctl start listener -listener listener2
[oracle@odax7-2m ~]$ lsnrctl status listener2

LSNRCTL for Linux: Version 12.2.0.1.0 - Production on 15-MAR-2019 11:10:26

Copyright (c) 1991, 2016, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER2)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER2
Version                   TNSLSNR for Linux: Version 12.2.0.1.0 - Production
Start Date                15-MAR-2019 11:10:03
Uptime                    0 days 0 hr. 0 min. 22 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/12.2.0.1/oracle/network/admin/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/odax7-2m/listener2/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER2)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.17.2.9)(PORT=1523)))
The listener supports no services
The command completed successfully

3. Point a database to the new Listener
To make use of the new listener, the database(s) must register with that listener. So we simply set the “local_listener” parameter to point to the new listener and we are done.

[oracle@odax7-2m ~]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on MAR-15-2019 11:11:58 

Copyright (c) 1982, 2014, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Advanced Analytics,
Real Application Testing and Unified Auditing options

SQL> alter system set local_listener='odax7-2m-lsnr2:1523';

System altered.

SQL> exit

[oracle@odax7-2m ~]$ soe +ASM -s
[oracle@odax7-2m ~]$ lsnrctl status listener2

LSNRCTL for Linux: Version 12.2.0.1.0 - Production on 15-MAR-2019 11:15:23

Copyright (c) 1991, 2016, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER2)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER2
Version                   TNSLSNR for Linux: Version 12.2.0.1.0 - Production
Start Date                15-MAR-2019 11:10:03
Uptime                    0 days 0 hr. 5 min. 19 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/12.2.0.1/oracle/network/admin/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/odax7-2m/listener2/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER2)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.17.2.9)(PORT=1523)))
Service "MMI.support.robotron.de" has 1 instance(s).
  Instance "MMI", status READY, has 2 handler(s) for this service...
Service "MMIXDB.support.robotron.de" has 1 instance(s).
  Instance "MMI", status READY, has 2 handler(s) for this service...
The command completed successfully

That’s it, it’s as simple as that. 
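For completeness: a client that should connect through the new listener needs a TNS alias pointing at the new VIP and port. A sketch of such an entry (the alias name is made up; host, port and service name are taken from the listener output above):

MMI_LSNR2 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = odax7-2m-lsnr2)(PORT = 1523))
    (CONNECT_DATA =
      (SERVICE_NAME = MMI.support.robotron.de)
    )
  )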

Instance Caging

Last week we had a funny issue on a customer's ODA X7-2M. You may know that there are different database shapes, each having different settings for CPU_COUNT, memory and so on; the documentation is quite clear about that. In this special case they had a database with shape "odb4", which sets CPU_COUNT to 4. Then they went live with their application on that database and experienced the following behaviour.
[Figure cpu_count_instance_caging: CPU usage of the instance compared to its CPU_COUNT limit]
You see the CPU limit of 4, and you can also see that the load goes well over that limit. But why? The answer is quite simple: setting CPU_COUNT does not necessarily limit the number of CPUs the database will use. An active Resource Manager plan is required to achieve that. MOS Note "Why are a Higher Number of CPUs Used Than the CPU_COUNT Parameter Setting (Doc ID 2017395.1)" explains this. If you wish to limit the CPU usage, you have to use Instance Caging, which is a combination of setting CPU_COUNT and using Resource Manager. There is another MOS Note "Configuring and Monitoring Instance Caging (Doc ID 1362445.1)" for that.
That means, if you want your databases on an ODA to be limited to the CPU usage their shape defines, you need to activate a Resource Manager plan, at least the "DEFAULT_PLAN". This is not the default, keep that in mind.
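A minimal sketch of what activating the plan looks like (see the Instance Caging note above for the complete setup):

SQL> alter system set resource_manager_plan='DEFAULT_PLAN' scope=both;

With the plan active, the CPU_COUNT that the ODA shape has set is actually enforced.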

Huge Oracle Database traces caused by EM Agent

Recently, I found some huge tracefiles in the ADR of some databases. Look at this:

[oracle@server01b trace]$ ls -lrth | tail -10
-rw-r----- 1 oracle dba 1.2K Apr 9 03:44 ICM01_vktm_29023.trc
-rw-r----- 1 oracle dba 329K Apr 9 06:36 ICM01_dbrm_29047.trm
-rw-r----- 1 oracle dba 4.7M Apr 9 06:36 ICM01_dbrm_29047.trc
-rw-r----- 1 oracle dba 1.5K Apr 9 07:58 ICM01_mmon_29120.trm
-rw-r----- 1 oracle dba 14K Apr 9 07:58 ICM01_mmon_29120.trc
-rw-r----- 1 oracle dba 471M Apr 9 08:02 ICM01_ora_29162.trm
-rw-r----- 1 oracle dba 33G Apr 9 08:02 ICM01_ora_29162.trc
-rw-r----- 1 oracle dba 1.3M Apr 9 08:02 alert_ICM01.log
-rw-r----- 1 oracle dba 12M Apr 9 08:02 ICM01_lmhb_29080.trm
-rw-r----- 1 oracle dba 112M Apr 9 08:02 ICM01_lmhb_29080.trc

The files were not purged automatically from the ADR since they are written to continuously and therefore never get old enough. So I had a look into the largest file and found repeating content like this:

Trace file /u01/app/oracle/diag/rdbms/icm01a/ICM01/trace/ICM01_ora_29162.trc
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Advanced Analytics
and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/12.1.0.2/dbhome_5
System name: Linux
Node name: server01b
Release: 4.1.12-112.16.8.el6uek.x86_64
Version: #2 SMP Fri Apr 13 15:13:46 PDT 2018
Machine: x86_64
Instance name: ICM01
Redo thread mounted by this instance: 1
Oracle process number: 35
Unix process pid: 29162, image: oracle@jus-oda-lrz01b

*** 2019-03-15 14:28:19.983
*** SESSION ID:(464.27491) 2019-03-15 14:28:19.983
*** CLIENT ID:() 2019-03-15 14:28:19.983
*** SERVICE NAME:(SYS$USERS) 2019-03-15 14:28:19.983
*** MODULE NAME:(emagent_SQL_oracle_database) 2019-03-15 14:28:19.983
*** CLIENT DRIVER:(jdbcthin) 2019-03-15 14:28:19.983
*** ACTION NAME:(problemTbspUndo) 2019-03-15 14:28:19.983

psdgbt: bind csid (1) does not match session csid (873)
psdgbt: session charset is AL32UTF8

*** 2019-03-15 14:28:19.983
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=1, mask=0x0)
----- Error Stack Dump -----
----- Current SQL Statement for this session (sql_id=a0qc12302fzfk) -----
begin dbms_application_info.set_module(:1 , :2 ); end;

There was some more call stack dumped and so on, but that was not relevant for identifying the root cause. Obviously the Enterprise Manager Agent is doing something nasty to the database. There are several documents in MOS; the best match is "After restart of database suddenly PLS-00553: character set name is not recognized are seen in the alert.log (Doc ID 1599864.1)". Basically, the Agent connects too early to a starting database and picks up the US7ASCII character set. The only solution as of now is to bounce the Agent.
As a conclusion, it might be a good idea to bounce the Agent every time a database is started, be it because the database was newly created, restarted due to maintenance, or as part of a Data Guard switchover or failover. Or just do it on a regular basis so nobody has to do it by hand.
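A sketch of such a bounce from the OS; the Agent home path is an assumption, adjust it to your installation:

export AGENT_HOME=/u01/app/oracle/agent13c/agent_inst   # assumption: your Agent home
$AGENT_HOME/bin/emctl stop agent
$AGENT_HOME/bin/emctl start agent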

New Size Based ADR Purge Policy in 12.2

With the introduction of the Automatic Diagnostic Repository (ADR) we got some automatic purging of logs and traces too, at least for the database. This purging is based on the age of files. All the files in ADR are put in two categories, files with a short life and files with a long life. So we also have two retention policies for these files, the SHORTP_POLICY and the LONGP_POLICY. The SHORTP_POLICY defaults to 720 hours which is roughly a month whilst the LONGP_POLICY defaults to 8760 hours which is roughly a year. You can read more on setting the policies in MOS Note “Retention Policy for ADR (Doc ID 564269.1)“. If you are unsure which files are controlled by which policy, have a look at MOS Note “Which Files Are Part Of SHORTP_POLICY And LONGP_POLICY In ADR? (Doc ID 975448.1)“, basically traces, core dumps and IPS have a short life, all others have a long life.
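For reference, both time based policies can be changed per ADR home with adrci, for example like this (the values are just an illustration; the ADR home has to be set first, as shown further down):

adrci> set control (SHORTP_POLICY = 168, LONGP_POLICY = 2160)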

This is a good starting point for housekeeping, but there might be situations where a lot of large traces are written in a short amount of time. This might fill up your filesystem, leading to a database outage. With Oracle Database 12.2 onwards, there is an additional, size based policy. This policy is not immediately visible:

[oracle@odax7-2m rdbms]$ adrci

ADRCI: Release 12.2.0.1.0 - Production on Fri Mar 15 08:07:49 2019

Copyright (c) 1982, 2017, Oracle and/or its affiliates.  All rights reserved.

ADR base = "/u01/app/oracle"
adrci> set home mmidb
adrci> show homes
ADR Homes:
diag/rdbms/mmidb/MMIDB

adrci>  show control

ADR Home = /u01/app/oracle/diag/rdbms/mmidb/MMIDB:
*************************************************************************
ADRID                SHORTP_POLICY        LONGP_POLICY         LAST_MOD_TIME                            LAST_AUTOPRG_TIME                        LAST_MANUPRG_TIME                        ADRDIR_VERSION       ADRSCHM_VERSION      ADRSCHMV_SUMMARY     ADRALERT_VERSION     CREATE_TIME
-------------------- -------------------- -------------------- ---------------------------------------- ---------------------------------------- ---------------------------------------- -------------------- -------------------- -------------------- -------------------- ----------------------------------------
1312824711           72                   72                   2019-03-14 22:37:18.660133 +01:00        2019-03-14 09:13:26.445071 +01:00        2019-03-15 07:58:30.921596 +01:00        1                    2                    107                  1                    2018-10-02 09:11:29.809205 +02:00
1 row fetched

To see the size based policy (SIZEP_POLICY), we need to query it explicitly:

adrci> select sizep_policy from adr_control_aux;

ADR Home = /u01/app/oracle/diag/rdbms/mmidb/MMIDB:
*************************************************************************
SIZEP_POLICY
--------------------
18446744073709551615
1 row fetched

This number represents the maximum size of the specific ADR home in bytes. To get an idea of how much space is required to keep files for a given amount of time, or vice versa, we can use the "estimate" command.

adrci> estimate (LONGP_POLICY = 72)
Estimate
Short Policy Hours: 72
Long Policy Hours: 72
Size Policy Bytes: 2459925363

adrci> estimate (SIZEP_POLICY = 1000000000)
Estimate
Short Policy Hours: 26
Long Policy Hours: 26
Size Policy Bytes: 1000000000

So if we want to limit the size to 1GB, we can set the SIZEP_POLICY accordingly.

adrci> set control (SIZEP_POLICY = 1000000000)

adrci> select sizep_policy from adr_control_aux;

ADR Home = /u01/app/oracle/diag/rdbms/mmidb/MMIDB:
*************************************************************************
SIZEP_POLICY
--------------------
1000000000
1 row fetched

But remember, this does not take immediate effect. The database is the only facility that does automatic purging; all ADR homes of other facilities need explicit purging via the "purge" command. Also, the database does the purging only once a week and there is no way to change that, see MOS Note "Is there a way to control Auto_Purge Frequency done by the MMON ? (Doc ID 1446242.1)" for details. So you should consider using a separate purge job for that. The "purgeLogs" script (MOS Note "purgeLogs: Cleanup traces, logs in one command (Doc ID 2081655.1)") might be a good starting point; even though it was designed for Exadata, the Database Appliance and the like, it should also do its job on standard installations.
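For the facilities that are not purged automatically, a manual purge from adrci could look like this (a sketch; the home path is just an example and the age is given in minutes):

adrci> set homepath diag/tnslsnr/odax7-2m/listener2
adrci> purge -age 10080 -type ALERT
adrci> purge -age 10080 -type TRACE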