Adding VLANs on an ODA

The Oracle Database Appliances X7-2 come only with one redundant network interface, so it is not possible to separate network traffic between distinct fault-tolerant bonded interfaces. But at least we can use VLANs on the one and only bonded interface that all the ODA X7-2 provide. If you choose to use VLANs, this choice must be taken at the point of installing the ODA. There is no easy way to change that afterwards. So if you use VLANs, you do it right from the beginning when you do the “configure-firstnet”. Once the setup is finished, it will end up in a configuration simiar to this.

[root@odax7-2m ~]# odacli list-networks

ID                                       Name                 NIC        InterfaceType IP Address         Subnet Mask        Gateway            VlanId
---------------------------------------- -------------------- ---------- ---------- ------------------ ------------------ ------------------ ----------
32812bc3-5bb7-4f8e-9070-b011acde665d     Private-network      priv0      DUMMY      192.168.16.24      255.255.255.240
735e2695-d68a-4696-a596-ba39a7180de6     Public-network       btbond1.26 Vlan       172.17.11.165      255.255.255.0      172.17.11.10       26

Now we can add more networks. The additional networks must be added from the operating system point of view which is done by the appliance management “odacli”, and from the Grid Infrastructure point of view which in turn is done by using “srvctl”. Finally, the databases must be configured to use the propper VLANs for their communication. Let’s walk through these steps.

1. Create a new VLAN

[root@odax7-2m ~]# odacli create-network -no-d -g 172.17.2.10 -n btbond1 -t VLAN -p 172.17.2.9 -m lsnr2 -w Database -s 255.255.255.0 -v 21
{
  "jobId" : "bd4d9ba0-7f59-491c-878b-5c03d3be51b6",
  "status" : "Created",
  "message" : null,
  "reports" : [ ],
  "createTimestamp" : "March 15, 2019 10:56:15 AM CET",
  "resourceList" : [ ],
  "description" : "Network service creation with names btbond1:lsnr2 ",
  "updatedTime" : "March 15, 2019 10:56:15 AM CET"
}

[root@odax7-2m ~]# odacli describe-job -i "bd4d9ba0-7f59-491c-878b-5c03d3be51b6"

Job details
----------------------------------------------------------------
                     ID:  bd4d9ba0-7f59-491c-878b-5c03d3be51b6
            Description:  Network service creation with names btbond1:lsnr2
                 Status:  Success
                Created:  March 15, 2019 10:56:15 AM CET
                Message:

Task Name                                Start Time                          End Time                            Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Setting network                          March 15, 2019 10:56:15 AM CET      March 15, 2019 10:56:20 AM CET      Success
Setting up Vlan                          March 15, 2019 10:56:15 AM CET      March 15, 2019 10:56:20 AM CET      Success

[root@odax7-2m ~]# odacli list-networks

ID                                       Name                 NIC        InterfaceType IP Address         Subnet Mask        Gateway            VlanId
---------------------------------------- -------------------- ---------- ---------- ------------------ ------------------ ------------------ ----------
32812bc3-5bb7-4f8e-9070-b011acde665d     Private-network      priv0      DUMMY      192.168.16.24      255.255.255.240
735e2695-d68a-4696-a596-ba39a7180de6     Public-network       btbond1.26 Vlan       172.17.11.165      255.255.255.0      172.17.11.10       26
e69efc7a-5973-4911-85c7-9d8e8b4138be     lsnr2                btbond1.21 VLAN       172.17.2.9         255.255.255.0      172.17.2.10        21

[root@odax7-2m ~]# odacli describe-network -i e69efc7a-5973-4911-85c7-9d8e8b4138be

Network details
----------------------------------------------------------------
                     ID:  e69efc7a-5973-4911-85c7-9d8e8b4138be
                   Name:  lsnr2
                    NIC:  btbond1.21
          InterfaceType:  VLAN
             IP Address:  172.17.2.9
            Subnet Mask:  255.255.255.0
                Gateway:  172.17.2.10
                 VlanId:  21
                   Type:  Database
                Default:  false
                Created:  March 15, 2019 10:56:15 AM CET

[root@odax7-2m ~]# odaadmcli show vlan
        NAME                     ID    INTERFACE   CONFIG_TYPE IP_ADDRESS      NETMASK         GATEWAY         NODENUM
        lsnr2                    21    btbond1     Database    172.17.2.9      255.255.255.0   172.17.2.10     0
        vlan26                   26    btbond1     public      172.17.11.165   255.255.255.0   172.17.11.10    0

[root@odax7-2m ~]# ifconfig btbond1.21
btbond1.21 Link encap:Ethernet  HWaddr 00:10:E0:DE:6F:CB
          inet addr:172.17.2.9  Bcast:172.17.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:501 errors:0 dropped:0 overruns:0 frame:0
          TX packets:13 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:35567 (34.7 KiB)  TX bytes:720 (720.0 b)

2. Add the VLAN to the Grid Infrastructure
Note, you must set the environment variables to point to the Grid Infrastructure. We have a toolbox for Linux systems, that makes it easy to set and maintain different environment settings. This toolbox provides a function to set the propper Oracle environment which is called “soe” –  set oracle environment. You’ll see it’s usage in the following step.

First, we add the new VLAN as a network resource.

[root@odax7-2m ~]# soe +ASM

Environment :
===================

Setting         Value
--------------- ---------------------------
NLS_DATE_FORMAT "YYYY-MM-DD HH24:MI:SS"
NLS_LANG        "AMERICAN_GERMANY.AL32UTF8"
ORACLE_BASE     /u01/app/oracle
ORACLE_HOME     /u01/app/12.2.0.1/oracle
ORACLE_SID      +ASM1
ORACLE_UNQNAME  +ASM

[root@odax7-2m ~]# srvctl add network -netnum 21 -subnet 172.17.2.0/255.255.255.0/btbond1.21 -nettype static

[root@odax7-2m ~]# srvctl config network
Network 1 exists
Subnet IPv4: 172.17.11.0/255.255.255.0/btbond1.26, static
Subnet IPv6:
Ping Targets:
Network is enabled
Network is individually enabled on nodes:
Network is individually disabled on nodes:
Network 21 exists
Subnet IPv4: 172.17.2.0/255.255.255.0/btbond1.21, static
Subnet IPv6:
Ping Targets:
Network is enabled
Network is individually enabled on nodes:
Network is individually disabled on nodes:
Network 28 exists
Subnet IPv4: 172.17.5.0/255.255.255.0/btbond1.28, static
Subnet IPv6:
Ping Targets:
Network is enabled
Network is individually enabled on nodes:
Network is individually disabled on nodes:

Second, we definde the VIP and the listener. Please be careful, which OS user you use for which command. Have a look at the command prompt.

[root@odax7-2m ~]# srvctl add vip -node odax7-2m -address odax7-2m-lsnr2/255.255.255.0 -netnum 21
[root@odax7-2m ~]# su - oracle
[oracle@odax7-2m ~]$ soe +ASM

Environment :
===================

Setting         Value
--------------- ---------------------------
NLS_DATE_FORMAT "YYYY-MM-DD HH24:MI:SS"
NLS_LANG        "AMERICAN_GERMANY.AL32UTF8"
ORACLE_BASE     /u01/app/oracle
ORACLE_HOME     /u01/app/12.2.0.1/oracle
ORACLE_SID      +ASM1
ORACLE_UNQNAME  +ASM

[oracle@odax7-2m ~]$ srvctl add listener -netnum 21 -endpoints "TCP:1523" -listener listener2
[oracle@odax7-2m ~]$ srvctl start listener -listener listener2
[oracle@odax7-2m ~]$ lsnrctl status listener2

LSNRCTL for Linux: Version 12.2.0.1.0 - Production on 15-MAR-2019 11:10:26

Copyright (c) 1991, 2016, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER2)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER2
Version                   TNSLSNR for Linux: Version 12.2.0.1.0 - Production
Start Date                15-MAR-2019 11:10:03
Uptime                    0 days 0 hr. 0 min. 22 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/12.2.0.1/oracle/network/admin/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/odax7-2m/listener2/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER2)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.17.2.9)(PORT=1523)))
The listener supports no services
The command completed successfully

3. Point a database to the new Listener
To make use of the new listener, the database(s) must register with that listener. So we simply set the “local_listener” parameter to point to the new listener and we are done.

[oracle@odax7-2m ~]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on MAR-15-2019 11:11:58 

Copyright (c) 1982, 2014, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Advanced Analytics,
Real Application Testing and Unified Auditing options

SQL> alter system set local_listener='odax7-2m-lsnr2:1523';

System altered.

SQL> exit

[oracle@odax7-2m ~]$ soe +ASM -s
[oracle@odax7-2m ~]$ lsnrctl status listener2

LSNRCTL for Linux: Version 12.2.0.1.0 - Production on 15-MAR-2019 11:15:23

Copyright (c) 1991, 2016, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER2)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER2
Version                   TNSLSNR for Linux: Version 12.2.0.1.0 - Production
Start Date                15-MAR-2019 11:10:03
Uptime                    0 days 0 hr. 5 min. 19 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/12.2.0.1/oracle/network/admin/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/odax7-2m/listener2/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER2)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.17.2.9)(PORT=1523)))
Service "MMI.support.robotron.de" has 1 instance(s).
  Instance "MMI", status READY, has 2 handler(s) for this service...
Service "MMIXDB.support.robotron.de" has 1 instance(s).
  Instance "MMI", status READY, has 2 handler(s) for this service...
The command completed successfully

That’s it, it’s as simple as that. 

Advertisements

Instance Caging

Last week we had a funny issue at a customers ODA X7-2M. You may know, that there are different database shapes, each having different settings for CPU_COUNT, memory etc. The documentation is quite clear about that. In this special case they had a database with shape “odb4” which sets CPU_COUNT to 4. Now they went live with their appication on that database and experienced the following behaviour.
cpu_count_instance_caging
You see the CPU limit of and you can also see, that the load goes well over that limit. But why? The answer is quite simple. Setting CPU_COUNT does not neccessarily limit the number of CPUs the database will use. An active Resource Manager Plan is required to achieve that. MOS Note “Why are a Higher Number of CPUs Used Than the CPU_COUNT Parameter Setting (Doc ID 2017395.1)” is explainig that. If you wish to limit the CPU usage, you will have to use Instance Caging which is a combination of setting CPU_COUNT and using Resource Manager. There is another MOS Note “Configuring and Monitoring Instance Caging (Doc ID 1362445.1)” for that.
That means, if you want your databases on ODA be limited to the CPU usage their shape defines, you need to activate a Resource Manager Plan, at least the “DEFAULT_PLAN”. This is not the default, keep that in mind.

Huge Oracle Database traces caused by EM Agent

Recently, I found some huge tracefiles in the ADR of some databases. Look at this:

[oracle@server01b trace]$ ls -lrth | tail -10
-rw-r----- 1 oracle dba 1.2K Apr 9 03:44 ICM01_vktm_29023.trc
-rw-r----- 1 oracle dba 329K Apr 9 06:36 ICM01_dbrm_29047.trm
-rw-r----- 1 oracle dba 4.7M Apr 9 06:36 ICM01_dbrm_29047.trc
-rw-r----- 1 oracle dba 1.5K Apr 9 07:58 ICM01_mmon_29120.trm
-rw-r----- 1 oracle dba 14K Apr 9 07:58 ICM01_mmon_29120.trc
-rw-r----- 1 oracle dba 471M Apr 9 08:02 ICM01_ora_29162.trm
-rw-r----- 1 oracle dba 33G Apr 9 08:02 ICM01_ora_29162.trc
-rw-r----- 1 oracle dba 1.3M Apr 9 08:02 alert_ICM01.log
-rw-r----- 1 oracle dba 12M Apr 9 08:02 ICM01_lmhb_29080.trm
-rw-r----- 1 oracle dba 112M Apr 9 08:02 ICM01_lmhb_29080.trc

The file(s) were not purged automatically from ADR since they simply don’t get older. So I had a look into the file and I found repeating content like this:

Trace file /u01/app/oracle/diag/rdbms/icm01a/ICM01/trace/ICM01_ora_29162.trc
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Advanced Analytics
and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/12.1.0.2/dbhome_5
System name: Linux
Node name: server01b
Release: 4.1.12-112.16.8.el6uek.x86_64
Version: #2 SMP Fri Apr 13 15:13:46 PDT 2018
Machine: x86_64
Instance name: ICM01
Redo thread mounted by this instance: 1
Oracle process number: 35
Unix process pid: 29162, image: oracle@jus-oda-lrz01b

*** 2019-03-15 14:28:19.983
*** SESSION ID:(464.27491) 2019-03-15 14:28:19.983
*** CLIENT ID:() 2019-03-15 14:28:19.983
*** SERVICE NAME:(SYS$USERS) 2019-03-15 14:28:19.983
*** MODULE NAME:(emagent_SQL_oracle_database) 2019-03-15 14:28:19.983
*** CLIENT DRIVER:(jdbcthin) 2019-03-15 14:28:19.983
*** ACTION NAME:(problemTbspUndo) 2019-03-15 14:28:19.983

psdgbt: bind csid (1) does not match session csid (873)
psdgbt: session charset is AL32UTF8

*** 2019-03-15 14:28:19.983
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=1, mask=0x0)
----- Error Stack Dump -----
----- Current SQL Statement for this session (sql_id=a0qc12302fzfk) -----
begin dbms_application_info.set_module(:1 , :2 ); end;

There was some more callstack dumped etc., but this was not relevant to identify the root cause for this. Obviously the Enterprise Manager Agent is doing something nasty to the database. There are several documents in MOS, the best match of them is “After restart of database suddenly PLS-00553: character set name is not recognized are seen in the alert.log (Doc ID 1599864.1)“. Basically, the Agent connects too early to a starting database and picks up US7ASCII characterset. The only solution as of now is to bounce the Agent.
As a conclusion, it might be a good idea to bounce the Agent every time a database was started. May it be because the database was newly created, restarted due to maintenance or in a Data Guard switch-/failover scenario. Or just do it on a regular basis so no one has to do it by hand.

New Size Based ADR Purge Policy in 12.2

With the introduction of the Automatic Diagnostic Repository (ADR) we got some automatic purging of logs and traces too, at least for the database. This purging is based on the age of files. All the files in ADR are put in two categories, files with a short life and files with a long life. So we also have two retention policies for these files, the SHORTP_POLICY and the LONGP_POLICY. The SHORTP_POLICY defaults to 720 hours which is roughly a month whilst the LONGP_POLICY defaults to 8760 hours which is roughly a year. You can read more on setting the policies in MOS Note “Retention Policy for ADR (Doc ID 564269.1)“. If you are unsure which files are controlled by which policy, have a look at MOS Note “Which Files Are Part Of SHORTP_POLICY And LONGP_POLICY In ADR? (Doc ID 975448.1)“, basically traces, core dumps and IPS have a short life, all others have a long life.

This is a good starting point for housekeeping, but there might be situations were a lot of large traces are written in a short amount of time. This might fill up you filesystem leading to a database outage. Now with Oracle Database 12.2 and onwards, there is another size based policy. This policy is not immediatly visible:

[oracle@odax7-2m rdbms]$ adrci

ADRCI: Release 12.2.0.1.0 - Production on Fri Mar 15 08:07:49 2019

Copyright (c) 1982, 2017, Oracle and/or its affiliates.  All rights reserved.

ADR base = "/u01/app/oracle"
adrci> set home mmidb
adrci> show homes
ADR Homes:
diag/rdbms/mmidb/MMIDB

adrci>  show control

ADR Home = /u01/app/oracle/diag/rdbms/mmidb/MMIDB:
*************************************************************************
ADRID                SHORTP_POLICY        LONGP_POLICY         LAST_MOD_TIME                            LAST_AUTOPRG_TIME                        LAST_MANUPRG_TIME                        ADRDIR_VERSION       ADRSCHM_VERSION      ADRSCHMV_SUMMARY     ADRALERT_VERSION     CREATE_TIME
-------------------- -------------------- -------------------- ---------------------------------------- ---------------------------------------- ---------------------------------------- -------------------- -------------------- -------------------- -------------------- ----------------------------------------
1312824711           72                   72                   2019-03-14 22:37:18.660133 +01:00        2019-03-14 09:13:26.445071 +01:00        2019-03-15 07:58:30.921596 +01:00        1                    2                    107                  1                    2018-10-02 09:11:29.809205 +02:00
1 row fetched

To see the size based policy (SIZEP_POLICY), we need to query it explicitly:

adrci> select sizep_policy from adr_control_aux;

ADR Home = /u01/app/oracle/diag/rdbms/mmidb/MMIDB:
*************************************************************************
SIZEP_POLICY
--------------------
18446744073709551615
1 row fetched

This number represents the maximum size of the specific ADR home in bytes. To get an idea, how much space is required to keep files for a given amount of time or vice versa, we can use the “estimate” command.

adrci> estimate (LONGP_POLICY = 72)
Estimate
Short Policy Hours: 72
Long Policy Hours: 72
Size Policy Bytes: 2459925363

adrci> estimate (SIZEP_POLICY = 1000000000)
Estimate
Short Policy Hours: 26
Long Policy Hours: 26
Size Policy Bytes: 1000000000

So if we want to limit the size to 1GB, we can set the SIZEP_POLICY accordingly.

adrci> set control (SIZEP_POLICY = 1000000000)

adrci> select sizep_policy from adr_control_aux;

ADR Home = /u01/app/oracle/diag/rdbms/mmidb/MMIDB:
*************************************************************************
SIZEP_POLICY
--------------------
1000000000
1 row fetched

But remember, this does not take immediate effect. The database is the only facility, that does automatic purging. All other ADR homes from other facilities need explicit purging via the “purge” command. And also the database does the purging only once a week and there is no way to change that, see MOS Note “Is there a way to control Auto_Purge Frequency done by the MMON ? (Doc ID 1446242.1)” for details. So you should consider using a separate purge job for that. The “purgeLogs” script (MOS Note “purgeLogs: Cleanup traces, logs in one command (Doc ID 2081655.1)“) might be a good starting point for that, even though it was designed to be used on ExaData, Database Appliance etc., it should also do it’s job on standard installations.