Merge vs. Insert-Exception-Update

Recently I had a discussion with one of our developers about how to insert new rows into a table or, if a row already exists, update it. His approach was a PL/SQL procedure that first tries to insert a single row and, if that fails because the row is already there, catches the exception and does an update instead. I told him that this approach has at least three issues that make it slow. First, it works row-by-row, which is much less efficient than a set operation and does not scale well. Second, it causes lots of context switches between SQL and PL/SQL, which are costly. And third, the exception handling is expensive too. My advice was to switch the code to a single MERGE statement. To prove my point, I wrote a small example which I’ll share with you.

Let’s create two tables with identical structure. Both get a unique index on the TXT column and a trigger that fills the ID from a sequence.

SQL> create table test_merge (
  2   id  number,
  3   txt varchar2(30),
  4   last_updated date,
  5   constraint pk_test_merge primary key (id)
  6  )
  7  /

Table created.

SQL> create unique index tm_txt_uq on test_merge(txt);

Index created.

SQL> create table test_except (
  2   id  number,
  3   txt varchar2(30),
  4   last_updated date,
  5   constraint pk_test_except primary key (id)
  6  )
  7  /

Table created.

SQL> create unique index te_txt_uq on test_except(txt);

Index created.

SQL> create sequence tm_seq;

Sequence created.

SQL> create sequence te_seq;

Sequence created.

SQL> create or replace trigger tm_bir
  2  before insert on test_merge
  3  for each row
  4  begin
  5    :new.id := tm_seq.nextval;
  6  end;
  7  /

Trigger created.

SQL> create or replace trigger te_bir
  2  before insert on test_except
  3  for each row
  4  begin
  5    :new.id := te_seq.nextval;
  6  end;
  7  /

Trigger created.

In the first run I populate both tables with 100k rows. Since both tables are empty, this will of course lead to INSERTs only. SQL*Plus will do the timing for me.

SQL> set timing on
SQL> merge into test_merge
  2  using (
  3    select 'Zeile ' || to_char(level) zeile from dual connect by level <= 100000
  4  ) src
  5  on (test_merge.txt = src.zeile)
  6  when matched then
  7    update set last_updated = sysdate
  8  when not matched then
  9    insert (txt) values (src.zeile)
 10  /

100000 rows merged.

SQL> set serveroutput on
SQL> declare
  2    l number;
  3    i number :=0;
  4    u number :=0;
  5  begin
  6    for l in 1..100000 loop
  7      begin
  8        insert into test_except (txt)
  9    values ('Zeile ' || to_char(l));
 10    i := i+1;
 11  exception
 12    when dup_val_on_index then
 13      update test_except set last_updated=sysdate where txt = 'Zeile ' || to_char(l);
 14      u := u+1;
 15      end;
 16    end loop;
 17    dbms_output.put_line(to_char(i) || ' rows inserted.');
 18    dbms_output.put_line(to_char(u) || ' rows updated.');
 19  end;
 20  /
100000 rows inserted.
0 rows updated.

PL/SQL procedure successfully completed.

Elapsed: 00:00:14.62

We see that the MERGE is more than 2x faster than the PL/SQL approach. When I now run both statements a second time, they will lead to UPDATEs since all the rows are already in the tables.

SQL> merge into test_merge
  2  using (
  3    select 'Zeile ' || to_char(level) zeile from dual connect by level <= 100000
  4  ) src
  5  on (test_merge.txt = src.zeile)
  6  when matched then
  7    update set last_updated = sysdate
  8  when not matched then
  9    insert (txt) values (src.zeile)
 10  /

100000 rows merged.

SQL> declare
  2    l number;
  3    i number :=0;
  4    u number :=0;
  5  begin
  6    for l in 1..100000 loop
  7      begin
  8        insert into test_except (txt)
  9    values ('Zeile ' || to_char(l));
 10    i := i+1;
 11      exception
 12    when dup_val_on_index then
 13      update test_except set last_updated=sysdate where txt = 'Zeile ' || to_char(l);
 14      u := u+1;
 15      end;
 16    end loop;
 17    dbms_output.put_line(to_char(i) || ' rows inserted.');
 18    dbms_output.put_line(to_char(u) || ' rows updated.');
 19  end;
 20  /
0 rows inserted.
100000 rows updated.

PL/SQL procedure successfully completed.

Elapsed: 00:00:55.36

The MERGE statement is even faster than in the first run. That might be due to the fact that there is less internal work to do, no index block splits etc. The elapsed time of the PL/SQL approach, however, increased dramatically: it took almost 4x longer than its initial run and is now about 11x slower than the MERGE.
So now it is really obvious that the PL/SQL approach might not be the best performing solution. The MERGE solution may need some more thinking and might not be as readable as the PL/SQL solution, but if you write it regularly you get used to it.
The good thing is, I convinced my colleague.

Size of Flashback Logs

Today’s blog post is about the Flashback Database feature of the Oracle Database. I use this feature mostly in the context of Data Guard, but there are also other possible use cases, like having a fallback during updates/upgrades etc. In Data Guard setups, Flashback Database simplifies and speeds up the reinstantiation of a former primary database after it was lost and a failover to the standby database was performed. With Flashback Database there is no need to perform a complete duplication of the database to bring it back up as a standby. The database is simply rolled back to just before the SCN at which the standby database was activated by the failover. It can then start applying the redo it gets from the new primary database.
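Just to illustrate the idea, the reinstate roughly boils down to the following steps (a minimal sketch; in a Broker configuration DGMGRL’s “reinstate database” does all of this for you, and the SCN is queried on the new primary):

SQL> -- on the new primary: at which SCN did the standby become primary?
SQL> select standby_became_primary_scn from v$database;

SQL> -- on the former primary, started in mount state:
SQL> flashback database to scn <standby_became_primary_scn>;
SQL> alter database convert to physical standby;
SQL> shutdown immediate
SQL> startup mount
SQL> alter database recover managed standby database disconnect from session;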
But how does it work? When the feature is activated, the database writes additional logs that contain the information needed to roll the database back. These logs are stored in the Fast Recovery Area, and there is no way to influence that location. Let’s have a look.

SQL> select group#, status, bytes from v$log;

    GROUP# STATUS                BYTES
---------- ---------------- ----------
         1 CURRENT          1073741824
         2 INACTIVE         1073741824
         3 INACTIVE         1073741824

SQL> alter database flashback on;

Database altered.

SQL> select * from v$flashback_database_logfile;

NAME                                                                                   LOG#    THREAD#  SEQUENCE#      BYTES FIRST_CHANGE# FIRST_TIME          TYPE          CON_ID
-------------------------------------------------------------------------------- ---------- ---------- ---------- ---------- ------------- ------------------- --------- ----------
/u03/app/oracle/fast_recovery_area/MMISI2/flashback/o1_mf_h1xp7lv0_.flb                   1          1          1 1073741824       3886676 2020-01-15 10:18:44 NORMAL             0
/u03/app/oracle/fast_recovery_area/MMISI2/flashback/o1_mf_h1xp7pnk_.flb                   2          1          1 1073741824             0                     RESERVED           0

SQL> !ls -l /u03/app/oracle/fast_recovery_area/MMISI2/flashback/
total 2097168
-rw-r----- 1 oracle dba 1073750016 Jan 15 10:22 o1_mf_h1xp7lv0_.flb
-rw-r----- 1 oracle dba 1073750016 Jan 15 10:18 o1_mf_h1xp7pnk_.flb

Obviously the flashback logs are the same size as the redo logs, and they consume space inside your FRA. But what if you do not have sufficient space and/or want to shrink those flashback logs? Since the size is inherited from the redo logs, let’s try shrinking the redo logs by creating new, smaller groups and dropping the old ones.

SQL> alter database add logfile group 10 size 256m;

Database altered.

SQL> alter database add logfile group 11 size 256m;

Database altered.

SQL> alter database add logfile group 12 size 256m;

Database altered.

SQL> alter database drop logfile group 2;

Database altered.

SQL> alter database drop logfile group 3;

Database altered.

SQL> alter system switch logfile;

System altered.

SQL> alter system checkpoint;

System altered.

SQL> alter database drop logfile group 1;

Database altered.

SQL> select group#, bytes, status from v$log;

    GROUP#      BYTES STATUS
---------- ---------- ----------------
        10  268435456 CURRENT
        11  268435456 UNUSED
        12  268435456 UNUSED

Now let’s check what the flashback logs look like:

SQL> select log#, thread#, sequence#, bytes, type from v$flashback_database_logfile;

      LOG#    THREAD#  SEQUENCE#      BYTES TYPE
---------- ---------- ---------- ---------- ---------
         1          1          1 1073741824 NORMAL
         2          1          1 1073741824 RESERVED

Still unchanged. Maybe it requires some actual work being done to cycle through the logs.

SQL> create table dings tablespace users as select * from dba_source;

Table created.

SQL> insert into dings select * from dings;

283698 rows created.

SQL> insert into dings select * from dings;

567396 rows created.

SQL> insert into dings select * from dings;

1134792 rows created.

SQL> insert into dings select * from dings;

2269584 rows created.

SQL> insert into dings select * from dings;

4539168 rows created.

SQL> commit;

Commit complete.

SQL> select group#, sequence#, bytes, status from v$log;

    GROUP#  SEQUENCE#      BYTES STATUS
---------- ---------- ---------- ----------------
        10         11  268435456 ACTIVE
        11         12  268435456 ACTIVE
        12         13  268435456 CURRENT

SQL> select log#, thread#, sequence#, bytes, type from v$flashback_database_logfile;

      LOG#    THREAD#  SEQUENCE#      BYTES TYPE
---------- ---------- ---------- ---------- ---------
         1          1          1 1073741824 NORMAL
         2          1          1 1073741824 RESERVED

Ok, no change here. There isn’t even a new flashback log. Maybe it wasn’t enough change? So let’s try some row-by-row deletion. And please do not do this in production; deleting rows can hardly be done less efficiently than that.

SQL> begin
   2   for rec in (select rowid from dings) loop
   3     delete from dings where rowid = rec.rowid;
   4     commit;
   5   end loop;
   6 end;
   7 /

PL/SQL procedure successfully completed.

SQL> select group#, sequence#, bytes, status from v$log;

    GROUP#  SEQUENCE#      BYTES STATUS
---------- ---------- ---------- ----------------
        10         53  268435456 CURRENT
        11         51  268435456 INACTIVE
        12         52  268435456 INACTIVE

SQL> select log#, thread#, sequence#, bytes, type from v$flashback_database_logfile;

      LOG#    THREAD#  SEQUENCE#      BYTES TYPE
---------- ---------- ---------- ---------- ---------
         1          1          1 1073741824 NORMAL
         2          1          1 1073741824 RESERVED

At least this produced quite some redo logs, but still no new flashback logs. Maybe the “db_flashback_retention_target” parameter has an impact on these logs?

SQL> show parameter flashb

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_flashback_retention_target        integer     1440

SQL> alter system set db_flashback_retention_target=10;

System altered.

SQL> select log#, thread#, sequence#, bytes, type from v$flashback_database_logfile;

      LOG#    THREAD#  SEQUENCE#      BYTES TYPE
---------- ---------- ---------- ---------- ---------
         1          1          1 1073741824 NORMAL
         2          1          1 1073741824 RESERVED

Simply speaking, no, it hasn’t. Maybe other operations influence the generation of flashback logs, like index maintenance?

SQL> insert into dings select * from dba_source;

283698 rows created.

SQL> commit;

Commit complete.

SQL> create index dings_ix1 on dings(owner, name, line) tablespace new;

Index created.

SQL> insert into dings select * from dings;

283698 rows created.

SQL> insert into dings select * from dings;

567396 rows created.

SQL> insert into dings select * from dings;

1134792 rows created.

SQL> insert into dings select * from dings;

2269584 rows created.

SQL> commit;

Commit complete.

SQL> select group#, sequence#, bytes, status from v$log;

    GROUP#  SEQUENCE#      BYTES STATUS
---------- ---------- ---------- ----------------
        10         65  268435456 ACTIVE
        11         66  268435456 ACTIVE
        12         67  268435456 CURRENT

SQL> select log#, thread#, sequence#, bytes, type, first_time from v$flashback_database_logfile;

      LOG#    THREAD#  SEQUENCE#      BYTES TYPE      FIRST_TIME
---------- ---------- ---------- ---------- --------- -------------------
         3          1          3 1073741824 NORMAL    2020-01-15 11:27:47
         4          1          1 1073741824 RESERVED
         1          1          1 1073741824 NORMAL    2020-01-15 10:18:44
         2          1          2 1073741824 NORMAL    2020-01-15 11:26:36

SQL> !ls -l /u03/app/oracle/fast_recovery_area/MMISI2/flashback/
total 4194336
-rw-r----- 1 oracle dba 1073750016 Jan 15 11:26 o1_mf_h1xp7lv0_.flb
-rw-r----- 1 oracle dba 1073750016 Jan 15 11:27 o1_mf_h1xp7pnk_.flb
-rw-r----- 1 oracle dba 1073750016 Jan 15 11:28 o1_mf_h1xt6z4y_.flb
-rw-r----- 1 oracle dba 1073750016 Jan 15 11:27 o1_mf_h1xt95vj_.flb

Ok, finally there are some new flashback logs. But the size of those logs is still 1 GB.
I am running out of ideas; the last chance is to disable and re-enable the Flashback Database feature.

SQL> alter database flashback off;

Database altered.

SQL> alter database flashback on;

Database altered.

SQL> select log#, thread#, sequence#, bytes, type, first_time from v$flashback_database_logfile;

      LOG#    THREAD#  SEQUENCE#      BYTES TYPE      FIRST_TIME
---------- ---------- ---------- ---------- --------- -------------------
         1          1          1  268435456 NORMAL    2020-01-15 11:30:01
         2          1          1  268435456 RESERVED

SQL> !ls -l /u03/app/oracle/fast_recovery_area/MMISI2/flashback/
total 524304
-rw-r----- 1 oracle dba 268443648 Jan 15 11:30 o1_mf_h1xtf8yz_.flb
-rw-r----- 1 oracle dba 268443648 Jan 15 11:30 o1_mf_h1xtfck6_.flb

Here we go, the flashback logs are finally shrunk. But this comes at the cost of losing all the flashback information that was previously there. Keep that in mind and schedule wisely when changing the size of your flashback logs.
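To sum it up, here is a condensed sketch of the whole procedure, assuming the online redo logs have already been resized as shown above:

SQL> select flashback_size, retention_target from v$flashback_database_log;
SQL> alter database flashback off;  -- this discards all existing flashback data
SQL> alter database flashback on;   -- new flashback logs are created, sized like the current redo logs
SQL> select oldest_flashback_time from v$flashback_database_log;  -- the flashback window starts from scratch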

Oracle Clusterware (19c) and SLES 12 certification

Recently we stumbled upon an issue with Oracle Clusterware on SLES 12 SP5. We checked the certification on MOS and found this statement: “SUSE Linux Enterprise Server 12 SP3: 4.4.162-94.72-default or later”. That “or later” led us to the assumption that it is certified. But it is not, and I’ll explain why.
If you search MOS for Clusterware certifications, you walk through these steps:



Here is a direct link to that search.
And that is the point where you have to read very carefully to interpret the result. The title saying “SP3+” might lead to the assumption that later SPs are also supported. But in fact they are not. The note clearly states that only “SP3” is supported, with a minimum kernel version to be used. I highlighted that in the picture above. So that in turn means only SLES 12 SP3 is certified, nothing else.
If you are using or planning to use ACFS, you should also check the certification/compatibility for that. But there is no separate certification entry for ACFS; there is just a note in MOS: 1369107.1.

In this note, the certification is stated a bit more clearly. It says “SP3”, followed by a comma and then the minimum required kernel version. So the same applies here: just SP3 of SLES 12 is certified, nothing else. As far as I know, SP3 is not under SUSE’s support anymore at the time of writing. Only SLES 15 is, but there is no upgrade path for it. If you choose to use SLES 15, you need to install it from scratch. And there are a couple of patches required to make the Clusterware run on SLES 15.
I hope this helps you understand and interpret the certifications a bit better next time and choose the proper OS version.

Change time zone in Oracle’s DBaaS

Recently I assisted in a project that is utilizing Oracle Cloud Infrastructure (OCI) to provide application servers and Oracle databases. The customer is located in New Zealand, so they obviously wanted to work with dates and times in their local time zone. But when it came to the Database Scheduler, we found that it was using UTC rather than the local time zone. That makes total sense as a default, but it’s not what we wanted.
Here is, what it looked like initially.

[oracle@ecotes-dbt ~]$ date
Thu Nov 14 07:05:29 UTC 2019

SQL> select sysdate from dual;

SYSDATE
----------------
14.11.2019 07:06

So first we checked and changed the time zone of the underlying server. This is done by pointing /etc/localtime to another file.

[opc@ecotes-dbt ~]# sudo su - 
[root@ecotes-dbt ~]# ll /etc/localtime
lrwxrwxrwx 1 root root 23 Oct  7 06:51 /etc/localtime -> /usr/share/zoneinfo/UTC
[root@ecotes-dbt ~]#  ll /usr/share/zoneinfo/NZ
-rw-r--r-- 4 root root 2434 Jan 29  2018 /usr/share/zoneinfo/NZ
[root@ecotes-dbt ~]# ll /usr/share/zoneinfo/Pacific/Auckland
-rw-r--r-- 4 root root 2434 Jan 29  2018 /usr/share/zoneinfo/Pacific/Auckland
[root@ecotes-dbt ~]# rm -f /etc/localtime
[root@ecotes-dbt ~]# ln -s /usr/share/zoneinfo/Pacific/Auckland /etc/localtime
[root@ecotes-dbt ~]# ll /etc/localtime
lrwxrwxrwx 1 root root 36 Nov 14 20:10 /etc/localtime -> /usr/share/zoneinfo/Pacific/Auckland

[oracle@ecotes-dbt ~]$ date
Thu Nov 14 20:11:31 NZDT 2019

Now the database is picking up the new settings from the OS.

SQL> select sysdate from dual;

SYSDATE
----------------
14.11.2019 20:11

Unfortunately, the database scheduler has its own setting, so we needed to change that too.

SQL> SELECT *
2    FROM   dba_scheduler_global_attribute
3    WHERE  attribute_name = 'DEFAULT_TIMEZONE';   

ATTRIBUTE_NAME         VALUE
---------------------- -----------
DEFAULT_TIMEZONE       Etc/UTC

SQL> select dbms_scheduler.stime from dual;

STIME
---------------------------------------------------------------------------
14-NOV-19 08.26.59.556872000 AM ETC/UTC

But we can simply change this default with a single call to DBMS_SCHEDULER.SET_SCHEDULER_ATTRIBUTE.

SQL> exec DBMS_SCHEDULER.SET_SCHEDULER_ATTRIBUTE('default_timezone', 'Pacific/Auckland');

PL/SQL procedure successfully completed.

SQL> SELECT *
2    FROM   dba_scheduler_global_attribute
3    WHERE  attribute_name = 'DEFAULT_TIMEZONE';   

ATTRIBUTE_NAME         VALUE
---------------------- -----------
DEFAULT_TIMEZONE       Pacific/Auckland

SQL> select dbms_scheduler.stime from dual;

STIME
---------------------------------------------------------------------------
14-NOV-19 09.31.26.365519000 PM PACIFIC/AUCKLAND

So that’s it. Now we have our DBaaS running with the time zone specific to New Zealand.
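A quick way to double-check the new default is to look at the next run date of a scheduler job. The job below is just a hypothetical throw-away test job:

SQL> begin
  2    dbms_scheduler.create_job(
  3      job_name        => 'TZ_TEST_JOB',
  4      job_type        => 'PLSQL_BLOCK',
  5      job_action      => 'begin null; end;',
  6      repeat_interval => 'FREQ=DAILY',
  7      enabled         => true);
  8  end;
  9  /

SQL> select next_run_date from user_scheduler_jobs where job_name = 'TZ_TEST_JOB';
SQL> exec dbms_scheduler.drop_job('TZ_TEST_JOB');

The NEXT_RUN_DATE should now show the Pacific/Auckland region instead of ETC/UTC.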


Update 2020-01-06

In the first version of my post, I did not mention the relevance of the server-side time zone settings, because a database session inherits its NLS settings from the listener that forks the server process for that session. You can find a complete checklist in MOS Note 1627439.1.
Basically, you set the timezone in /etc/sysconfig/clock. The possible values can be found in /usr/share/zoneinfo.

$ cat /etc/sysconfig/clock
ZONE="Pacific/Auckland"
UTC=true
ARC=false

Before doing so, you should check if the times for UTC and your desired time zone are accurate.

$ export TZ=UTC
$ date
Fri Dec 20 11:53:48 UTC 2019
$ export TZ=Pacific/Auckland
$ date
Sat Dec 21 00:53:52 NZDT 2019

The settings for the Clusterware may also need a change. The necessary steps can be found in part D) of the note mentioned above. Ideally you set the time zone only in $GRID_HOME/crs/install/s_crsconfig_<hostname>_env.txt:

$ grep ^TZ $GRID_HOME/crs/install/s_crsconfig_$(hostname -s)_env.txt
TZ=Pacific/Auckland

All other resource-specific settings should be removed using “srvctl unsetenv [database | asm | listener] -envs TZ”; that will make those resources use the setting defined for the Clusterware.
Once all those changes have been made, the Clusterware stack should be bounced to bring them into effect.
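For example (just a sketch, with a hypothetical database named MYDB; run the crsctl commands as root on every node):

$ srvctl getenv database -db MYDB              # show any resource-level TZ override
$ srvctl unsetenv database -db MYDB -envs TZ   # remove it so the Clusterware setting applies
# crsctl stop crs
# crsctl start crs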

Oracle Grid Infrastructure vs. SLES 12

During a new installation of Oracle Grid Infrastructure 19.4 at a customer’s site, I experienced a strange behaviour. The databases could be started and stopped via SQL*Plus as usual, and stopping databases using “srvctl” was fine too. But when it came to starting a database via “srvctl” or rebooting the nodes, I ran into trouble. It looked like this:

oracle@proddb20:~> srvctl start database -db db3p
PRCR-1079 : Failed to start resource ora.db3p.db
CRS-5017: The resource action "ora.db3p.db start" encountered the following error:
ORA-00444: background process "PXMN" failed while starting
ORA-27300: OS system dependent operation:fork failed with status: 11
. For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/proddb20/crs/trace/crsd_oraagent_oracle.trc".

So I started investigating, beginning with the alert.log of the database. It had some more details for me.

oracle@proddb20:~> tail /u01/app/oracle/diag/rdbms/db3p/db3p/trace/alert_db3p.log
2019-10-08T13:20:56.263194+02:00
Errors in file /u01/app/oracle/diag/rdbms/db3p/db3p/trace/db3p_psp0_187365.trc:
ORA-27300: OS system dependent operation:fork failed with status: 11
ORA-27301: OS failure message: Resource temporarily unavailable
ORA-27302: failure occurred at: skgpspawn3
2019-10-08T13:20:57.237250+02:00
Process LREG died, see its trace file
USER (ospid: ): terminating the instance due to ORA error
2019-10-08T13:20:58.269727+02:00
Instance terminated by USER, pid = 187344

The messages vary; the failing function mentioned in ORA-27302 is not always the same. A look into the mentioned trace file revealed the following information.

oracle@proddb20:~> cat /u01/app/oracle/diag/rdbms/db3p/db3p/trace/db3p_psp0_187365.trc
Trace file /u01/app/oracle/diag/rdbms/db3p/db3p/trace/db3p_psp0_187365.trc
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.4.0.0.0
Build label:    RDBMS_19.3.0.0.0DBRU_LINUX.X64_190417
ORACLE_HOME:    /u01/app/oracle/product/19.3.0/db_ee_1
System name:    Linux
Node name:      proddb20
Release:        4.4.166-3.g849dcaf-default
Version:        #1 SMP Fri Dec 7 15:18:32 UTC 2018 (849dcaf)
Machine:        x86_64
Instance name: db3p
Redo thread mounted by this instance: 0 

Oracle process number: 4
Unix process pid: 187365, image: oracle@proddb20 (PSP0)
  
*** 2019-10-08T13:20:56.240852+02:00
*** SESSION ID:(61.55718) 2019-10-08T13:20:56.240872+02:00
*** CLIENT ID:() 2019-10-08T13:20:56.240876+02:00
*** SERVICE NAME:() 2019-10-08T13:20:56.240879+02:00
*** MODULE NAME:() 2019-10-08T13:20:56.240882+02:00
*** ACTION NAME:() 2019-10-08T13:20:56.240885+02:00
*** CLIENT DRIVER:() 2019-10-08T13:20:56.240887+02:00
 
Process startup failed, error stack:
ORA-27300: OS system dependent operation:fork failed with status: 11
ORA-27301: OS failure message: Resource temporarily unavailable
ORA-27302: failure occurred at: skgpspawn3
OS - DIAGNOSTICS
----------------
loadavg : 0.27 0.12 0.11
Memory (Avail / Total) = 118346.89M / 128328.88M
Swap (Avail / Total) = 16386.00M /  16386.00M
Max user processes limits(s / h) =  65536 / 65536
----------------

My first guess was that some kernel parameters were not set properly, but a quick check showed that everything was fine at that point.
That’s why I went to My Oracle Support, where I quickly found this note: SLES 12: Database Startup Error with ORA-27300 ORA-27301 ORA-27303 While Starting using Srvctl (Doc ID 2340986.1). The note talks about a new functionality named “cgroup controller” introduced in SLES 12. This functionality limits the maximum number of so-called tasks that a single service may spawn. The default limit is 512 tasks per service. For the Grid Infrastructure in its basic setup right after the installation it looks like this.

proddb20:~ # systemctl status ohasd
ohasd.service - LSB: Start and Stop Oracle High Availability Service
   Loaded: loaded (/etc/init.d/ohasd; bad; vendor preset: disabled)
   Active: active (exited) since Fri 2019-10-04 14:30:19 CEST
     Docs: man:systemd-sysv-generator(8)
  Process: 4024 ExecStart=/etc/init.d/ohasd start (code=exited, status=0/SUCCESS)
    Tasks: 471 (limit: 512) 

As you can see, it is already near the limit. So if I now start a database instance, I will definitely hit that limit, causing the instance startup to fail.
The limit can be increased by setting the value of “DefaultTasksMax” in “/etc/systemd/system.conf”.

proddb20:~ # grep DefaultTasksMax /etc/systemd/system.conf 
#DefaultTasksMax=512
proddb20:~ # vi /etc/systemd/system.conf

proddb20:~ # grep DefaultTasksMax /etc/systemd/system.conf 
DefaultTasksMax=65535

After a reboot the new value is picked up and the Grid Infrastructure can now start many more tasks. That means the databases come up right away during node startup, and I am finally able to start them using “srvctl”.

proddb20:~ # systemctl status ohasd
ohasd.service - LSB: Start and Stop Oracle High Availability Service
   Loaded: loaded (/etc/init.d/ohasd; bad; vendor preset: disabled)
   Active: active (exited) since Tue 2019-10-08 14:30:19 CEST; 1min 12s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 4024 ExecStart=/etc/init.d/ohasd start (code=exited, status=0/SUCCESS)
    Tasks: 463 (limit: 65535)

So it is definitely a good idea to set/increase that limit before you even start installing a Grid Infrastructure.
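If you want to prepare a node up front, the change and the checks boil down to something like this (a sketch using the value from above; the new limit takes effect after a reboot):

proddb20:~ # sed -i 's/^#\?DefaultTasksMax=.*/DefaultTasksMax=65535/' /etc/systemd/system.conf
proddb20:~ # grep ^DefaultTasksMax /etc/systemd/system.conf
DefaultTasksMax=65535
proddb20:~ # systemctl show -p DefaultTasksMax    # manager-wide default, after the reboot
proddb20:~ # systemctl status ohasd | grep Tasks  # tasks in use vs. limit for the Clusterware stack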

ODA – cleanup repository

A couple of days ago, Oracle released version 18.7 for the Oracle Database Appliance. The most useful new feature in my opinion is the “Repository Cleanup”. You can have a look at all the other new features in the Release Notes.

For now, I’ll outline the usage of the cleanup feature. First, let’s have a look at the options:

[root@odax6m ~]# odacli cleanup-patchrepo -h
Usage: cleanup-patchrepo [options]
  Options:
    --clone, -cl
      Flag for clone directory cleanup.
    --component, -comp
      Components:{DB,GI} default{Both DB and GI}
    --help, -h
      get help
    --json, -j
      json output
    --local, -l
      Cleanup Repository for local node.
    --node, -n
      Cleanup Repository for specific nodes
    --version, -v
      ODA Patch Version.

It states, as does the documentation, that all the options are optional, but in fact this isn’t true:

[root@odax6m pkgrepos]# odacli cleanup-patchrepo
DCS-10001:Internal error encountered: Invalid Input Parameter to delete the clones and patch component.

I started with the clone files. But I did not manage to delete a single DB clone. These are the clones I had:

[root@odax6m ~]# cd /opt/oracle/oak/pkgrepos/
[root@odax6m pkgrepos]# ll orapkgs/clones/
total 14615580
-r-xr-xr-x 1 root root 2395927731 Jun 17  2018 db112.180417.tar.gz
-r-xr-xr-x 1 root root 4199611255 Jun 17  2018 db121.180417.tar.gz
-rw-r--r-- 1 root root 3857973981 Jun 17  2018 db122.180417.tar.gz
-r-xr-xr-x 1 root root 4498196424 Dec 18  2018 db18.181016.tar.gz

I tried several different combinations:

[root@odax6m pkgrepos]# odacli cleanup-patchrepo -cl -v 18.3.0.0.0
[root@odax6m pkgrepos]# odacli cleanup-patchrepo -cl -v 18.5.0.0.0
[root@odax6m pkgrepos]# odacli cleanup-patchrepo -cl -comp db -v 18.3.0.0.0
[root@odax6m pkgrepos]# odacli cleanup-patchrepo -cl -v 18.3.0.0.181016
DCS-10200:Supplied version: 18.3.0.0 is invalid.

Except for the last one, all cleanup jobs ran successfully, but all the clones are still there:

[root@odax6m pkgrepos]# ll orapkgs/clones/
total 14615580
-r-xr-xr-x 1 root root 2395927731 Jun 17  2018 db112.180417.tar.gz
-r-xr-xr-x 1 root root 4199611255 Jun 17  2018 db121.180417.tar.gz
-rw-r--r-- 1 root root 3857973981 Jun 17  2018 db122.180417.tar.gz
-r-xr-xr-x 1 root root 4498196424 Dec 18  2018 db18.181016.tar.gz

I’ll investigate this later.
Next, I cleaned up the Bundlepatches. Since the ODA was freshly imaged with 18.7.0.0, there were just 18.7.0.0.0 and 18.6.0.0.0 patches installed.

[root@odax6m pkgrepos]# ll db/*
db/11.2.0.4.190416:
total 1200324
-r-xr-xr-x 1 root root        873 Sep 12 18:32 componentmetadata.xml
-r-xr-xr-x 1 root root   89475398 Sep 12 18:32 db112.patch.storage.zip
-r-xr-xr-x 1 root root    4953025 Sep 12 18:32 p28602216_112040_Linux-x86-64.zip
-r-xr-xr-x 1 root root 1133472011 Sep 12 18:32 p29255947_112040_Linux-x86-64.zip

db/11.2.0.4.190716:
total 1250008
-r-xr-xr-x 1 root root        949 Sep 12 18:31 componentmetadata.xml
-r-xr-xr-x 1 root root   89475398 Sep 12 18:31 db112.patch.storage.zip
-r-xr-xr-x 1 root root    4953025 Sep 12 18:31 p28602216_112040_Linux-x86-64.zip
-r-xr-xr-x 1 root root 1184297467 Sep 12 18:31 p29698727_112040_Linux-x86-64.zip

db/12.1.0.2.190416:
total 3285576
-r-xr-xr-x 1 root root       1015 Sep 12 18:32 componentmetadata.xml
-r-xr-xr-x 1 root root    3398176 Sep 12 18:32 p28602202_121020_Linux-x86-64.zip
-r-xr-xr-x 1 root root 3357733895 Sep 12 18:32 p29176139_121020_Linux-x86-64.zip

db/12.1.0.2.190716:
total 3635656
-r-xr-xr-x 1 root root       1091 Sep 12 18:30 componentmetadata.xml
-r-xr-xr-x 1 root root    3398176 Sep 12 18:30 p28602202_121020_Linux-x86-64.zip
-r-xr-xr-x 1 root root 3715867922 Sep 12 18:31 p29698629_121020_Linux-x86-64.zip

db/12.2.0.1.190416:
total 1701944
-r-xr-xr-x 1 root root        925 Sep 12 18:31 componentmetadata.xml
-r-xr-xr-x 1 root root    1547429 Sep 12 18:31 p28602183_122010_Linux-x86-64.zip
-r-xr-xr-x 1 root root 1736326653 Sep 12 18:32 p29301687_122010_Linux-x86-64.zip
-r-xr-xr-x 1 root root    3200431 Sep 12 18:31 p29344646_12201190416DBAPR2019RU_Linux-x86-64.zip

db/12.2.0.1.190716:
total 1934444
-r-xr-xr-x 1 root root       1001 Sep 12 18:30 componentmetadata.xml
-r-xr-xr-x 1 root root    1547429 Sep 12 18:30 p28602183_122010_Linux-x86-64.zip
-r-xr-xr-x 1 root root 1977375813 Sep 12 18:30 p29708720_122010_Linux-x86-64.zip

db/18.6.0.0.190416:
total 3364852
-r-xr-xr-x 1 root root       1010 Sep 12 18:31 componentmetadata.xml
-r-xr-xr-x 1 root root 1024357189 Sep 12 18:31 p29173957_186000ACFSRU_Linux-x86-64.zip
-r-xr-xr-x 1 root root  114529524 Sep 12 18:31 p29249584_180000_Linux-x86-64.zip
-r--r--r-- 1 root root 2303327208 Sep 12 18:31 p29764249_180000_Linux-x86-64.zip

db/18.7.0.0.190716:
total 2489356
-r-xr-xr-x 1 root root       1038 Sep 12 18:30 componentmetadata.xml
-r-xr-xr-x 1 root root 2546599237 Sep 12 18:30 p30097923_180000_Linux-x86-64.zip

So the only real option here is to remove the 18.6.0.0.0 Bundlepatches, keeping those that came with 18.7.0.0.0.

[root@odax6m pkgrepos]# odacli cleanup-patchrepo -comp GI,DB -v 18.6.0.0.0

This time it removed the patches:

[root@odax6m pkgrepos]# ll db/*
db/11.2.0.4.190716:
total 1250008
-r-xr-xr-x 1 root root        949 Sep 12 18:31 componentmetadata.xml
-r-xr-xr-x 1 root root   89475398 Sep 12 18:31 db112.patch.storage.zip
-r-xr-xr-x 1 root root    4953025 Sep 12 18:31 p28602216_112040_Linux-x86-64.zip
-r-xr-xr-x 1 root root 1184297467 Sep 12 18:31 p29698727_112040_Linux-x86-64.zip

db/12.1.0.2.190716:
total 3635656
-r-xr-xr-x 1 root root       1091 Sep 12 18:30 componentmetadata.xml
-r-xr-xr-x 1 root root    3398176 Sep 12 18:30 p28602202_121020_Linux-x86-64.zip
-r-xr-xr-x 1 root root 3715867922 Sep 12 18:31 p29698629_121020_Linux-x86-64.zip

db/12.2.0.1.190716:
total 1934444
-r-xr-xr-x 1 root root       1001 Sep 12 18:30 componentmetadata.xml
-r-xr-xr-x 1 root root    1547429 Sep 12 18:30 p28602183_122010_Linux-x86-64.zip
-r-xr-xr-x 1 root root 1977375813 Sep 12 18:30 p29708720_122010_Linux-x86-64.zip

db/18.7.0.0.190716:
total 2489356
-r-xr-xr-x 1 root root       1038 Sep 12 18:30 componentmetadata.xml
-r-xr-xr-x 1 root root 2546599237 Sep 12 18:30 p30097923_180000_Linux-x86-64.zip

Finally, we have an option to clean up the repository and free space for new patch files. Hopefully your remaining space allows you to patch to 18.7 in the first place.
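If you want to see how much space a cleanup actually frees, a simple before/after check of the repository location will do (paths as shown above):

[root@odax6m ~]# df -h /opt/oracle/oak/pkgrepos
[root@odax6m ~]# du -sh /opt/oracle/oak/pkgrepos/db /opt/oracle/oak/pkgrepos/orapkgs/clones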

Update 2019-10-01

I just learned that the cleanup of RDBMS clones only removes older clone files of a specific release. It keeps just the most current clone per RDBMS release. In other words, there is no way to get rid of a complete release.

[root@odax6m ~]# ll /opt/oracle/oak/pkgrepos/orapkgs/clones/
total 16961664
-r-xr-xr-x 1 root root 2395927731 Jun 17  2018 db112.180417.tar.gz
-rwxr-xr-x 1 root root 2400037923 Sep  7  2018 db112.180717.tar.gz
-r-xr-xr-x 1 root root 4199611255 Jun 17  2018 db121.180417.tar.gz
-rw-r--r-- 1 root root 3857973981 Jun 17  2018 db122.180417.tar.gz
-r-xr-xr-x 1 root root 4498196424 Dec 18  2018 db18.181016.tar.gz
[root@odax6m ~]# odacli cleanup-patchrepo -cl
{
  "jobId" : "c02181b9-d8f5-40df-90de-82df4ea82f8d",
  "status" : "Created",
  "message" : null,
  "reports" : [ ],
  "createTimestamp" : "October 01, 2019 09:03:35 AM CEST",
  "resourceList" : [ ],
  "description" : "Cleanup patchrepos",
  "updatedTime" : "October 01, 2019 09:03:35 AM CEST"
}
[root@odax6m ~]# odacli describe-job -i "c02181b9-d8f5-40df-90de-82df4ea82f8d"

Job details
----------------------------------------------------------------
                     ID:  c02181b9-d8f5-40df-90de-82df4ea82f8d
            Description:  Cleanup patchrepos
                 Status:  Success
                Created:  October 1, 2019 9:03:35 AM CEST
                Message:

Task Name                                Start Time                          End Time                            Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Cleanup Repository                       October 1, 2019 9:03:35 AM CEST     October 1, 2019 9:03:35 AM CEST     Success

[root@odax6m ~]# ll /opt/oracle/oak/pkgrepos/orapkgs/clones/
total 14619596
-rwxr-xr-x 1 root root 2400037923 Sep  7  2018 db112.180717.tar.gz
-r-xr-xr-x 1 root root 4199611255 Jun 17  2018 db121.180417.tar.gz
-rw-r--r-- 1 root root 3857973981 Jun 17  2018 db122.180417.tar.gz
-r-xr-xr-x 1 root root 4498196424 Dec 18  2018 db18.181016.tar.gz

As you can see, only the older clone file of the 11.2 RDBMS release was removed.

Oracle Grid Infrastructure 19c Installation

Lately I tried to install Oracle Grid Infrastructure on a SUSE SLES 12 SP3 server. Don’t blame me for this OS, it was not my choice. The setup of the cluster went quite smoothly; just some components did not start properly after the setup, and we needed to re-issue the start or stop commands to make things happen. That effect was gone once we applied the current (July 2019) Release Update. Unfortunately, we then saw some other nasty error messages in the CRS alert.log:

CRS-8503 [__lll_unlock_elision()+48] [Signal/Exception: 11] [Instruction Addr: 0x7f1ec2deb4a0] [Memory Addr: (nil)] [] [] [] [] [] [] [] []

A quick research on MOS revealed the note Installation of Oracle Clusterware 12c (12cR1 & 12cR2) on SLES 12 fails with CRS-8503 [Signal / Exception: 11] (Doc ID 2270947.1), whose solution is to deconfigure the cluster, change some linking configuration, and re-configure the cluster. I did not like the idea of re-creating the cluster, so I simply followed the steps to relink the binaries and, in between, did the modifications mentioned in the MOS note.

root # export ORACLE_HOME=/u01/app/grid/19
root # $ORACLE_HOME/crs/install/rootcrs.sh -unlock
root # vi  /etc/ld.so.conf

  ### Add this line at the very top
  /lib64/noelision

root # ln -s /lib64/noelision/libpthread-2.19.so $ORACLE_HOME/lib/libpthread.so.0

root # su - oracle
oracle $> export ORACLE_HOME=/u01/app/grid/19
oracle $> $ORACLE_HOME/bin/relink
oracle $> exit
root # $ORACLE_HOME/crs/install/rootcrs.sh -lock

Having done this, the error messages were gone.
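To verify that the relinked binaries really pick up the noelision library, a quick ldd check should do (a sketch, assuming the same Grid home as above):

oracle $> ldd /u01/app/grid/19/bin/oracle | grep libpthread

libpthread should now resolve to the noelision-backed copy instead of the default /lib64/libpthread.so.0.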