Parallelizing Standard Edition Data Pump

Todays blog post is about Data Pump. Sometimes we use this tool to migrate applications to another database. There are some reasons for doing so, reorganizing data, difficult upgade/migration paths to mention two. With the Enterpise Edition of the Oracle Database we can do the export and import in parallel. But what if we are using Standard Edition only? We are bound to export and import serially. This is a major restriction, especially if time is limited. But there is a way to make it run in parallel. The basic idea is to split the export/import process into several tasks which then can run in parallel. There is no restriction in having several Data Pump processes at a time. It makes the process more difficult, but it can speed up the whole process a lot. Let’s see how it is done. In my example I use direct import using a network link, this skips the step of persisting the transported data in the filesystem and makes the process even faster.
Basically there are three steps that cost a lot of time:

  1. Transport table data
  2. Create indexes
  3. Create/validate constraints

To parallelize the transport of table data, I first analyze the size of the tables. Typically there are a handful large tables and many many small tables. I create sets of tables that should be transported together, these sets are made of the large tables. Then there is another import, that imports all the other tables. For the latter one I need to exclude indexes and constraints since those will be created in parallel afterwards. For best reproducability I use parameter files for all import runs. Beside that, it makes handling of quotes much easier than directly at the prompt. And I use “job_name” to better identify and monitor the data pump jobs.
First, the parameters for the sets of large tables, in my example there is just one table per set:

userid=
network_link=
logfile=DATA_PUMP_DIR:import_table1.log
include=TABLE:"=<TABLE>"
job_name=table_1
userid=
network_link=
logfile=DATA_PUMP_DIR:import_table2.log
include=TABLE:"=<TABLE>"
job_name=table_2

Repeat this for all the other table sets. Now the parameter file for the rest of the data and all the other information, I use a schema based import and exclude the large tables from above.

userid=
network_link=
logfile=DATA_PUMP_DIR:import_therest.log
schemas=
exclude=TABLE:"in ('TABLE_1', 'TABLE_2')"
exclude=INDEX
exclude=CONSTRAINT
job_name=the_rest

Since the last job creates all required users, I start this job first and wait until the user creation is finished. After that, the jobs for the large tables can be started.
When all of those jobs are finished, it is time to transport the indexes. To create all indexes, I use database jobs with no repeat interval. This makes the jobs go away once they finish successfully. The degree of parallelizm can be easily adapted by setting the “job_queue_processes” parameter. The job action can be created using DBMS_METADATA. The following script will generate a SQL script that creates one database job per index creation.

set serveroutput on
begin
  dbms_output.put_line('var x number');
  dbms_metadata.set_transform_param(
    TRANSFORM_HANDLE=>DBMS_METADATA.SESSION_TRANSFORM,
    name=>'STORAGE', 
    value=>FALSE
  );
  for obj in (
    select 'execute immediate ''''' || dbms_metadata.get_ddl('INDEX', index_name, owner) || ''''';' ddl
    from dba_indexes where owner=''
  ) loop
    dbms_output.put_line('begin dbms_job.submit(:x, what=>''' || obj.ddl || ''', next_date=>sysdate, interval=>null); end;');
  dbms_output.put_line('/');
  dbms_output.put_line('');
  end loop;
  dbms_output.put_line('commit;');
end;
/

After that, the constraints can be created in a similar way.

set serveroutput on
begin
  dbms_output.put_line('var x number');
  DBMS_METADATA.set_transform_param (DBMS_METADATA.session_transform, 'SQLTERMINATOR', true);
  for obj in (
    select dbms_metadata.get_ddl('CONSTRAINT', constraint_name, owner) ddl
    from dba_constraints where owner=''
    and constraint_type IN ('U', 'P')
  ) loop
    dbms_output.put_line('exec dbms_job.submit(:x, what=>''' || obj.ddl || ''', next_date=>sysdate, interval=>null);');
    dbms_output.put_line('/');
    dbms_output.put_line('');
  end loop;
  dbms_output.put_line('commit;');
end;
/

This craetes all the unique and primary constraints. If you wish to include more, simply adapt the script. Be aware of the fact, that I did not include all constraint types. I need to do a last metadata import at last to make sure all objects are copied to the new environment.

The approach can also be used to speed up Data Pump for Enterprise Edition since this only transports tables in parallel and does the index creation etc. serially too, that depends on the version you use.

Hope this helps.

Advertisements

Oracle SE2 and Instance Caging

At DOAG Exaday 2018 Martin Bach talked about using the Resource Manager. He explained, that Resource Manager is using the CPU_COUNT initialization parameter to limit the CPU utilization of a database instance. This is called instance caging and requires an Enterprise Edition License. Resource Manager is not available in Standard Edtion 2 (SE2) according to the Licensing Guide:

se2_resmgr_cpucount_licguide

On the other hand, SE2 is limited to 16 threads per database. Franck Pachot did investigate this in his blog post and found out, that internally Resource Manager features are being used to enforce this limitation.

So the question came up, what will happen to a SE2 Oracle Database that has CPU_COUNT set. According to the documentation, CPU_COUNT works only with Resource Manager which is not part of SE2, but SE2 is using Resource Manager internally.

Now, let’s try. I did use the same method to generate load that Franck Pachot used for his tests. A simple PL/SQL block running in several database jobs.  For testing this, I set CPU_COUNT to 4 and did run 10 parallel jobs. Having this, my workload is definitely below the internal maximum of 16 threads. To measure the workload I used top, oratop and Statspack, database version was 12.2.0.1.171017.

And these are the results. My server (ODA X7-2S, 1 CPU, 10 cores hyperthreaded) had an utilization of roughly 20% plus a little overhead. Having 20 cores in the OS and a CPU_COUNT of 4, I end up with a maximum of 1/5th of the server that I can utilize. In other words: 20%. This is what “top” showed:

se2_resmgr_cpucount_jobs_top

To see, what the database instance is actually doing, I used “oratop”:

se2_resmgr_cpucount_jobs_oratop

You see, there are 4 session on CPU and some others waiting for Resource Manager. That proofes, that SE2 internal Resource Manager is using the CPU_COUNT parameter to limit the utilization.  Finally, let’s check the Statspack report:

sp_2_3_jobs

In this overview you can see the 10 sessions that are trying to do something (DB time) and the 4 sessions that are actually doing some work on CPU (DB CPU).

Conclusion / Disclaimer

Instance caging does work in a SE2 Oracle Database. You can limit the database to use even less than 16 threads. But the fact that this works does not necessarily mean that it is allowed. So use in case you use this featuer, you do that on your own risk.

Edit: Dominic Giles stated in the Twitter discussion that it is allowed to do instance caging in SE2.

se2_resmgr_cpucount_giles

Transportable Tablespaces, Deferred Segment Creation and SE

Right now I am in the middle of a project which is about moving an old 11.2.0.2 Oracle Enterprise Edition Database of roughly 1.5TB from the US to new hardware in Germany including an upgrade to 12.2.0.1 and a switch the Standard Edition 2. As you see, there are a couple of things to do and I am more than happy to do this challenging project. The basic plan is this:

  1. Create a new 12.2.0.1 database in Standard Edition 2
  2. Get an initial copy of the source database to Germany
  3. Restore that copy on the new hardware with an 11.2.0.2 Oracle Home
  4. Get the archivelogs on a regular basis from the source database in the US
  5. Recover the german copy of the database
  6. Repeat 4 and 5 until cut-over
  7. Open Resetlogs the german copy of the database
  8. Move the data to the new 12.2.0.1 database using transportable tablespace

According to the Licensing Guide it is allowed to plug in transportable tablespaces into a SE2 database. So I am completely happy with my approach.

But during the test phase of this migration I encountered a nasty error when plugging in the tablespace. I obfuscated some identifiers, so don’t be too strict:

Import: Release 12.2.0.1.0 - Production on Thu May 31 15:32:01 2018

Copyright (c) 1982, 2017, Oracle and/or its affiliates.  All rights reserved.
;;; 
Connected to: Oracle Database 12c Standard Edition Release 12.2.0.1.0 - 64bit Production
Master table "SYSTEM"."SYS_IMPORT_TRANSPORTABLE_01" successfully loaded/unloaded
Starting "SYSTEM"."SYS_IMPORT_TRANSPORTABLE_01":  system/******** directory=copy_dump logfile=some_ts.impdp.log dumpfile=some_ts.dmpdp transport_datafiles=/u02/app/oracle/oradata/SOMEDB/SOMEDB/datafile/e*dbf 
Processing object type TRANSPORTABLE_EXPORT/PLUGTS_BLK
Processing object type TRANSPORTABLE_EXPORT/TABLE
ORA-39083: Object type TABLE:"SOME_USER"."SOME_TABLE" failed to create with error:
ORA-01647: tablespace 'SOME_TS' is read-only, cannot allocate space in it

Failing sql is:
CREATE TABLE "SOME_USER"."SOME_TABLE" (<Column list stripped>) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING TABLESPACE "SOME_TS" 

Ok, the tablesapce is read only. That is supposed to be. A quick research at MOS revealed a note “ORA-39083 & ORA-01647 during TTS or EXPDP/IMPDP from Enterprise Edition to Standard Edition when deferred_segment_creation=True (Doc ID 2288535.1)” which states, that the deferred segment creation is the root cause. That makes sense somehow. The note further states, the workaround is either to use the version parameter of “impdp” during import TTS or to disable deferred segment creation at the source database before starting the TTS.
To cut a long story short, both workarounds do not work and end up with the exact same error. In my opinion this makes totally sense, because the segment for the table in question is simply not present in the transported tablespace. And it can’t be created during the TTS import since the tablespace is marked read only. It cannot be created when I use the version parameter for impdp nor it is being created when I disable deferred segment creation.
The only feasible solution is to create the segment at the source database before setting the tablespace to read only. A simple

SQL> alter table SOME_USER.SOME_TABLE allocate extent;

Table altered.

does the trick. And that’s it. After creating a segment for this table the TTS import went fine without any further errors.
To make sure all objects do have a segment created, I use these queries:

SQL>  select 'alter table ' || owner || '.' || table_name || ' allocate extent;'
  2   from dba_tables where SEGMENT_CREATED='NO'
  3*  and owner='&USERNAME.';

So when using Transportable Tablespaces to move to a SE2 database and the source is 11.2.0.2 (the version when deferred segment creating was introduced) or higher, you better check that in advance. Hope that helps.