Parallelizing Standard Edition Data Pump

Today’s blog post is about Data Pump. Sometimes we use this tool to migrate applications to another database; reorganizing data and difficult upgrade/migration paths are two reasons for doing so. With the Enterprise Edition of the Oracle Database we can do the export and import in parallel. But what if we are using Standard Edition? Then we are bound to export and import serially, which is a major restriction, especially if time is limited. But there is a way to make it run in parallel: the basic idea is to split the export/import process into several tasks which can then run in parallel, since there is no restriction on having several Data Pump jobs at a time. It makes the process more complex, but it can speed up the whole migration a lot. Let’s see how it is done. In my example I use a direct import over a network link, which skips the step of persisting the transported data in the filesystem and makes the process even faster.
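
For the import over a network link, the target database needs a database link that points to the source. A minimal sketch, assuming the link name SRC_DB, the user and the TNS alias are placeholders and the remote user has sufficient export privileges on the source:

-- on the target database, as the user running the import
create database link SRC_DB
  connect to <source_user> identified by "<password>"
  using '<SOURCE_TNS_ALIAS>';

-- quick sanity check that the link works
select * from dual@SRC_DB;
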
Basically there are three steps that cost a lot of time:

  1. Transport table data
  2. Create indexes
  3. Create/validate constraints

To parallelize the transport of table data, I first analyze the size of the tables. Typically there are a handful of large tables and many small ones. I create sets of tables that should be transported together; these sets consist of the large tables. Then there is one more import that brings over all the remaining tables. For that last one I need to exclude indexes and constraints, since those will be created in parallel afterwards. For best reproducibility I use parameter files for all import runs. Besides that, parameter files make the handling of quotes much easier than typing them directly at the prompt. And I use “job_name” to better identify and monitor the Data Pump jobs.
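
To get an idea of how to build the table sets, a quick look at the segment sizes on the source is usually enough. A minimal sketch (the schema name is a placeholder; partitions and LOB segments are ignored for brevity):

-- run on the source database (or via the database link)
select owner, segment_name, round(bytes/1024/1024) mb
from dba_segments
where owner = '<SCHEMA>'
  and segment_type = 'TABLE'
order by bytes desc;
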
First, the parameter files for the sets of large tables; in my example there is just one table per set:

# parameter file for the first table set, e.g. import_table1.par
userid=
network_link=
logfile=DATA_PUMP_DIR:import_table1.log
include=TABLE:"=<TABLE>"
job_name=table_1

# parameter file for the second table set, e.g. import_table2.par
userid=
network_link=
logfile=DATA_PUMP_DIR:import_table2.log
include=TABLE:"=<TABLE>"
job_name=table_2

Repeat this for all the other table sets. Now comes the parameter file for the rest of the data and all the other information; I use a schema-based import here and exclude the large tables from above.

userid=
network_link=
logfile=DATA_PUMP_DIR:import_therest.log
schemas=
exclude=TABLE:"in ('TABLE_1', 'TABLE_2')"
exclude=INDEX
exclude=CONSTRAINT
job_name=the_rest

Since the last job creates all required users, I start this job first and wait until the user creation is finished. After that, the jobs for the large tables can be started.
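
To see when it is safe to start the table-set jobs, I watch the logfile of the “the_rest” job; alternatively, a quick check on the target tells whether the schemas are already there and what the Data Pump jobs are doing. A sketch, the schema names are placeholders:

-- have the expected users been created yet?
select username, created
from dba_users
where username in ('<SCHEMA1>', '<SCHEMA2>');

-- overview of all Data Pump jobs and their current state
select owner_name, job_name, operation, job_mode, state
from dba_datapump_jobs;
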
When all of those jobs are finished, it is time to create the indexes. To create all indexes, I use database jobs with no repeat interval; this makes the jobs go away once they finish successfully. The degree of parallelism can easily be adapted by setting the “job_queue_processes” parameter. The job action can be created using DBMS_METADATA. The following script generates a SQL script that creates one database job per index.

set serveroutput on
begin
  dbms_output.put_line('var x number');
  -- skip the storage clauses, the target tablespaces may be laid out differently
  dbms_metadata.set_transform_param(
    transform_handle => dbms_metadata.session_transform,
    name             => 'STORAGE',
    value            => false
  );
  -- one database job per index; the DDL is wrapped in EXECUTE IMMEDIATE and the
  -- quotes are doubled because it ends up inside the job's "what" string
  -- (this assumes the index DDL itself contains no single quotes)
  for obj in (
    select 'execute immediate ''''' || dbms_metadata.get_ddl('INDEX', index_name, owner) || ''''';' ddl
    from dba_indexes where owner=''   -- fill in the schema owner
  ) loop
    dbms_output.put_line('begin dbms_job.submit(:x, what=>''' || obj.ddl || ''', next_date=>sysdate, interval=>null); end;');
    dbms_output.put_line('/');
    dbms_output.put_line('');
  end loop;
  dbms_output.put_line('commit;');
end;
/
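
One way to use the generator: spool its output into a script, choose the degree of parallelism via “job_queue_processes” and run the generated file. A sketch; the file names and the value of 8 are just examples:

-- capture the generated job submissions in a script
set serveroutput on
set feedback off
spool create_index_jobs.sql
@generate_index_jobs.sql        -- the PL/SQL block from above
spool off
set feedback on

-- allow up to 8 jobs to run at the same time
alter system set job_queue_processes=8;

-- submit the jobs; they start running right after the commit
@create_index_jobs.sql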

After that, the constraints can be created in a similar way.

set serveroutput on
begin
  dbms_output.put_line('var x number');
  -- no SQL terminator, the DDL is executed via EXECUTE IMMEDIATE like the indexes
  dbms_metadata.set_transform_param(
    transform_handle => dbms_metadata.session_transform,
    name             => 'SQLTERMINATOR',
    value            => false
  );
  -- again one database job per statement, quotes doubled as above
  for obj in (
    select 'execute immediate ''''' || dbms_metadata.get_ddl('CONSTRAINT', constraint_name, owner) || ''''';' ddl
    from dba_constraints where owner=''   -- fill in the schema owner
    and constraint_type IN ('U', 'P')
  ) loop
    dbms_output.put_line('begin dbms_job.submit(:x, what=>''' || obj.ddl || ''', next_date=>sysdate, interval=>null); end;');
    dbms_output.put_line('/');
    dbms_output.put_line('');
  end loop;
  dbms_output.put_line('commit;');
end;
/
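
While the index and constraint jobs are running, the job queue shows the progress: jobs that finish successfully disappear, failed ones stay with a failure count. A simple check:

-- remaining creation jobs; FAILURES > 0 needs a closer look
select job, failures, broken, what
from dba_jobs
order by job;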

This creates all the unique and primary key constraints. If you wish to include more, simply adapt the script; be aware that I did not include all constraint types. At the end I do a last metadata import to make sure all remaining objects are copied to the new environment, as sketched below.
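
The parameter file for that final run could look like the following sketch; I assume a schema-based, metadata-only import here that simply skips objects which already exist:

userid=
network_link=
logfile=DATA_PUMP_DIR:import_final_metadata.log
schemas=
content=METADATA_ONLY
table_exists_action=SKIP
job_name=final_metadata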

The approach can also be used to speed up Data Pump with Enterprise Edition: depending on the version you use, even a parallel Data Pump job only transports the table data in parallel and still creates indexes etc. serially.

Hope this helps.
