Oracle Backfill Performance

Blog, Oracle | May 5th, 2011

Most of us have needed to backfill some data at one point or another. Perhaps we needed to create data to test a new application we're building, or we wanted to try out a new Oracle feature. Backfilling can be as easy as writing a few INSERT and UPDATE commands or coding up a simple PL/SQL script. It probably doesn't matter how efficient your SQL is when you're dealing with tens of rows, or even tens of thousands of rows, but what happens when you start dealing with millions of rows? At Datavail we follow three main principles when backfilling:

  1. Design to Handle Failure
  2. Operate on Manageable Chunks
  3. Use Bulk Operations

Design to Handle Failure

When dealing with large volumes of data, backfills can consume a fair amount of resources and often take an extended period of time to execute. The last thing we want is for the backfill to be 99% complete and then hit an error that causes the whole operation to be rolled back. Instead, we should design the backfill on the assumption that it may fail before it completes. That way, we can simply start it up again and it will resume where it left off.

Operate on Manageable Chunks

Operating on manageable chunks of data helps in our quest for failure-tolerant backfills, but it's not only about resuming after a failure. We're not going to wait until the very end and issue a single commit, but we also don't want to commit every row we backfill; that would slow us down and potentially impact other users of the database. So what is a "manageable chunk"? We usually start at around one thousand rows and go up from there as needed or as testing allows. That's three orders of magnitude beyond backfilling one row at a time, which buys us some nice performance gains.

Use Bulk Operations

Using bulk operations is another principle that works hand-in-hand with the other two. PL/SQL's FORALL statement lets you build up work one row at a time in a collection and then save it to the database in bulk, and BULK COLLECT lets you SELECT or FETCH rows in bulk as well. This reduces the number of statement executions and round-trips between our backfill code and the database.

Example INSERTS

Let's work through an example where we want to INSERT a million rows into a table. We'll see how long each backfill takes and examine the 10046 trace output to determine where the time is being spent. Let's use the following table definition:
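A simple table is enough to illustrate the point; the backfill_test name and its columns below are assumptions made for this example, not a required layout.

    -- Hypothetical table used throughout the examples; name and
    -- columns are illustrative only.
    CREATE TABLE backfill_test (
      id          NUMBER         NOT NULL,
      payload     VARCHAR2(100),
      created_dt  DATE           DEFAULT SYSDATE,
      CONSTRAINT backfill_test_pk PRIMARY KEY (id)
    );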

Our first attempt might look something like the following, where we’re doing a simple loop, inserting and committing every row.
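A minimal sketch of that first attempt, against the assumed backfill_test table:

    -- Naive backfill: one INSERT and one COMMIT per row.
    BEGIN
      FOR i IN 1 .. 1000000 LOOP
        INSERT INTO backfill_test (id, payload)
        VALUES (i, 'row ' || i);
        COMMIT;                      -- commit after every single row
      END LOOP;
    END;
    /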

Incorporating our principles of Designing to Handle Failure and Operating on Manageable Chunks, we might modify our PL/SQL script to the following:
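A sketch of that version, again using the assumed backfill_test table: it looks up the highest id already inserted so a restart resumes where the last run stopped, and it commits every 1,000 rows instead of every row.

    -- Restartable, chunked backfill.
    DECLARE
      l_start  PLS_INTEGER;
    BEGIN
      -- Resume after the last committed row from any previous run.
      SELECT NVL(MAX(id), 0) + 1 INTO l_start FROM backfill_test;

      FOR i IN l_start .. 1000000 LOOP
        INSERT INTO backfill_test (id, payload)
        VALUES (i, 'row ' || i);

        IF MOD(i, 1000) = 0 THEN
          COMMIT;                    -- commit a manageable chunk
        END IF;
      END LOOP;
      COMMIT;                        -- pick up the final partial chunk
    END;
    /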

If we further enhance this by Using Bulk Operations, we get the following:
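A sketch of the bulk version: each chunk of rows is built up in collections and inserted with a single FORALL execution, so the INSERT statement runs once per 1,000 rows rather than once per row. The names and chunk size are the same assumptions as above.

    -- Chunked, restartable backfill using FORALL bulk inserts.
    DECLARE
      TYPE t_id_tab  IS TABLE OF backfill_test.id%TYPE      INDEX BY PLS_INTEGER;
      TYPE t_pay_tab IS TABLE OF backfill_test.payload%TYPE INDEX BY PLS_INTEGER;
      l_ids      t_id_tab;
      l_payloads t_pay_tab;
      l_done     PLS_INTEGER;
    BEGIN
      -- Resume after the last committed row from any previous run.
      SELECT NVL(MAX(id), 0) INTO l_done FROM backfill_test;

      WHILE l_done < 1000000 LOOP
        l_ids.DELETE;
        l_payloads.DELETE;

        -- Build up to 1,000 rows in memory.
        FOR i IN 1 .. LEAST(1000, 1000000 - l_done) LOOP
          l_ids(i)      := l_done + i;
          l_payloads(i) := 'row ' || (l_done + i);
        END LOOP;

        -- Insert the whole chunk with one statement execution.
        FORALL i IN 1 .. l_ids.COUNT
          INSERT INTO backfill_test (id, payload)
          VALUES (l_ids(i), l_payloads(i));

        COMMIT;                      -- one commit per chunk
        l_done := l_done + l_ids.COUNT;
      END LOOP;
    END;
    /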

Looking at the tkprof output from the 10046 traces on each PL/SQL block above, we see the following results for our one-million-row inserts. Note that tracing slowed the backfill in many cases, so I used the elapsed times from multiple untraced executions and determined the percentages from the trace files.

Example UPDATES

Let's take the same one million rows we just inserted and run an UPDATE backfill on them. Again, we'll see how long each backfill takes and examine the 10046 trace output to see where the time is being spent. Our starting backfill updates all rows in one transaction.
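As a sketch, using the assumed backfill_test table and an arbitrary UPPER(payload) change as the backfill:

    -- Baseline: update every row in a single transaction.
    UPDATE backfill_test
       SET payload = UPPER(payload);

    COMMIT;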

We can refine this a bit by issuing a commit every 1000 rows. We can see that this takes a little longer, but we’ve Designed to Handle Failure with this new PL/SQL and we’ve started to Operate on Manageable Chunks.
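A sketch of that version follows; its restart safety assumes the update is idempotent, so re-applying it to rows that were already changed before a failure does no harm.

    -- Row-at-a-time update, committing every 1,000 rows.
    DECLARE
      l_count  PLS_INTEGER := 0;
    BEGIN
      FOR r IN (SELECT id FROM backfill_test) LOOP
        UPDATE backfill_test
           SET payload = UPPER(payload)
         WHERE id = r.id;

        l_count := l_count + 1;
        IF MOD(l_count, 1000) = 0 THEN
          COMMIT;                    -- commit a manageable chunk
        END IF;
      END LOOP;
      COMMIT;                        -- final partial chunk
    END;
    /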

In this next backfill, we've modified the script to Use Bulk Operations for both our SELECT/FETCH and our UPDATE statements.
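A sketch of the bulk version, fetching 1,000 ids at a time with BULK COLLECT ... LIMIT and applying the change with FORALL:

    -- Bulk fetch and bulk update, committing every 1,000 rows.
    DECLARE
      CURSOR c_rows IS SELECT id FROM backfill_test;
      TYPE t_id_tab IS TABLE OF backfill_test.id%TYPE INDEX BY PLS_INTEGER;
      l_ids  t_id_tab;
    BEGIN
      OPEN c_rows;
      LOOP
        FETCH c_rows BULK COLLECT INTO l_ids LIMIT 1000;
        EXIT WHEN l_ids.COUNT = 0;

        FORALL i IN 1 .. l_ids.COUNT
          UPDATE backfill_test
             SET payload = UPPER(payload)
           WHERE id = l_ids(i);

        COMMIT;                      -- one commit per chunk
      END LOOP;
      CLOSE c_rows;
    END;
    /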

Well, that wasn't much faster than the previous example that looped through and updated one row at a time. However, if we increase the chunk size from 1,000 to 10,000 rows, we see an improvement: the elapsed time with LIMIT 10000 was 00:01:36.99, which is fairly close to our original plain SQL UPDATE statement.

For comparison, I also wanted to trace a Create Table As Select (CTAS). Note that it is not always feasible to recreate the table instead of updating rows: it requires downtime during which no applications are modifying data, which may not be possible on a production system. Not only is CTAS intrusive, but I'm also not accounting for the time required to recreate indexes and constraints on the new table. Still, we see that it can speed up the process quite a bit.
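A sketch of the CTAS approach, again with assumed names; the old table would be dropped and the new one renamed into place only after everything has been verified:

    -- Build the "updated" data into a new table in one pass.
    CREATE TABLE backfill_test_new
    AS
    SELECT id,
           UPPER(payload) AS payload,
           created_dt
      FROM backfill_test;

    -- After validating the new table (indexes, constraints, and grants
    -- still need to be recreated):
    -- DROP TABLE backfill_test;
    -- ALTER TABLE backfill_test_new RENAME TO backfill_test;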

Here are the results from the 10046 traces on the UPDATE backfills. Again, tracing slowed the backfill in many cases, so I used the elapsed times from multiple untraced executions and determined the percentages from the trace files.

 
