«

Feb
12

Fastest way of deleting all but 1 row in a table?

I came across a situation at work today that required me to delete most of the rows in a table. The table wasn’t very large, about 200 000 rows with an average row size of 90-100. Plus indexes.

Let’s say it looked something like this:

CREATE TABLE order_lines(
  order_no      NUMBER(11) NOT NULL
 ,order_line_no NUMBER(5)  NOT NULL
 ,create_date   DATE       NOT NULL
 ,dispatch_date DATE
 ,arrive_date   DATE
 ,more_columns1 varchar2(20) DEFAULT lpad('x', 20, 'x')
 ,more_columns2 varchar2(20) DEFAULT lpad('x', 20, 'x')
 ,more_columns3 varchar2(20) DEFAULT lpad('x', 20, 'x')
 ,more_columns4 varchar2(20) DEFAULT lpad('x', 20, 'x')
 ,PRIMARY KEY(order_no, order_line_no)
 );

So I had just screwed up this test run in development and my script wrote bogus values all over the table. I needed to start over, but first I had some clean up to do. The first 1,000 orders were still intact, so I decided to leave them be, and delete everything I created after that.

DELETE
FROM order_lines
WHERE order_no > 1000;
 
199000 ROWS deleted.
Elapsed: 00:01:04.06

It took surprisingly long time (1 min 4 sec). While I was waiting for the delete to complete, I thought to myself: Too bad I couldn’t just truncate the table. Truncate table is instant…but I have to keep the old rows so…

The delete statement generates a lot of redo information (needed for recoverability), and is potentially the slowest of all DML statements. We pay a price for the ability to get a backup and re-apply all changes made since that point in time to bring the database up to date.

Then I realized there is a way to get around this. This is a batched environment. I’m the only one writing this table. I don’t need no stinking backup!
Instead of thinking of this as deleting the orders X, we can look at it as keeping the orders NOT(x). It’s as close to relational assignment as you get in Oracle :)

If I save a copy of the 1000 rows in another table, I could truncate the order_lines table and re-insert them.

Let’s try!

ROLLBACK;
 
SQL> CREATE TABLE old_rows nologging AS
  2     SELECT *
  3       FROM order_lines
  4      WHERE order_no <= 1000;
 
TABLE created.
Elapsed: 00:00:00.03
 
SQL> TRUNCATE TABLE order_lines;
 
TABLE truncated.
Elapsed: 00:00:00.11
 
SQL> INSERT INTO order_lines
  2  SELECT *
  3    FROM old_rows;
 
1000 ROWS created.
Elapsed: 00:00:00.06
 
SQL> commit;
 
Commit complete.
Elapsed: 00:00:00.00
 
SQL> DROP TABLE old_rows;
 
TABLE dropped.
Elapsed: 00:00:00.00

Yay, sub-second!
More lines of code, but we are actually doing things much more efficiently. There is of course a breaking point (% deleted rows) when it pays of to perform the delete instead.

The first statement creates a table with all order_lines with an order_nr less than 1000. This operation will result in a index range scan on the primary key.
Next, the truncate table statement is basically instant.
And finally, we re-insert the 1000 rows in the order_lines table.

Note: Obviously, this is not even an option in an multi user environment, but I think the example illustrates quite clearly, that sometimes we pay a high price for something that we don’t need.

Leave a Reply

Your email address will not be published.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre lang="" line="" escaped="">