»

Jan
15

Constraints and query performance

Today I want to show you an example of when constraints affect query performance. I don’t want to ruin the fun right at the beginning, so I will introduce you to the scenario first, and then describe what is happening. Have a look at the following code.

CREATE TABLE t1 AS
   SELECT rownum AS id
         ,lpad('x', 500, 'x') AS lots_of_other_columns
     FROM dual CONNECT BY level <= 20000;
 
CREATE TABLE t2 AS
   SELECT rownum AS id
         ,lpad('x', 500, 'x') AS lots_of_other_columns
     FROM dual CONNECT BY level <= 20000;
 
CREATE UNIQUE INDEX t1_uk ON t1(id);
CREATE UNIQUE INDEX t2_uk ON t2(id);
 
DELETE FROM t2 WHERE id = 666;
 
commit;
 
EXEC dbms_stats.gather_table_stats(ownname=>user, tabname=>'t1');
EXEC dbms_stats.gather_table_stats(ownname=>user, tabname=>'t2');

I created two identical tables containing 20,000 rows each.The ID column will contain the numbers 1 through 20000 (with nr 666 missing from T2). I also added one big varchar column to bump the row length to something more realistic. This is our starting point.

Here is the query we will be looking at. The underlying requirement was to returns all T1 rows where the ID can’t be found in T2. That’s a basic example of an anti-join.

SELECT id
  FROM t1
 WHERE id NOT IN(SELECT id FROM t2);
SQL> select id from t1 where id not in(select id from t2);

        ID
----------
       666

Elapsed: 00:00:30.00

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
   14388482  consistent gets
          0  physical reads
          0  redo size
        406  bytes sent via SQL*Net to client
        380  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

It took 30 seconds to find 1 row in a table of 20,000! Also look at the consistent gets! How can this be possible? A clue to the problem can be found in the execution plan. Have a look at operation 1 & 3 and the corresponding section in Predicate Information.

----------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 19999 | 99995 |  3921K  (1)|
|*  1 |  FILTER            |      |       |       |            |
|   2 |   TABLE ACCESS FULL| T1   | 20000 |    97K|   392   (1)|
|*  3 |   TABLE ACCESS FULL| T2   |     1 |     5 |   392   (1)|
----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( NOT EXISTS (SELECT /*+ */ 0 FROM "T2" "T2" WHERE
              LNNVL("ID"<>:B1)))
   3 - filter(LNNVL("ID"<>:B1))

Basically, Oracle is performing a full table scan of table T1, and for each row in T1, another full table scan of T2 is performed, resulting in a whopping 20,000 FTS of T2. This is why the query is so slow. The question is why is this happening? Part of the answer can be found in the definition of the tables. It turns out that the ID columns are nullable. The second part of the answer lies in that the optimizer cannot select an anti-join for NOT IN() when either side is nullable.

We know for a fact that none of the rows contain nulls, and the statistics also tells the same story (num_nulls = 0), so Oracle “knows” it too. But how do we fix it?

We add the not null constraints to the table.

ALTER TABLE t1 MODIFY id NOT NULL;
ALTER TABLE t2 MODIFY id NOT NULL;

Rerunning the same query now gives the following plan/statistics:

Elapsed: 00:00:00.15
--------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)|
--------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |    10 |    22  (41)|
|   1 |  NESTED LOOPS ANTI    |       |     1 |    10 |    22  (41)|
|   2 |   INDEX FAST FULL SCAN| T1_UK | 20000 |    97K|    14   (8)|
|*  3 |   INDEX UNIQUE SCAN   | T2_UK | 19998 | 99990 |     0   (0)|
--------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("ID"="ID")

Statistics
----------------------------------------------------------
        352  recursive calls
          0  db block gets
      20112  consistent gets
          0  physical reads
          0  redo size
        406  bytes sent via SQL*Net to client
        380  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
         10  sorts (memory)
          0  sorts (disk)
          1  rows processed

The new plan shows the anti-join as expected, and the query completes in less than 1 second. And the only thing we really did was give Oracle some additional information about the data. This demonstrates the necessity of giving the optimizer as much information about our data as possible, because it just may open up some additional plans.

In case we are not able to (or allowed to) modify the table structure, there is another way of making this information available to the optimizer via the query itself. The query can be rewritten like this:

SELECT id
  FROM t1
 WHERE id IS NOT NULL
   AND id NOT IN(SELECT id 
                     FROM t2
                    WHERE id IS NOT NULL);

Notice the extra predicates specifying ID is not null. While this query is not logically equivalent to the first one (due to the way nulls work), they return the same result, since there are no nulls in any of the tables. By explicitly specifying that in the query, we cleared the way for the optimizer to pick the appropriate join mechanism.

Finally, the query could of course be written as NOT EXISTS, which doesn’t have any of the restrictions described above.

Leave a Reply

Your email address will not be published.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre lang="" line="" escaped="">