Tuesday, February 18, 2014

How to use index to get NULL values

I have learned new trick today which I want to share and keep as reminder to myself. I am sure that most Oracle DBA’s or Developers are familiar with this - so please skip this post if you are one of them :)


My test table will have 1000000 rows with 1000 NULL values for “OBJECT_NAME” column.

Tests are performed on 11gR1 version.

Table/column info:
select c.table_name, c.column_name, c.data_type, c.num_nulls, 
       t.num_rows table_num_rows
from dba_tab_columns c, dba_tables t
where 1=1
and c.table_name = t.table_name
and c.table_name = 'OBJECTS'
and c.column_name = 'OBJECT_NAME';



TABLE_NAME    COLUMN_NAME     DATA_TYPE     NUM_NULLS TABLE_NUM_ROWS
------------- --------------- ------------ ---------- --------------
OBJECTS       OBJECT_NAME     VARCHAR2           1000        1000000 


Goal:
Count how many OBJECT_NAMEs are NULL.


Index object_name column using classic B-tree index.

create index idx_objname on objects(object_name);


select count(*)
from objects
where object_name is null;

  COUNT(*)
----------
      1000

------------------------------------------------------------------------------
| Id  | Operation          | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |         |       |       |  4252 (100)|          |
|   1 |  SORT AGGREGATE    |         |     1 |    25 |            |          |
|*  2 |   TABLE ACCESS FULL| OBJECTS |  1000 | 25000 |  4252   (1)| 00:00:52 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("OBJECT_NAME" IS NULL)
   
   
drop index idx_objname;

FULL SCAN is used because NULL values are not indexed for single column indexes.


For a workaround you could create function based index to help yourself. This was (till now) my preferred option.

create index idx_objname_f on objects(nvl(object_name,'XX'));

select count(*)
from objects
where nvl(object_name,'XX')='XX';

  COUNT(*)
----------
      1000

-----------------------------------------------------------------------------------
| Id  | Operation         | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |               |       |       |     3 (100)|          |
|   1 |  SORT AGGREGATE   |               |     1 |    17 |            |          |
|*  2 |   INDEX RANGE SCAN| IDX_OBJNAME_F | 10000 |   166K|     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECTS"."SYS_NC00017$"='XX')

Now INDEX is used but as you can see I had to modify initial query using function to get number of NULLs.

Besides that, this index is unusable if I want to get other values then NULL.

select count(*)
from objects
where object_name='TEST';

  COUNT(*)
----------
        16


------------------------------------------------------------------------------
| Id  | Operation          | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |         |       |       |  4253 (100)|          |
|   1 |  SORT AGGREGATE    |         |     1 |    25 |            |          |
|*  2 |   TABLE ACCESS FULL| OBJECTS |    28 |   700 |  4253   (1)| 00:00:52 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("OBJECT_NAME"='TEST')
   
      
drop index idx_objname_f;


The trick that I wanted to show.
In composite index NULL values are stored for leading column when second column is not NULL. I will use dummy value for second column.

create index idx_objname_d on objects(object_name,'X');

select count(*)
from objects
where object_name is null;

  COUNT(*)
----------
      1000

-----------------------------------------------------------------------------------
| Id  | Operation         | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |               |       |       |     8 (100)|          |
|   1 |  SORT AGGREGATE   |               |     1 |    25 |            |          |
|*  2 |   INDEX RANGE SCAN| IDX_OBJNAME_D |  1000 | 25000 |     8   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_NAME" IS NULL)


Index is used to count NULL values.


You can also use this index to get other values.


select count(*)
from objects
where object_name='TEST';

  COUNT(*)
----------
        16

-----------------------------------------------------------------------------------
| Id  | Operation         | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |               |       |       |     3 (100)|          |
|   1 |  SORT AGGREGATE   |               |     1 |    25 |            |          |
|*  2 |   INDEX RANGE SCAN| IDX_OBJNAME_D |    28 |   700 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_NAME"='TEST')


drop index idx_objname_d;

This trick was so nice discovery for me that deserved blog post.


It is worth to mention that bitmap index also stores NULL values.

create bitmap index idx_objname_d on objects(object_name);

select count(*)
from objects
where object_name is null;

  COUNT(*)
----------
      1000
   
---------------------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |       |       |     3 (100)|          |
|   1 |  SORT AGGREGATE             |               |     1 |    25 |            |          |
|   2 |   BITMAP CONVERSION COUNT   |               |  1000 | 25000 |     3   (0)| 00:00:01 |
|*  3 |    BITMAP INDEX SINGLE VALUE| IDX_OBJNAME_D |       |       |            |          |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("OBJECT_NAME" IS NULL)



Clean up!

drop table objects;

1 comment: