Thursday, July 7, 2011

Wrong results with 'hash group by' aggregation on 10.2.0.2

If you are running 10.2.0.2 in your production environment check document [ID 4604970.8] on Oracle support site.


This week colleague noticed that query, he wanted to use for some reports, returns different results than he expected. He made some additional checks using Microsof Excel and confirmed that something is wrong. He was getting incorrect results.

My first suspicion was that probably something is wrong with query or maybe logical corruption happened.

Example of the queries (I’ve changed names of the table/columns):

-- partitioned_table (date range partitioned table)

create table temp_tab_2011
as
select col, date_to,  sum (col1) col1, 
       sum (col2) f_col1, sum (col3) col3, 
    sum (col4) col4, 
    sum (col5) col5, count(*) cnt
from partitioned_table t
where t.date_to >= to_date ('01.02.2011 00:00:00', 'dd.mm.yyyy HH24:MI:SS')
    and t.date_to < to_date ('01.07.2011 00:00:00', 'dd.mm.yyyy HH24:MI:SS')
group by col, date_to
/

create table temp_tab_2010_2011
as
select col, date_to, sum (col1) col1, 
       sum (f_col1) f_col1, sum (col3) col3, 
    sum (col4) col4, 
    sum (col5) col5, count(*) cnt
from partitioned_table t
where t.date_to >= to_date ('01.02.2010 00:00:00', 'dd.mm.yyyy HH24:MI:SS')
    and t.date_to < to_date ('01.07.2011 00:00:00', 'dd.mm.yyyy HH24:MI:SS')
group by col, date_to
/

-- COMPARASION QUERY
select j.col, j.col1, j.col2, j.col3, j.cnt, d.col, d.col1, d.col2, d.col3, d.cnt, j.col1 - d.col1
from
temp_tab_2011 j, temp_tab_2010_2011 d
where 1=1
and j.col = d.col
and j.date_to = d.date_to
and j.date_to = to_date ('01.06.2011 00:00:00', 'dd.mm.yyyy HH24:MI:SS')
and d.date_to = to_date ('01.06.2011 00:00:00', 'dd.mm.yyyy HH24:MI:SS')
and (j.col1 - d.col1) <> 0
/


We expected 0 rows returned for comparison query but we received some rows - which was serious issue.



I’ve restored that database on another machine. Checked files for logical corruption, rebuilded indexes into another tablespace, rebuilded table into another tablespace and re-created temporary tablespace - but still we were receiving wrong results.


Luckily for us I’ve noticed this blog post during my searches for answer on Google:
http://www.oracloid.com/2006/05/hash-group-by-can-give-wrong-result-in-oracle-102/

Queries were using this plan:
(click "view source" to get better output)
----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name                          | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | CREATE TABLE STATEMENT               |                               |       |       |       | 43069 (100)|          |       |       |
|   1 |  LOAD AS SELECT                      |                               |       |       |       |            |          |       |       |
|   2 |   PARTITION RANGE ITERATOR           |                               |   631K|    50M|       | 39729   (1)| 00:13:08 |    57 |    74 |
|   3 |    HASH GROUP BY                     |                               |   631K|    50M|   219M| 39729   (1)| 00:13:08 |       |       |
|   4 |     TABLE ACCESS BY LOCAL INDEX ROWID| PARTITIONED_TABLE             |  1557K|   124M|       | 20999   (1)| 00:06:56 |    57 |    74 |
|*  5 |      INDEX RANGE SCAN                | IX_DATE                       |  1557K|       |       |  3874   (1)| 00:01:17 |    57 |    74 |
----------------------------------------------------------------------------------------------------------------------------------------------

In our version 10.2.0.2 wrong results are possible from the result of a non-distinct aggregation with a group by when HASH GROUB BY is used. We were hitting that bug.


In session level we changed undocumented parameter "_gby_hash_aggregation_enabled" to FALSE and executed queries again. Result was as expected - 0 rows returned.

It is noted in Metalink note that this issue is fixed in 10.2.0.3 version so I’ve upgraded test instance to 10.2.0.4 version and that solved problems with incorrect results.


This is nasty Oracle bug and if you are running 10.2.0.1 or 10.2.0.2 check your results :)


0 Comments:

Post a Comment