Friday, September 14, 2012

Wrong result after using join predicate push into a view with a GROUP BY

Yesterday colleague reported that he’s receiving wrong/odd result from query so we decided to examine closely what's happening. Query had more then 150 lines with inline view, group by clause, several outer joins, function, decodes and cases included. So the first step was to exclude all the stuff that wasn’t important for clearer diagnosis.

Environment: Oracle EE 11.1.0.7 on OEL 5.7


I will post shortened version of the query with fake table names and columns.


SELECT
   p.name p_name,
   r.status r_status
   FROM table_1 r,
   (          
       SELECT  ri.r_id,            
       MAX ( DECODE (ri.c_name,  'SPACE', ri.i_val, NULL)) SPACE,
       MAX ( DECODE (ri.c_name,  'UNIVERSE', ri.i_val, NULL)) UNIVERSE,
       MAX ( DECODE (ri.c_name,  'ALIEN', ri.i_val, NULL)) ALIEN,
       MAX ( DECODE (ri.c_name,  'MUSIC', ri.i_val, NULL)) MUSIC,
       MAX ( DECODE (ri.c_name,  'GUITARS', ri.i_val, NULL)) GUITARS
       FROM table_7 ri                 
                  WHERE ri.r_id IN (SELECT r2.r_id FROM table_1 r2 WHERE  r2.r_id > 1233251 and r2.p_id in (111, 222))
   GROUP BY ri.r_id        
   ) ri,
   table_2 a,
   table_3 cu,
   table_4 c,
   table_5 p,
   table_6 rc,
   table_6 rc2,
   table_6 rc3,
   table_6 rc4,
   table_6 rc5
WHERE     1=1
   AND r.r_id = a.r_id
   AND r.r_id = ri.r_id  
   AND cu.table_3_ID = r.table_3_ID
   AND c.table_4_ID = r.table_4_id
   AND r.p_id = p.p_id
   AND r.p_id in (111, 222)
   AND c.status in ('DA', 'NE', 'A')        
   AND rc.lang(+) = 'BLAH'
   AND rc.domain(+) = 'SPACE'
   AND rc.code(+) = ri.SPACE
   AND rc2.lang(+) = 'BLAH'
   AND rc2.domain(+) = 'MARS'
   AND rc2.code(+) = ri.UNIVERSE
   AND rc3.lang(+) = 'BLAH'
   AND rc3.domain(+) = 'SPEED'
   AND rc3.code(+) = ri.GUITARS
   AND rc4.lang(+) = 'BLAH'
   AND rc4.domain(+) = 'ALIEN'
   AND rc4.code(+) = ri.ALIEN
   AND rc5.lang(+) = 'BLAH'
   AND rc5.domain(+) = 'MUSIC'
   AND rc5.code(+) = ri.MUSIC


Imagine this case.
You execute this query and you receive 114 rows, but when you exclude “p_name” and just ask for one column “r_status” you receive 15 rows.

After few hours of analysis we found possible cause for this behavior - “Bug 8447623 Wrong Results with OUTER joins and tables with Check Constraints”. Workaround for this bug was setting undocumented parameter "_optimizer_join_elimination_enabled" to false. Quick look on Oracle support site showed that “_optimizer_join_elimination_enabled” introduced many bugs so we were pretty sure that this is causing problems.

Altering “_optimizer_join_elimination_enabled” from true to false (on session level) changed result from 15 to 114 rows when asking "r_status" column. We were receiving stable result (114 rows) in any case. But we weren't completely satisfied with receiving this result so we continued with deeper data analysis.

It showed that actually 15 rows was the correct result and with altering “_optimizer_join_elimination_enabled” to false we were continuously receiving wrong result. So we left default value for that parameter and continued with investigation.

After some while of searching and testing I've found note - “Bug 9121708 Wrong result using view pushed predicate and OUTER join”.

This was the root cause. Oracle 11.1.0.7 version was affected and we had query using join predicate push into a view with a GROUP BY. Changing "_push_join_predicate" to false was the correct workaround. We were receiving correct 15 rows in any case.

To avoid similar cases happening in the future we should upgrade database to 11.2.0.2 or disable “_push_join_predicate” on instance level. But for now we decided to do none of that.

We changed our query a little to avoid pushing predicates into a view with group by and everything worked as it should.

Push join predicate is very powerful feature which enables pushing join predicate inside a view. It is not very wise to disable this feature on instance level without further testing because it can significantly affect performance.
As we noticed that there aren’t many bugs associated to this feature, we decided that it is somewhat safe to leave it enabled.


The thing that scares me the most is the amount of "wrong result" bugs in newer Oracle versions. It is very difficult to detect such bug. When developer or application user notices wrong/odd result Oracle is very last to blame. Who knows how many reports or calculations are there with undetected wrong results. It is little scary, isn't it :)

With newer versions Oracle introduces super new features that speed up execution of the query, but with this features new “wrong result” bugs are introduced also.


Don't get me wrong, I'm OK with that. Oracle is software like any other so bugs are expected. Our job (as DBA’s) is to catch this bugs and correct them or at least avoid them.


4 comments:

  1. >The thing that scares me the most is the amount of "wrong result" bugs in newer Oracle versions
    +1

    ReplyDelete
  2. Dominic, thanks for commenting.

    Regards,
    Marko

    ReplyDelete
  3. The thing that scares me the most is the amount of "wrong result" bugs in newer Oracle versions
    +1

    Who knows how many clients didn't get a credit from their bank due to such bugs, or worse...

    ReplyDelete
  4. What can I say, it's better to not think too much about "wrong result" bugs ;-)

    Regards,
    Marko

    ReplyDelete