Tuesday, February 22, 2011

"SQL*Net message to/from dblink" - How to improve query performance over dblink?

Colleague reported to me that execution of his query is taking too long. After quick look it was obvious that this was ideal tunning candidate.
I will note my tuning process in this blog post.

Table and column names are changed.

DB: Oracle 10.2.0.2.0
OS: Solaris 9 64-bit

Query:
SELECT
       ROUND (
          SUM (poz_traj) / 60,2)
          mins,
       SUM (amount) iznos,
       COUNT (*) pozivi
  FROM 
       (SELECT co.con_nr
          FROM usr.ugov@remote_db co,
               usr.ugov_item@remote_db ci,
               usr.np_zah@remote_db np
         WHERE     1 = 1
               AND co.ugov_id = ci.ugov_id
               AND ci.ugov_item_id = np.ugov_item_id
               AND NP.kol_id IN
                      (355375, 360996, 424469, 466193, 466194, 480438, 492748)
               AND np.u_po IS NOT NULL
               AND status_cd = 'PRI'
               AND co.status <> 'CC') c,
             table1 x
 WHERE 1 = 1 
       AND x.sub_cd = c.con_nr
       AND record_date >=
              TO_DATE ('24.01.2011 08:40:01', 'dd.mm.yyyy hh24:mi:ss')
       AND record_date <
              TO_DATE ('24.01.2011 18:00:01', 'dd.mm.yyyy hh24:mi:ss')
       AND orgn = 0                                     
       AND item_cd = 1                      
       AND corr_status > -1
       AND parameter_sg = 'glas';



This query is using database link to access tables on remote database - "REMOTE_DB".

Explain plan showed:

NOTE!
Click "view source" to get explain plan better formatted.
----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name          | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop | Inst   |IN-OUT|
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |               |       |       |   105 (100)|          |       |       |        |      |
|   1 |  SORT AGGREGATE                        |               |     1 |   275 |            |          |       |       |        |      |
|   2 |   NESTED LOOPS                         |               |     1 |   275 |   105   (0)| 00:00:03 |       |       |        |      |
|   3 |    NESTED LOOPS                        |               |    32 |  3264 |   103   (0)| 00:00:03 |       |       |        |      |
|   4 |     NESTED LOOPS                       |               |    16 |  1216 |    55   (0)| 00:00:02 |       |       |        |      |
|   5 |      PARTITION RANGE SINGLE            |               |    16 |   704 |    23   (0)| 00:00:01 |    37 |    37 |        |      |
|*  6 |       TABLE ACCESS BY LOCAL INDEX ROWID| TABLE1        |    16 |   704 |    23   (0)| 00:00:01 |    37 |    37 |        |      |
|*  7 |        INDEX RANGE SCAN                | IX_TABLE1_RD  |    21 |       |     3   (0)| 00:00:01 |    37 |    37 |        |      |
|   8 |      REMOTE                            | UGOV          |     1 |    32 |     2   (0)| 00:00:01 |       |       | REMO_~ | R->S |
|   9 |     REMOTE                             | UGOV_ITEM     |     2 |    52 |     3   (0)| 00:00:01 |       |       | REMO_~ | R->S |
|  10 |    REMOTE                              | NP_ZAH        |     1 |   173 |     1   (0)| 00:00:01 |       |       | REMO_~ | R->S |
----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - filter(("item_cd"=1 AND "parameter_sg"='glas' AND "orgn"=0 AND "corr_status">(-1)))
   7 - access("RECORD_DATE">=TO_DATE('2011-01-24 08:40:01', 'yyyy-mm-dd hh24:mi:ss') AND "RECORD_DATE"<TO_DATE('2011-01-24
              18:00:01', 'yyyy-mm-dd hh24:mi:ss'))


To help myself a little I’ve checked how many rows there are in tables queried in inline view.

ugov - 777802 rows
ugov_item - 1581553 rows
np_zah - 155802 rows

SQL>  SELECT co.con_nr
  2            FROM usr.ugov@remote_db co,
  3                 usr.ugov_item@remote_db ci,
  4                 usr.np_zah@remote_db np
  5           WHERE     1 = 1
  6                 AND co.ugov_id = ci.ugov_id
  7                 AND ci.ugov_item_id = np.ugov_item_id
  8                 AND NP.kol_id IN
  9                        (355375, 360996, 424469, 466193, 466194, 480438, 492748)
 10                 AND np.u_po IS NOT NULL
 11                 AND status_cd = 'PRI'
 12                 AND co.status <> 'CC';

4599 rows selected.

Elapsed: 00:00:23.60

Execution of inline view was satisfying with 4599 rows received.


After this initial checks it was obvious where problem lies. Oracle probably transfers whole tables over network from remote database and then resolves the query.

To confirm my thoughts I’ve decided to trace execution of the current query and check wait events.

Results were:
SQL> SELECT
  2         ROUND (
  3            SUM (poz_traj) / 60,
  4            2)
  5            mins,
  6         SUM (amount) iznos,
  7         COUNT (*) pozivi
  8    FROM
  9         (SELECT co.con_nr
 10            FROM usr.ugov@remote_db co,
 11                 usr.ugov_item@remote_db ci,
 12                 usr.np_zah@remote_db np
 13           WHERE     1 = 1
 14                 AND co.ugov_id = ci.ugov_id
 15                 AND ci.ugov_item_id = np.ugov_item_id
 16                 AND NP.kol_id IN
 17                        (355375, 360996, 424469, 466193, 466194, 480438, 492748)
 18                 AND np.u_po IS NOT NULL
 19                 AND status_cd = 'PRI'
 20                 AND co.status <> 'CC') c,
 21               table1 x
 22   WHERE 1 = 1
 23         AND x.sub_cd = c.con_nr
 24         AND record_date >=
 25                TO_DATE ('24.01.2011 08:40:01', 'dd.mm.yyyy hh24:mi:ss')
 26         AND record_date <
 27                TO_DATE ('24.01.2011 18:00:01', 'dd.mm.yyyy hh24:mi:ss')
 28         AND orgn = 0
 29         AND item_cd = 1
 30         AND corr_status > -1
 31         AND parameter_sg = 'glas';

1 row selected.

Elapsed: 01:22:23.17

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to dblink                 3055416        0.00          1.55
  SQL*Net message from dblink               3055416        2.73       3835.51
  db file sequential read                     34051        0.51        226.57
  SQL*Net message from client                     2      323.35        323.35
  SQL*Net message to client                       1        0.00          0.00


Time of query execution was 1:22:23.

Top wait events were "SQL*Net message to dblink" and "SQL*Net message from dblink".

I’ve had network bottleneck because query produced lot of traffic between databases.


To resolve this problem I have to minimize data transfer from remote database.
Inline view should be executed on remote database to cut down number of rows that are transferred over dblink.


So I’ve created view on remote database and checked execution again.

REMOTE DATABASE

create view temp_view_test as
  SELECT co.con_nr
          FROM usr.ugov co,
               usr.ugov_item ci,
               usr.np_zah np
         WHERE     1 = 1
               AND co.ugov_id = ci.ugov_id
               AND ci.ugov_item_id = np.ugov_item_id
               AND NP.kol_id IN
                      (355375, 360996, 424469, 466193, 466194, 480438, 492748)
               AND np.u_po IS NOT NULL
               AND status_cd = 'PRI'
               AND co.status <> 'CC';


What will happen now:

SQL > SELECT
  2         ROUND (
  3            SUM (poz_traj) / 60,
  4            2)
  5            mins,
  6         SUM (amount) iznos,
  7         COUNT (*) pozivi
  8    FROM
  9         usr.temp_view_test@remote_db c,
 10               table1 x
 11   WHERE 1 = 1
 12         AND x.sub_cd = c.con_nr
 13         AND record_date >=
 14                TO_DATE ('25.01.2011 08:40:01', 'dd.mm.yyyy hh24:mi:ss')
 15         AND record_date <
 16                TO_DATE ('25.01.2011 18:00:01', 'dd.mm.yyyy hh24:mi:ss')
 17         AND orgn = 0
 18         AND item_cd = 1
 19         AND corr_status > -1
 20         AND parameter_sg = 'glas';

1 row selected.

Elapsed: 00:02:55.53

----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name            | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop | Inst   |IN-OUT|
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                 |       |       |   625 (100)|          |       |       |        |      |
|   1 |  SORT AGGREGATE                      |                 |     1 |    56 |            |          |       |       |        |      |
|*  2 |   HASH JOIN                          |                 |     8 |   448 |   625   (1)| 00:00:13 |       |       |        |      |
|   3 |    PARTITION RANGE SINGLE            |                 |    16 |   704 |    23   (0)| 00:00:01 |    37 |    37 |        |      |
|*  4 |     TABLE ACCESS BY LOCAL INDEX ROWID| TABLE1          |    16 |   704 |    23   (0)| 00:00:01 |    37 |    37 |        |      |
|*  5 |      INDEX RANGE SCAN                | IX_TABLE1_RD    |    21 |       |     3   (0)| 00:00:01 |    37 |    37 |        |      |
|   6 |    REMOTE                            | TEMP_VIEW_TEST  | 51418 |   602K|   601   (1)| 00:00:12 |       |       | REMO_~ | R->S |
----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("X"."sub_cd"="C"."con_nr")
   4 - filter(("item_cd"=1 AND "parameter_sg"='glas' AND "orgn"=0 AND "corr_status">(-1)))
   5 - access("RECORD_DATE">=TO_DATE('2011-01-24 08:40:01', 'yyyy-mm-dd hh24:mi:ss') AND "RECORD_DATE"<TO_DATE('2011-01-24
              18:00:01', 'yyyy-mm-dd hh24:mi:ss'))



This was significant performance improvement. From 1 hour 22 mins to 2 mins 55 secs !

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file sequential read                     27932        0.27         41.32
  SQL*Net message to dblink                       2        0.00          0.00
  SQL*Net message from dblink                     2       42.79         44.46
  SQL*Net more data from dblink                  36        3.24         65.83
  SQL*Net message from client                     2        0.01          0.01
  SQL*Net message to client      

Top wait events weren't related to network bottlenecks.

So the trick was to create view on remote database which reduced number of rows transferred over network.

Colleague was happy with new execution time :)


11 comments:

  1. Hi Marko,

    Nice article on db_link performance.
    I am wondering have you tried with "DRIVING_SITE" hint.

    R,
    Ganesh

    ReplyDelete
  2. Hi Ganesh,

    DRIVING_SITE hint was my first choice but it didn't helped me much.

    Can't remember why, but query performance was still lousy.

    Regards,
    Marko

    ReplyDelete
  3. Hi! when a think about a query which works with XML I consider this:
    http://runakay.blogspot.com/2011/12/speeding-up-your-querys-with-xpath.html

    ReplyDelete
  4. Hi Marko,

    We are facing same kind of issue in our prod environment

    Please can you explain me how did you get the explain plan for the query. Because when i am trying to get the explain plan its only showing two rows
    1)select
    2)remote operation.

    Thanks
    Jamsher

    ReplyDelete
  5. Hello Jamsher,

    maybe you have only two lines in explain plan because query is executed on remote node (remote operation).

    There are several ways to create explain plans. Just google "Oracle Explain Plan" and you'll get plenty guidelines.

    These are good examples:
    http://tinyurl.com/8lrc9nh
    http://tinyurl.com/8fopgch

    Regards,
    Marko

    ReplyDelete
  6. Thanks Marko for reply

    One more question every time its like oracle transfer complete blocks to db where remote query execute or there are some specific condition causes this.

    ReplyDelete
  7. Hello Jamsher,

    query is executed on primary database so you must transfer data from remote database to primary.

    When remote database holds majority of the data it is better to execute query on remote database and then transfer data to primary.

    Check DRIVING_SITE hint.

    Regards,
    Marko

    ReplyDelete
  8. Thanks A ALot, this has resolved our Sev1 problem
    Thanks,
    Ajay More
    http://moreajays.blogspot.com

    ReplyDelete
  9. Hello Ajay,

    I'm glad that this post helped you to solve Sev1 performance problem.

    I've checked your blog. You have interesting topics there. Keep on good work!

    Regards,
    Marko

    ReplyDelete