Slow full table scan due to row chaining

Wednesday, February 24, 2016
A few days ago I received a complaint that a simple count on a 2-million-row table was running forever.

This was the statement:
select count(1)
from CLIENT k
where k.expires is null;

I've used fake names for the table and columns.
Database version: 11.2.0.4.0

Indeed, the query was running longer than I would expect. Oracle was doing a FULL SCAN of the table with "db file sequential read" wait events. This was a little odd to me, as I would expect "direct path read" or "db file scattered read" waits instead.


It was a partitioned table with 4 partitions and 294 columns.

select count(*) from dba_tab_columns where table_name = 'CLIENT';
 
 COUNT(*)
----------
      294

select owner, segment_name, partition_name, bytes, blocks
from dba_segments
where segment_name in ('CLIENT');
 
OWNER      SEGMENT_NAME    PARTITION_NAME            BYTES     BLOCKS
---------- --------------- -------------------- ---------- ----------
SCOTT      CLIENT          CLIENT_OTHER            8388608       1024
SCOTT      CLIENT          CLIENT_CITY          1643118592     200576
SCOTT      CLIENT          CLIENT_CNTR           591396864      72192
SCOTT      CLIENT          CLIENT_STRNG           52428800       6400


select table_name, partition_name, NUM_ROWS, AVG_ROW_LEN
from dba_tab_partitions
where table_name='CLIENT';

TABLE_NAME                     PARTITION_NAME           NUM_ROWS    AVG_ROW_LEN
------------------------------ ----------------------- ----------- ---------------
CLIENT                          CLIENT_OTHER             0            0
CLIENT                          CLIENT_CITY              1469420      572
CLIENT                          CLIENT_CNTR              592056       495
CLIENT                          CLIENT_STRNG             48977        565


select table_name, data_type, count(*)
from dba_tab_cols
where table_name='CLIENT'
group by table_name, data_type
order by 3 desc;
 
TABLE_NAME DATA_TYPE                                  COUNT(*)
---------- ---------------------------------------- ----------
CLIENT   NUMBER                                          191
CLIENT   VARCHAR2                                         70
CLIENT   DATE                                             32
CLIENT   TIMESTAMP(6)                                      3
CLIENT   RAW                                               2
CLIENT   CL_UTR                                            1
CLIENT   O_TIP_KAR                                         1
CLIENT   O_ZA_NA                                           1
CLIENT   O_PO_OSO                                          1

Some of the columns were collections.

select type_name, typecode
from dba_types
where type_name in (select data_type
                   from dba_tab_cols
                   where table_name='CLIENT'
                   and data_type not in ('NUMBER','VARCHAR2',
                   'DATE','TIMESTAMP(6)','RAW'));
 
TYPE_NAME                      TYPECODE                     
------------------------------ ------------------------------
CL_UTR                         COLLECTION                    
O_TIP_KAR                      COLLECTION                    
O_ZA_NA                        COLLECTION                    
O_PO_OSO                       COLLECTION                    

These were varrays used to store multivalued attributes.
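For illustration, a minimal sketch of such a varray-based multivalued attribute (hypothetical type and table names; the real CL_UTR/O_* definitions were not shown) could look like this:

```sql
-- Hypothetical example; the real type definitions were not part of the post.
create type phone_list as varray(10) of varchar2(30);
/

create table client_demo (
  id     number primary key,
  phones phone_list     -- collection column, stored inline with the row by default
);

insert into client_demo values (1, phone_list('+385-1-111111', '+385-1-222222'));
```

Small varrays like this are stored inline in the row itself, which adds to the row length and makes chaining more likely on wide tables.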


In the trace I saw lots of disk reads and an elapsed time of over 2400 seconds.

select count(1)
  from CLIENT k
where k.expires is null

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2    203.96    2450.19    5455717    8240323          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4    203.97    2450.20    5455717    8240323          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 369  (MSUTIC)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=8240323 pr=5455717 pw=0 time=1349733885 us)
   1905617    1905617    1905617   PARTITION LIST ALL PARTITION: 1 4 (cr=8240323 pr=5455717 pw=0 time=2449532855 us cost=164110 size=3801914 card=1900957)
   1905617    1905617    1905617    TABLE ACCESS FULL CLIENT PARTITION: 1 4 (cr=8240323 pr=5455717 pw=0 time=2448530798 us cost=164110 size=3801914 card=1900957)


Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   MODE: ALL_ROWS
      1   SORT (AGGREGATE)
1905617    PARTITION LIST (ALL) PARTITION: START=1 STOP=4
1905617     TABLE ACCESS   MODE: ANALYZED (FULL) OF 'CLIENT' (TABLE) 
                PARTITION: START=1 STOP=4

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to CLIENT                       2        0.00          0.00
  Disk file operations I/O                       29        0.00          0.00
  direct path read                             2048        0.19          9.78
  db file sequential read                   5178860        0.23       2241.08
  resmgr:internal state change                    2        0.11          0.21
  SQL*Net message from CLIENT                     1        0.00          0.00

Object statistics showed that all reads came from the table partitions.

Session Objects Statistics
Object/Event                % Time    Seconds        Calls       Avg       Min       Max
Obj#(299564)
    db file sequential read 78.1% 1,757.0600s 3,677,752 0.0005s 0.0001s 0.2333s
    direct path read 0.4% 8.8314s 1,706 0.0052s 0.0004s 0.1953s
    resmgr:internal state change 0.0% 0.2162s 2 0.1081s 0.1000s 0.1162s
    Disk file operations I/O 0.0% 0.0014s 23 0.0001s 0.0000s 0.0002s
Obj#(299565)
    db file sequential read 20.5% 462.5006s 1,416,370 0.0003s 0.0001s 0.1794s
    direct path read 0.0% 0.8966s 304 0.0029s 0.0001s 0.0479s
    Disk file operations I/O 0.0% 0.0003s 6 0.0000s 0.0000s 0.0000s
Obj#(299566)
    db file sequential read 1.0% 21.5203s 84,738 0.0003s 0.0001s 0.0552s
    direct path read 0.0% 0.0587s 38 0.0015s 0.0000s 0.0206s


Hm… why am I getting so many "db file sequential read" waits when "direct path read" is also happening?
This is a table with lots of columns, so I might have a problem with chained or migrated rows: Oracle is probably performing single-block reads to fetch the remaining pieces of each row.

As the table has more than 255 columns I would expect intra-block chaining, but that alone shouldn't cause sequential reads; only if a row doesn't fit into a single block would I get regular row chaining.
So I was probably dealing with migrated rows.

A chained row is a row that is too large to fit into a single block. If that is the root cause of the problem, there isn't much I can do to improve performance: if a row is too big to fit into a block, it will still be too big after rebuilding the table.

Row migration occurs when a row is updated and the amount of free space in its block is no longer adequate to store all of the row's data, so the row is moved to another physical block.
This usually happens when PCTFREE is set too low.

What is important about migrated rows is that they can be fixed: performance can be improved by reorganizing the table/partition, or simply by deleting and re-inserting the migrated rows.
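As a sketch, the classic delete/re-insert repair driven by the CHAINED_ROWS table could look like this (assumptions: the table has already been analyzed with LIST CHAINED ROWS, and the application can tolerate the rows briefly disappearing, so run it in a maintenance window):

```sql
-- Check PCTFREE first: a low value makes migration on update more likely
select table_name, partition_name, pct_free
from   dba_tab_partitions
where  table_name = 'CLIENT';

-- Copy the affected rows aside, delete the originals, re-insert them
create table client_fix as
select c.*
from   CLIENT c
where  c.rowid in (select head_rowid
                   from   chained_rows
                   where  table_name = 'CLIENT');

delete from CLIENT
where  rowid in (select head_rowid
                 from   chained_rows
                 where  table_name = 'CLIENT');

insert into CLIENT select * from client_fix;
commit;
```

The re-inserted rows land in blocks with enough free space, so they are no longer migrated; genuinely chained rows (too big for any block) will simply chain again.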

Tanel Poder wrote a blog post on the subject, "Detect chained and migrated rows in Oracle – Part 1", and I decided to use his great Snapper tool to get some diagnostic info.

SQL> @sn 60 6596
@snapper all 60 1 "6596"
Sampling SID 6596 with interval 60 seconds, taking 1 snapshots...
 
-- Session Snapper v4.06 BETA - by Tanel Poder ( http://blog.tanelpoder.com ) - Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! :)
 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   SID, USERNAME  , TYPE, STATISTIC                                                 ,         DELTA, HDELTA/SEC,    %TIME, GRAPH       , NUM_WAITS,  WAITS/SEC,   AVERAGES
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  6596, MSUTIC    , STAT, session logical reads                                     ,        283813,      4.74k,         ,             ,          ,           ,          ~ per execution
  6596, MSUTIC    , STAT, user I/O wait time                                        ,          5719,      95.46,         ,             ,          ,           ,          ~ per execution
  6596, MSUTIC    , STAT, non-idle wait time                                        ,          5719,      95.46,         ,             ,          ,           ,          ~ per execution
  6596, MSUTIC    , STAT, non-idle wait count                                       ,        193388,      3.23k,         ,             ,          ,           ,          ~ per execution
  6596, MSUTIC    , STAT, session pga memory                                        ,         -8400,    -140.21,         ,             ,          ,           ,          ~ per execution
  6596, MSUTIC    , STAT, enqueue requests                                          ,             2,        .03,         ,             ,          ,           ,          ~ per execution
  6596, MSUTIC    , STAT, enqueue releases                                          ,             2,        .03,         ,             ,          ,           ,          ~ per execution
  6596, MSUTIC    , STAT, physical read total IO requests                           ,        193740,      3.23k,         ,             ,          ,           ,          ~ per execution
  6596, MSUTIC    , STAT, physical read total multi block requests                  ,           353,       5.89,         ,             ,          ,           ,          ~ per execution
  6596, MSUTIC    , STAT, physical read total bytes                                 ,    1630494720,     27.21M,         ,             ,          ,           ,          ~ per execution
  6596, MSUTIC    , STAT, cell physical IO interconnect bytes                       ,    1630494720,     27.21M,         ,             ,          ,           ,          ~ per execution
  6596, MSUTIC    , STAT, consistent gets                                           ,        283812,      4.74k,         ,             ,          ,           ,          ~ per execution
  6596, MSUTIC    , STAT, consistent gets direct                                    ,        283810,      4.74k,         ,             ,          ,           ,          ~ per execution
  6596, MSUTIC    , STAT, physical reads                                            ,        199034,      3.32k,         ,             ,          ,           ,          ~ per execution
  6596, MSUTIC    , STAT, physical reads direct                                     ,        199034,      3.32k,         ,             ,          ,           ,          ~ per execution
  6596, MSUTIC    , STAT, physical read IO requests                                 ,        193739,      3.23k,         ,             ,          ,           ,          ~ per execution
  6596, MSUTIC    , STAT, physical read bytes                                       ,    1630486528,     27.21M,         ,             ,          ,           ,          ~ per execution
  6596, MSUTIC    , STAT, file io wait time                                         ,      57195780,    954.66k,         ,             ,          ,           ,          ~ per execution
  6596, MSUTIC    , STAT, Number of read IOs issued                                 ,           353,       5.89,         ,             ,          ,           ,          ~ per execution
  6596, MSUTIC    , STAT, no work - consistent read gets                            ,        283808,      4.74k,         ,             ,          ,           ,          ~ per execution
  6596, MSUTIC    , STAT, table scan rows gotten                                    ,       2881106,     48.09k,         ,             ,          ,           ,          ~ per execution
  6596, MSUTIC    , STAT, table scan blocks gotten                                  ,         83578,       1.4k,         ,             ,          ,           ,          ~ per execution
  6596, MSUTIC    , STAT, table fetch continued row                                 ,        200188,      3.34k,         ,             ,          ,           ,          ~ per execution
  6596, MSUTIC    , STAT, buffer is not pinned count                                ,        200226,      3.34k,         ,             ,          ,           ,          ~ per execution
  6596, MSUTIC    , TIME, DB CPU                                                    ,       5620720,    93.82ms,     9.4%, [@         ],          ,           ,
  6596, MSUTIC    , TIME, sql execute elapsed time                                  ,      60270147,      1.01s,   100.6%, [##########],          ,           ,
  6596, MSUTIC    , TIME, DB time                                                   ,      60270147,      1.01s,   100.6%, [##########],          ,           ,          ~ unaccounted time
  6596, MSUTIC    , WAIT, Disk file operations I/O                                  ,           123,     2.05us,      .0%, [          ],         2,        .03,     61.5us average wait
  6596, MSUTIC    , WAIT, db file sequential read                                   ,      57234629,   955.31ms,    95.5%, [WWWWWWWWWW],    192888,      3.22k,   296.72us average wait
 
--  End of Stats snap 1, end=2016-02-23 13:23:19, seconds=59.9
 
----------------------------------------------------------------------------------------------------
Active% | INST | SQL_ID          | SQL_CHILD | EVENT                               | WAIT_CLASS
----------------------------------------------------------------------------------------------------
   97% |    1 | 2q92xdvxjj712   | 0         | db file sequential read             | User I/O
    3% |    1 | 2q92xdvxjj712   | 0         | ON CPU                              | ON CPU
 
--  End of ASH snap 1, end=2016-02-23 13:23:19, seconds=60, samples_taken=99
 
PL/SQL procedure successfully completed.

Notice the "table fetch continued row" statistic. Tanel wrote that this counter usually increases when rows are accessed via index access paths; in my case a full scan was increasing it. The counter is the number of chained row pieces Oracle had to go through in order to reassemble the individual rows.
I won't go into further detail here - just check Tanel's blog post.
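To watch this counter for your own session (assuming access to the v$ views), a query like this works:

```sql
select sn.name, ms.value
from   v$mystat ms
       join v$statname sn on sn.statistic# = ms.statistic#
where  sn.name = 'table fetch continued row';
```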


Let's identify the chained rows by running the ANALYZE command with the LIST CHAINED ROWS option. This command collects information about each migrated or chained row.
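ANALYZE ... LIST CHAINED ROWS writes its output into a CHAINED_ROWS table, which must exist in your schema first; it is created with the utlchain.sql script shipped under $ORACLE_HOME/rdbms/admin:

```sql
-- Creates the CHAINED_ROWS table (HEAD_ROWID, TABLE_NAME, PARTITION_NAME, ...)
SQL> @?/rdbms/admin/utlchain.sql
```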

SQL> analyze table SCOTT.CLIENT list chained rows;
 
Table analyzed.
 
SQL> select count(*) from chained_rows;
 
 COUNT(*)
----------
  2007045

SQL> select partition_name, count(*) from chained_rows group by partition_name;
 
PARTITION_NAME                   COUNT(*)
------------------------------ ----------
CLIENT_CITY                       1411813
CLIENT_CNTR                       552873
CLIENT_STRNG                      42359

A table with 2097647 rows had 2007045 chained/migrated rows. This is what was causing so many reads for a simple full scan of a fairly small table.


I decided to rebuild the table partitions, without changing the PCTFREE parameter, so that the migrated rows would fit into a single block again.
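The rebuild commands themselves were not shown; a typical approach, sketched here with the partition names from the listings above, would be:

```sql
alter table SCOTT.CLIENT move partition CLIENT_CITY;
alter table SCOTT.CLIENT move partition CLIENT_CNTR;
alter table SCOTT.CLIENT move partition CLIENT_STRNG;

-- Moving partitions marks the corresponding index partitions UNUSABLE,
-- so find and rebuild them afterwards
select index_name, partition_name
from   dba_ind_partitions
where  status = 'UNUSABLE';

-- alter index SCOTT.<index_name> rebuild partition <partition_name>;

-- Re-gather optimizer statistics
exec dbms_stats.gather_table_stats('SCOTT', 'CLIENT')
```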


After the rebuild, the number of chained rows decreased significantly.

SQL> analyze table SCOTT.CLIENT list chained rows;
 
Table analyzed.
 
SQL> select count(*) from chained_rows;
 
 COUNT(*)
----------
    37883

Now the query finished in 14 seconds, with no sequential reads happening.
select  count(1)
  from CLIENT k
  where k.expires is null

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      2.34      13.96     185802     185809          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      2.34      13.96     185802     185809          0           1

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 369  (MSUTIC)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=185809 pr=185802 pw=0 time=13965941 us)
   1905617    1905617    1905617   PARTITION LIST ALL PARTITION: 1 4 (cr=185809 pr=185802 pw=0 time=13560379 us cost=109526 size=3811234 card=1905617)
   1905617    1905617    1905617    TABLE ACCESS FULL CLIENT PARTITION: 1 4 (cr=185809 pr=185802 pw=0 time=12848619 us cost=109526 size=3811234 card=1905617)


Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   MODE: ALL_ROWS
      1   SORT (AGGREGATE)
1905617    PARTITION LIST (ALL) PARTITION: START=1 STOP=4
1905617     TABLE ACCESS   MODE: ANALYZED (FULL) OF 'CLIENT' (TABLE) 
                PARTITION: START=1 STOP=4

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to CLIENT                       2        0.00          0.00
  direct path read                             3569        0.11          8.99
  SQL*Net message from CLIENT                     2        0.00          0.01


Snapper also showed that I no longer had a row chaining problem.
SQL> @sn 15 6601
@snapper all 15 1 "6601"
Sampling SID 6601 with interval 15 seconds, taking 1 snapshots...
 
-- Session Snapper v4.06 BETA - by Tanel Poder ( http://blog.tanelpoder.com ) - Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! :)
 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 SID, USERNAME  , TYPE, STATISTIC                                                 ,      DELTA, HDELTA/SEC, %TIME, GRAPH    , NUM_WAITS,  WAITS/SEC,   AVERAGES
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   6601, MSUTIC , STAT, Requests to/from CLIENT                                ,          1,     .07,      ,          ,       ,        ,       ~ per execution
   6601, MSUTIC , STAT, user calls                                                ,          1,        .07,      ,          ,       ,        ,       ~ per execution
   6601, MSUTIC , STAT, pinned cursors current                                 ,         -1,    -.07,      ,          ,       ,        ,       ~ per execution
   6601, MSUTIC , STAT, session logical reads                                  ,     149590,    9.9k,      ,          ,       ,        ,       ~ per execution
   6601, MSUTIC , STAT, CPU used when call started                             ,        227,   15.02,      ,          ,       ,        ,       ~ per execution
   6601, MSUTIC , STAT, CPU used by this session                               ,        227,   15.02,      ,             ,       ,        ,       ~ per execution
   6601, MSUTIC , STAT, DB time                                                   ,       1047,   69.29,      ,          ,       ,        ,       ~ per execution
   6601, MSUTIC , STAT, user I/O wait time                                     ,        424,   28.06,      ,          ,       ,        ,       ~ per execution
   6601, MSUTIC , STAT, non-idle wait time                                     ,        424,   28.06,      ,          ,       ,        ,       ~ per execution
   6601, MSUTIC , STAT, non-idle wait count                                    ,       3216,  212.84,      ,          ,       ,           ,       ~ per execution
   6601, MSUTIC , STAT, session uga memory                                     ,     135248,   8.95k,      ,          ,       ,        ,       ~ per execution
   6601, MSUTIC , STAT, physical read total IO requests                        ,       9354,  619.07,      ,          ,       ,        ,       ~ per execution
   6601, MSUTIC , STAT, physical read total multi block requests               ,       9333,  617.68,      ,          ,       ,        ,       ~ per execution
   6601, MSUTIC , STAT, physical read total bytes                              , 1225228288,  81.09M,      ,          ,       ,        ,       ~ per execution
   6601, MSUTIC , STAT, cell physical IO interconnect bytes                    , 1225228288,  81.09M,      ,          ,       ,        ,       ~ per execution
   6601, MSUTIC , STAT, consistent gets                                        ,     149578,    9.9k,      ,             ,       ,        ,       ~ per execution
   6601, MSUTIC , STAT, consistent gets from cache                             ,          5,     .33,      ,          ,       ,        ,       ~ per execution
   6601, MSUTIC , STAT, consistent gets from cache (fastpath)                  ,          5,     .33,      ,          ,       ,        ,       ~ per execution
   6601, MSUTIC , STAT, consistent gets direct                                 ,     149572,    9.9k,      ,          ,       ,        ,       ~ per execution
   6601, MSUTIC , STAT, logical read bytes from cache                          ,      40960,   2.71k,      ,          ,       ,        ,       ~ per execution
   6601, MSUTIC , STAT, physical reads                                            ,     149548,    9.9k,      ,          ,       ,        ,       ~ per execution
   6601, MSUTIC , STAT, physical reads direct                                  ,     149548,    9.9k,      ,          ,       ,        ,       ~ per execution
   6601, MSUTIC , STAT, physical read IO requests                              ,       9353,  619.01,      ,          ,       ,        ,       ~ per execution
   6601, MSUTIC , STAT, physical read bytes                                    , 1225097216,  81.08M,      ,          ,       ,        ,       ~ per execution
   6601, MSUTIC , STAT, calls to kcmgcs                                           ,          5,     .33,      ,          ,          ,        ,       ~ per execution
   6601, MSUTIC , STAT, file io wait time                                      ,        304,   20.12,      ,          ,       ,        ,       ~ per execution
   6601, MSUTIC , STAT, total number of slots                                  ,         -2,    -.13,      ,          ,       ,        ,       ~ per execution
   6601, MSUTIC , STAT, Effective IO time                                      ,    4239980, 280.61k,      ,          ,       ,        ,       ~ per execution
   6601, MSUTIC , STAT, Number of read IOs issued                              ,       9354,  619.07,      ,          ,       ,        ,          ~ per execution
   6601, MSUTIC , STAT, no work - consistent read gets                         ,     149564,    9.9k,      ,          ,       ,        ,       ~ per execution
   6601, MSUTIC , STAT, Cached Commit SCN referenced                           ,     149132,   9.87k,      ,          ,       ,        ,       ~ per execution
   6601, MSUTIC , STAT, table scans (cache partitions)                         ,          3,      .2,         ,          ,       ,        ,       ~ per execution
   6601, MSUTIC , STAT, table scans (direct read)                              ,          3,      .2,      ,          ,       ,        ,       ~ per execution
   6601, MSUTIC , STAT, table scan rows gotten                                 ,    3518684, 232.88k,      ,          ,       ,        ,       ~ per execution
   6601, MSUTIC , STAT, table scan blocks gotten                               ,     149559,    9.9k,      ,          ,       ,        ,       ~ per execution
   6601, MSUTIC , STAT, bytes sent via SQL*Net to CLIENT                       ,        211,   13.96,      ,          ,       ,           ,   105.5 bytes per roundtrip
   6601, MSUTIC , STAT, bytes received via SQL*Net from CLIENT                 ,          8,     .53,      ,          ,       ,        ,       ~ per execution
   6601, MSUTIC , STAT, SQL*Net roundtrips to/from CLIENT                      ,          2,     .13,      ,          ,       ,        ,       ~ per execution
   6601, MSUTIC , TIME, DB CPU                                                    ,    2000964,   132.43ms, 13.2%, [@@     ],       ,        ,
   6601, MSUTIC , TIME, sql execute elapsed time                               ,    8500210,   562.57ms,    56.3%, [###### ],       ,        ,
   6601, MSUTIC , TIME, DB time                                                   ,    8500269,   562.57ms,    56.3%, [###### ],       ,        ,     14.62s unaccounted time
   6601, MSUTIC , WAIT, direct path read                                          ,    4059380,   268.66ms, 26.9%, [WWW    ],      3064,  202.78,  1.32ms average wait
   6601, MSUTIC , WAIT, SQL*Net message to CLIENT                              ,          4,   .26us,   .0%, [       ],      1,     .07,        4us average wait
   6601, MSUTIC , WAIT, SQL*Net message from CLIENT                            ,    8006127,   529.87ms,    53.0%, [WWWWWW ],      1,     .07,   8.01s average wait
 
--  End of Stats snap 1, end=2016-02-24 08:23:59, seconds=15.1
 
 
----------------------------------------------------------------------------------------------------
Active% | INST | SQL_ID       | SQL_CHILD | EVENT                               | WAIT_CLASS
----------------------------------------------------------------------------------------------------
 29% | 1 | gg54c4j6b9jb0   | 0         | direct path read                    | User I/O
 21% | 1 | gg54c4j6b9jb0   | 0         | ON CPU                           | ON CPU
 
--  End of ASH snap 1, end=2016-02-24 08:23:59, seconds=15, samples_taken=96


Reorganizing the table solved my problem, and full scans on the table were finally running much faster.

There is an interesting support note (Doc ID 238519.1) which states that trailing NULLs do not take up space in the row piece: initially the row fits in one row piece.
If a column beyond the 255th is then populated, all the NULL columns between the last populated column and this new column suddenly take up space.
The row has to be split into two row pieces, and the new row piece is migrated to a new block, so the row becomes chained.

Our table has trailing NULL columns, so this probably caused the migration.
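To reproduce the effect described in the note, a hypothetical demo (table and column names invented, not from the original schema) could go like this: create a 300-column table, load rows where everything past the first column is NULL, then populate a column beyond the 255th.

```sql
-- Hypothetical demo, not from the original post
declare
  l_ddl varchar2(32767) := 'create table wide_demo (c1 number';
begin
  for i in 2 .. 300 loop
    l_ddl := l_ddl || ', c' || i || ' number';
  end loop;
  execute immediate l_ddl || ')';

  -- rows fit in one row piece: columns c2..c300 are trailing NULLs
  execute immediate
    'insert into wide_demo (c1) select rownum from dual connect by level <= 10000';
  commit;

  -- populating a column beyond the 255th makes the intervening NULLs take space,
  -- splitting each row into two row pieces; the new piece may be migrated to
  -- another block (Doc ID 238519.1)
  execute immediate 'update wide_demo set c300 = 1';
  commit;
end;
/
```

Running ANALYZE TABLE wide_demo LIST CHAINED ROWS afterwards should then report chained rows.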


Unfortunately I don't have time to perform a more detailed investigation.




REFERENCES
http://blog.tanelpoder.com/2009/11/04/detect-chained-and-migrated-rows-in-oracle/
Updating a Row with More Than 255 Columns Causes Row Chaining (Doc ID 238519.1)


Detecting Soft Corruption in 12c - V$NONLOGGED_BLOCK, ORA-01578/ORA-26040

Saturday, February 20, 2016
Last week we created a standby database in our dev environment and performed some ETL actions on the primary side. Data loading and index rebuilds were performed with the NOLOGGING option. After a few days we noticed lots of ORA-01578/ORA-26040 errors.
The corruption happened because we had forgotten to enable force logging.

As this was a new dev database there was no backup, but maybe not everything was lost: if the only corrupted segments were indexes, we could simply rebuild them.

Then I learnt something new.
After performing VALIDATE CHECK LOGICAL we noticed lots of corrupted blocks, but I was puzzled why the v$database_block_corruption view was empty. A colleague then told me that Oracle changed the behaviour for reporting soft corrupted blocks in 12c (we were using 12.1.0.2): a new view, V$NONLOGGED_BLOCK, is populated instead.

So I created a little demo on how to detect (and repair) soft corrupted blocks in a 12c database.



Create tablespace and small table.
SQL> create tablespace DEMO1 datafile '/oradata1/data/ora12c/demo01.dbf' size 50M;
Tablespace created.

SQL> create table objects tablespace DEMO1 as select * from dba_objects;
Table created.

SQL> alter table objects add constraint pk_obj primary key (object_id);
Table altered.

SQL> create index idx_obj_name on objects(object_name) tablespace demo1;
Index created.

Backup tablespace.
RMAN> backup tablespace DEMO1;

Starting backup at 23-AUG-15
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=50 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00002 name=/oradata1/data/ora12c/demo01.dbf
channel ORA_DISK_1: starting piece 1 at 23-AUG-15
channel ORA_DISK_1: finished piece 1 at 23-AUG-15
piece handle=/oradata1/fra/ORA12C/backupset/2015_08_23/o1_mf_nnndf_TAG20150823T060639_bxlkpj3j_.bkp tag=TAG20150823T060639 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 23-AUG-15

Starting Control File and SPFILE Autobackup at 23-AUG-15
piece handle=/oradata1/fra/ORA12C/autobackup/2015_08_23/o1_mf_s_888473201_bxlkpktg_.bkp comment=NONE
Finished Control File and SPFILE Autobackup at 23-AUG-15

Rebuild the index with the NOLOGGING option to simulate soft corruption later.
RMAN> alter index idx_obj_name rebuild nologging;
Statement processed

Confirm that we have datafiles requiring a backup because they have been affected by a NOLOGGING operation.
RMAN> report unrecoverable;

Report of files that need backup due to unrecoverable operations
File Type of Backup Required Name
---- ----------------------- -----------------------------------
2    full or incremental     /oradata1/data/ora12c/demo01.dbf
5    full or incremental     /oradata1/data/ora12c/example01.dbf

Simulate the corruption by restoring and recovering the datafile from the backup taken before the NOLOGGING rebuild.
RMAN> alter database datafile 2 offline;
Statement processed

RMAN> restore datafile 2;

Starting restore at 23-AUG-15
using channel ORA_DISK_1

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00002 to /oradata1/data/ora12c/demo01.dbf
channel ORA_DISK_1: reading from backup piece /oradata1/fra/ORA12C/backupset/2015_08_23/o1_mf_nnndf_TAG20150823T060639_bxlkpj3j_.bkp
channel ORA_DISK_1: piece handle=/oradata1/fra/ORA12C/backupset/2015_08_23/o1_mf_nnndf_TAG20150823T060639_bxlkpj3j_.bkp tag=TAG20150823T060639
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:03
Finished restore at 23-AUG-15

RMAN> recover datafile 2;

Starting recover at 23-AUG-15
using channel ORA_DISK_1

starting media recovery
media recovery complete, elapsed time: 00:00:01

Finished recover at 23-AUG-15

RMAN> alter database datafile 2 online;
Statement processed

Query the table through the corrupted index and notice the error.
SQL> select count(*) from objects where object_name like 'A%';
select count(*) from objects where object_name like 'A%'
       *
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 2, block # 2617)
ORA-01110: data file 2: '/oradata1/data/ora12c/demo01.dbf'
ORA-26040: Data block was loaded using the NOLOGGING option

Let's validate the datafile to check for block corruption.
RMAN> backup validate check logical datafile 2;

Starting backup at 23-AUG-15
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=40 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00002 name=/oradata1/data/ora12c/demo01.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
2    OK     460            129          6401            1776280
  File Name: /oradata1/data/ora12c/demo01.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              1537
  Index      0              462
  Other      0              4272

Finished backup at 23-AUG-15

Notice that 460 blocks are marked corrupt, yet the v$database_block_corruption view is empty.
SQL> select count(*) from v$database_block_corruption;

  COUNT(*)
----------
  0

Let’s query v$nonlogged_block view.
SQL> set lines 200
SQL> set pages 999
SQL> select file#, block#, blocks,object#,reason from v$nonlogged_block;

     FILE#     BLOCK#     BLOCKS OBJECT#    REASON
---------- ---------- ---------- ---------- -------
         2       2308         12            UNKNOWN
         2       2321         15            UNKNOWN
         2       2337         15            UNKNOWN
         2       2353         15            UNKNOWN
         2       2369         15            UNKNOWN
         2       2385         15            UNKNOWN
         2       2401         15            UNKNOWN
         2       2417         15            UNKNOWN
         2       2434        126            UNKNOWN
         2       2562        126            UNKNOWN
         2       2690         91            UNKNOWN

11 rows selected.


Will RMAN detect that we have corrupted blocks?
RMAN> backup datafile 2;

Starting backup at 23-AUG-15
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=54 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00002 name=/oradata1/data/ora12c/demo01.dbf
channel ORA_DISK_1: starting piece 1 at 23-AUG-15
channel ORA_DISK_1: finished piece 1 at 23-AUG-15
piece handle=/oradata1/fra/ORA12C/backupset/2015_08_23/o1_mf_nnndf_TAG20150823T061602_bxll8275_.bkp tag=TAG20150823T061602 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 23-AUG-15
An RMAN backup won't fail on NOLOGGING-corrupt blocks, so our backup now contains the soft-corrupted blocks.

Let's identify the corrupt segments using the v$nonlogged_block view.
set lines 2000
set pages 9999
col owner for a20
col partition_name for a10
col segment_name for a20

SELECT e.owner, e.segment_type, e.segment_name, e.partition_name, c.file#
     , greatest(e.block_id, c.block#) corr_start_block#
     , least(e.block_id+e.blocks-1, c.block#+c.blocks-1) corr_end_block#
     , least(e.block_id+e.blocks-1, c.block#+c.blocks-1)
       - greatest(e.block_id, c.block#) + 1 blocks_corrupted
  FROM dba_extents e, V$NONLOGGED_BLOCK c
 WHERE e.file_id = c.file#
   AND e.block_id <= c.block# + c.blocks - 1
   AND e.block_id + e.blocks - 1 >= c.block#
UNION
SELECT s.owner, s.segment_type, s.segment_name, s.partition_name, c.file#
     , header_block corr_start_block#
     , header_block corr_end_block#
     , 1 blocks_corrupted
  FROM dba_segments s, V$NONLOGGED_BLOCK c
 WHERE s.header_file = c.file#
   AND s.header_block between c.block# and c.block# + c.blocks - 1
UNION
SELECT null owner, null segment_type, null segment_name, null partition_name, c.file#
     , greatest(f.block_id, c.block#) corr_start_block#
     , least(f.block_id+f.blocks-1, c.block#+c.blocks-1) corr_end_block#
     , least(f.block_id+f.blocks-1, c.block#+c.blocks-1)
       - greatest(f.block_id, c.block#) + 1 blocks_corrupted
  FROM dba_free_space f, V$NONLOGGED_BLOCK  c
 WHERE f.file_id = c.file#
   AND f.block_id <= c.block# + c.blocks - 1
   AND f.block_id + f.blocks - 1 >= c.block#
order by file#, corr_start_block#
/



OWNER    SEGMENT_TYPE  SEGMENT_NAME   PARTITION_      FILE# CORR_START_BLOCK# CORR_END_BLOCK# BLOCKS_CORRUPTED
-------- ------------- -------------- ---------- ---------- ----------------- --------------- ----------------
SYS      INDEX         IDX_OBJ_NAME                       2              2308            2311                4
SYS      INDEX         IDX_OBJ_NAME                       2              2312            2319                8
SYS      INDEX         IDX_OBJ_NAME                       2              2321            2327                7
SYS      INDEX         IDX_OBJ_NAME                       2              2328            2335                8
SYS      INDEX         IDX_OBJ_NAME                       2              2337            2343                7
SYS      INDEX         IDX_OBJ_NAME                       2              2344            2351                8
SYS      INDEX         IDX_OBJ_NAME                       2              2353            2359                7
SYS      INDEX         IDX_OBJ_NAME                       2              2360            2367                8
SYS      INDEX         IDX_OBJ_NAME                       2              2369            2375                7
SYS      INDEX         IDX_OBJ_NAME                       2              2376            2383                8
SYS      INDEX         IDX_OBJ_NAME                       2              2385            2391                7
SYS      INDEX         IDX_OBJ_NAME                       2              2392            2399                8
SYS      INDEX         IDX_OBJ_NAME                       2              2401            2407                7
SYS      INDEX         IDX_OBJ_NAME                       2              2408            2415                8
SYS      INDEX         IDX_OBJ_NAME                       2              2417            2423                7
SYS      INDEX         IDX_OBJ_NAME                       2              2424            2431                8
SYS      INDEX         IDX_OBJ_NAME                       2              2434            2559              126
SYS      INDEX         IDX_OBJ_NAME                       2              2562            2687              126
SYS      INDEX         IDX_OBJ_NAME                       2              2690            2780               91

19 rows selected.

This is the best possible outcome when you hit corruption errors: all of them are confined to an index, so we can fix the problem by rebuilding it.

SQL> alter index idx_obj_name rebuild;
alter index idx_obj_name rebuild
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 2, block # 2308)
ORA-01110: data file 2: '/oradata1/data/ora12c/demo01.dbf'
ORA-26040: Data block was loaded using the NOLOGGING option

Simply issuing an "alter index rebuild" won't work.
We should either mark the index UNUSABLE, which drops its segment, before rebuilding it, or just rebuild the index with the ONLINE option.

Marking the index unusable is the better choice because it needs no additional space, but I will simply rebuild the index online and see what happens.
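A minimal sketch of the space-saving alternative, assuming the same demo index (marking an index UNUSABLE drops its segment, so the rebuild allocates fresh, properly logged blocks instead of keeping old and new copies side by side):

```sql
-- Sketch: drop the unusable index segment first, then rebuild from the table.
alter index idx_obj_name unusable;
alter index idx_obj_name rebuild;
```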
SQL> alter index idx_obj_name rebuild online;
Index altered.

SQL> select count(*) from objects where object_name like 'A%';

  COUNT(*)
----------
      2079

No errors... but let's validate the datafile for corruption.
RMAN> backup validate check logical datafile 2;

Starting backup at 23-AUG-15
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=40 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00002 name=/oradata1/data/ora12c/demo01.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
2    OK     460            94           6402            1779294
  File Name: /oradata1/data/ora12c/demo01.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              1537
  Index      0              587
  Other      0              4182

Finished backup at 23-AUG-15
Notice the "Marked Corrupt" column. Hm... still 460, like before.

Don't worry, this is not new corruption. These are FREE blocks that will be reused, and Oracle will automatically re-format them when they are.
set lines 2000
set pages 9999
col owner for a20
col partition_name for a10
col segment_name for a20

SELECT e.owner, e.segment_type, e.segment_name, e.partition_name, c.file#
     , greatest(e.block_id, c.block#) corr_start_block#
     , least(e.block_id+e.blocks-1, c.block#+c.blocks-1) corr_end_block#
     , least(e.block_id+e.blocks-1, c.block#+c.blocks-1)
       - greatest(e.block_id, c.block#) + 1 blocks_corrupted
  FROM dba_extents e, V$NONLOGGED_BLOCK c
 WHERE e.file_id = c.file#
   AND e.block_id <= c.block# + c.blocks - 1
   AND e.block_id + e.blocks - 1 >= c.block#
UNION
SELECT s.owner, s.segment_type, s.segment_name, s.partition_name, c.file#
     , header_block corr_start_block#
     , header_block corr_end_block#
     , 1 blocks_corrupted
  FROM dba_segments s, V$NONLOGGED_BLOCK c
 WHERE s.header_file = c.file#
   AND s.header_block between c.block# and c.block# + c.blocks - 1
UNION
SELECT null owner, null segment_type, null segment_name, null partition_name, c.file#
     , greatest(f.block_id, c.block#) corr_start_block#
     , least(f.block_id+f.blocks-1, c.block#+c.blocks-1) corr_end_block#
     , least(f.block_id+f.blocks-1, c.block#+c.blocks-1)
       - greatest(f.block_id, c.block#) + 1 blocks_corrupted
  FROM dba_free_space f, V$NONLOGGED_BLOCK  c
 WHERE f.file_id = c.file#
   AND f.block_id <= c.block# + c.blocks - 1
   AND f.block_id + f.blocks - 1 >= c.block#
order by file#, corr_start_block#
/


OWNER    SEGMENT_TYPE  SEGMENT_NAME   PARTITION_      FILE# CORR_START_BLOCK# CORR_END_BLOCK# BLOCKS_CORRUPTED
-------- ------------- -------------- ---------- ---------- ----------------- --------------- ----------------
                                                         2              2308            2319               12
                                                         2              2321            2335               15
                                                         2              2337            2351               15
                                                         2              2353            2367               15
                                                         2              2369            2383               15
                                                         2              2385            2399               15
                                                         2              2401            2415               15
                                                         2              2417            2431               15
                                                         2              2434            2559              126
                                                         2              2562            2687              126
                                                         2              2690            2780               91

11 rows selected.

We can force the re-formatting by creating a dummy table and inserting data into it until the tablespace fills up (the ORA-01653 below is expected).
See MOS Doc ID 336133.1.
create table s (
       n number,
       c varchar2(4000)
     ) nologging tablespace DEMO1;


SQL> BEGIN
  2  FOR i IN 1..1000000 LOOP
  3  INSERT /*+ APPEND */ INTO sys.s select i, lpad('REFORMAT',3092, 'R') from dual;
  4  commit ;
  5  END LOOP;
  6  END;
  7  /


BEGIN
*
ERROR at line 1:
ORA-01653: unable to extend table SYS.S by 128 in tablespace DEMO1
ORA-06512: at line 3


SQL> drop table sys.s purge;
Table dropped.

Notice that we don't have corrupted blocks any more.
RMAN> backup validate check logical datafile 2;

Starting backup at 23-AUG-15
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=67 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00002 name=/oradata1/data/ora12c/demo01.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
2    OK     0              3929         14593           1818933
  File Name: /oradata1/data/ora12c/demo01.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              9851
  Index      0              461
  Other      0              351

Finished backup at 23-AUG-15



Recovering a corrupted index is easy, but recovering data blocks can be considerably harder, and sometimes impossible.
Validate and back up regularly, because corruption will hit you when you least expect it ;)




Unindexed Foreign Keys on empty/unused table and locks

Thursday, December 17, 2015
It is widely known that unindexed foreign keys can be a performance issue: they can cause table locks on the child table and performance problems in general.
There are many articles on this subject, so I won't go into details.

My plan is to show a simple demo case where an empty child table with an unindexed foreign key column causes big problems.


Imagine a highly active table (supplier) with lots of DML from many sessions.
At some point someone creates a new child table (product) in a relationship with the parent table (supplier). The child table is empty and unused, so why bother indexing the foreign key columns of an empty table?

I will show you a case where this empty table causes lock contention and serious performance issues.

Oracle version - 11.2.0.4.0

CREATE TABLE supplier 
( id number(10) not null,
  supplier_id number(10) not null, 
  supplier_name varchar2(50) not null, 
  contact_name varchar2(50), 
CONSTRAINT id_pk PRIMARY KEY (id),
CONSTRAINT supplier_uk UNIQUE(supplier_id)
); 


INSERT INTO supplier VALUES (1,100, 'Supplier 1', 'Contact 1'); 
INSERT INTO supplier VALUES (2,200, 'Supplier 2', 'Contact 2'); 

COMMIT; 



CREATE TABLE product 
( product_id number(10) not null, 
product_name varchar2(50) not null, 
supplier_id number(10) not null, 
CONSTRAINT fk_supplier 
FOREIGN KEY (supplier_id) 
REFERENCES supplier(supplier_id)
 ); 
  
 
 
SQL> select id, supplier_id, supplier_name, contact_name from supplier;

        ID SUPPLIER_ID SUPPLIER_NAME                                      CONTACT_NAME
---------- ----------- -------------------------------------------------- ------------
         1         100 Supplier 1                                         Contact 1
         2         200 Supplier 2                                         Contact 2
    
    
-- Product table is empty and unused
SQL> select product_id, product_name, supplier_id from product;

no rows selected

A user in session 1 inserts a row and leaves the transaction open for a while.
--SESSION 1:
INSERT INTO supplier VALUES (3,300, 'Supplier 3', 'Contact 3'); --(Without COMMIT)
1 row created.

At the same time, many sessions try to update records using the column involved in the foreign-key relationship. All of these sessions hang, and you have a big problem.
--SESSION 2:
UPDATE supplier SET supplier_id=200 WHERE supplier_id = 200;  --(HANG)

Let's try another INSERT in next session:
--SESSION 3:
INSERT INTO supplier VALUES (4,400, 'Supplier 4', 'Contact 4');  --(HANG)

Now we have inserts hanging too, which can cause major problems on a very active table.

Check locks:

SELECT l.sid, s.blocking_session blocker, s.event, l.type, l.lmode, 
       l.request, o.object_name, o.object_type 
FROM v$lock l, dba_objects o, v$session s 
WHERE UPPER(s.username) = UPPER('MSUTIC') 
AND l.id1 = o.object_id (+) 
AND l.sid = s.sid 
ORDER BY sid, type;

       SID    BLOCKER EVENT                                 TY      LMODE    REQUEST OBJECT_NAME                 OBJECT_TYPE
---------- ---------- -------------------------------------- -- ---------- ---------- -------------------------- ------------
        63       1641 enq: TM - contention                   AE          4          0 ORA$BASE                    EDITION
        63       1641 enq: TM - contention                   TM          3          0 SUPPLIER                    TABLE
        63       1641 enq: TM - contention                   TM          0          4 PRODUCT                     TABLE
      1390            SQL*Net message to client              AE          4          0 ORA$BASE                    EDITION
      1641            SQL*Net message from client            AE          4          0 ORA$BASE                    EDITION
      1641            SQL*Net message from client            TM          3          0 SUPPLIER                    TABLE
      1641            SQL*Net message from client            TM          3          0 PRODUCT                     TABLE
      1641            SQL*Net message from client            TX          6          0 TPT                         SYNONYM
      2159            SQL*Net message from client            AE          4          0 ORA$BASE                    EDITION
      2729         63 enq: TM - contention                   AE          4          0 ORA$BASE                    EDITION
      2729         63 enq: TM - contention                   TM          0          3 PRODUCT                     TABLE
      2729         63 enq: TM - contention                   TM          3          0 SUPPLIER                    TABLE


The unused, empty product table is the culprit for the performance issues.


Create an index on the foreign key column and check the behaviour.

CREATE INDEX fk_supplier ON product (supplier_id);

--SESSION 1:
INSERT INTO supplier VALUES (3,300, 'Supplier 3', 'Contact 3');
1 row created.


--SESSION 2:
UPDATE supplier SET supplier_id=200 WHERE supplier_id = 200;
1 row updated.

Now everything worked without locking problems.



Notice that the behaviour is different in the 12c version.

Oracle version - 12.1.0.2.0

CREATE TABLE supplier 
( supplier_id number(10) not null, 
  supplier_name varchar2(50) not null, 
  contact_name varchar2(50), 
CONSTRAINT supplier_pk PRIMARY KEY (supplier_id) 
); 

INSERT INTO supplier VALUES (1, 'Supplier 1', 'Contact 1'); 
INSERT INTO supplier VALUES (2, 'Supplier 2', 'Contact 2'); 
COMMIT; 

CREATE TABLE product 
( product_id number(10) not null, 
product_name varchar2(50) not null, 
supplier_id number(10) not null, 
CONSTRAINT fk_supplier 
FOREIGN KEY (supplier_id) 
REFERENCES supplier(supplier_id)
); 
  

--SESSION 1:
INSERT INTO supplier VALUES (3, 'Supplier 3', 'Contact 3'); --(Without COMMIT)
1 row created.

--SESSION 2:
UPDATE supplier SET supplier_id=2 WHERE supplier_id = 2; -- (No HANG)
1 row updated.

Check locks:

SELECT l.sid, s.blocking_session blocker, s.event, l.type, l.lmode,
       l.request, o.object_name, o.object_type 
FROM v$lock l, dba_objects o, v$session s 
WHERE UPPER(s.username) = UPPER('MSUTIC') 
AND l.id1 = o.object_id (+) 
AND l.sid = s.sid 
ORDER BY sid, type;


  SID    BLOCKER EVENT                          TY      LMODE    REQUEST OBJECT_NAME
------ ---------- ------------------------------ -- ---------- ---------- ------------
 4500            SQL*Net message from client    AE          4          0 ORA$BASE
 4500            SQL*Net message from client    TM          3          0 SUPPLIER
 4500            SQL*Net message from client    TX          6          0
 6139            SQL*Net message to client      AE          4          0 ORA$BASE
 6144            SQL*Net message from client    AE          4          0 ORA$BASE
 6144            SQL*Net message from client    TM          3          0 SUPPLIER
 6144            SQL*Net message from client    TM          2          0 PRODUCT
 6144            SQL*Net message from client    TX          6          0



I don't think you should index all foreign keys all the time. Sometimes it isn't needed and is pure overhead: unnecessary indexes on foreign keys waste storage space and slow down DML on the table.

Think about the application and how the parent/child tables will be used before creating indexes, and check Tom Kyte's articles on this subject.
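Before deciding, it can help to list which foreign keys actually lack a supporting index. A rough sketch of such a check (my addition, not from the original post; it assumes single-column foreign keys and looks only at the leading index column):

```sql
-- Foreign-key constraints in the current schema whose column is not
-- the leading column of any index on the child table.
select c.table_name, c.constraint_name, cc.column_name
  from user_constraints c
  join user_cons_columns cc
    on cc.constraint_name = c.constraint_name
 where c.constraint_type = 'R'
   and not exists (
         select null
           from user_ind_columns ic
          where ic.table_name      = cc.table_name
            and ic.column_name     = cc.column_name
            and ic.column_position = 1
       );
```

Multi-column foreign keys need a position-aware comparison, so treat this only as a starting point.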




Confusion and problems with lost+found directory in MySQL/Galera cluster configuration

Wednesday, October 7, 2015
The lost+found directory is a filesystem directory created at the root of a mounted ext filesystem. It is used by the filesystem check tool (fsck) to store recovered files.

In the MySQL world it can cause confusion, or outright synchronisation problems in a Galera cluster configuration.

Let’s check some examples.

I have a MySQL database with datadir=/data in the configuration file. I deleted the lost+found directory and restarted the MySQL service.

When I list my databases this is result:
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| employees          |
| mysql              |
| performance_schema |
| pitrdb             |
| sbtest             |
| sys                |
| test               |
+--------------------+
8 rows in set (0.34 sec)

I will stop MySQL service and recreate lost+found directory.
$ sudo service mysql stop

$ cd /data
$ sudo mklost+found
mklost+found 1.42.9 (4-Feb-2014)

Restart service and show databases.
$ sudo service mysql start

mysql> show databases;
+---------------------+
| Database            |
+---------------------+
| information_schema  |
| employees           |
| #mysql50#lost+found |
| mysql               |
| performance_schema  |
| pitrdb              |
| sbtest              |
| sys                 |
| test                |
+---------------------+
9 rows in set (0.01 sec)

Notice the database: #mysql50#lost+found

If you dedicate an entire filesystem as the MySQL datadir, MySQL will interpret every directory under it as a database.
SHOW DATABASES therefore lists a lost+found database which is not a real database.

If you check the error log, you will notice this message:
[ERROR] Invalid (old?) table or database name 'lost+found'

In a single-server configuration the lost+found directory only causes confusion; I'm not aware of any negative effects on the database.
To avoid the confusion, move the database into a sub-directory below the filesystem root, and remove all directories that are not MySQL db-related from the datadir location.


Stop MySQL service on database server.
$ sudo service mysql stop

Make a sub-directory and move the existing data into it.
$ sudo su -
root@galera1:~# cd /data
root@galera1:/data# mkdir mydata && mv !(mydata) mydata
root@galera1:/data# chown -R mysql:mysql /data

Update configuration file with new datadir location.
# vi /etc/mysql/my.cnf
...
datadir=/data/mydata
...

Remove non-database directories.
# rm -rf mydata/lost+found
# mklost+found
mklost+found 1.42.9 (4-Feb-2014)

# pwd
/data
# ls -l
total 56
drwx------ 2 root  root  49152 Oct  4 16:48 lost+found
drwxr-xr-x 9 mysql mysql  4096 Oct  4 16:48 mydata

Restart the service.
$ sudo service mysql start


From version 5.6 you can tell the server to ignore non-database directories using the ignore-db-dir option.
$ sudo vi /etc/mysql/my.cnf
...
ignore-db-dir=lost+found
...
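After a restart you can confirm the setting took effect. Note the server-side system variable is spelled ignore_db_dirs (worth double-checking against your exact 5.6/5.7 build):

```sql
-- Shows the comma-separated list of directories the server ignores.
SHOW VARIABLES LIKE 'ignore_db_dirs';
```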



Let's test how the lost+found directory affects a Galera cluster configuration.
For this test I’m using Percona XtraDB Cluster 5.6 with 3 nodes.

# dpkg -l | grep percona-xtradb-cluster-server
ii  percona-xtradb-cluster-server-5.6         5.6.25-25.12-1.trusty                               amd64        Percona XtraDB Cluster database server binaries


mysql> select version();
+--------------------+
| version()          |
+--------------------+
| 5.6.25-73.1-56-log |
+--------------------+
1 row in set (0.00 sec)


mysql> show global status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
1 row in set (0.01 sec)

In this configuration datadir points to /data, which contains a lost+found directory.
As this is a 5.6 version, I've included the ignore-db-dir option in the configuration file.

Neither the SHOW DATABASES list nor the error log shows any issues.
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| employees          |
| mysql              |
| performance_schema |
| pitrdb             |
| sbtest             |
| sys                |
| test               |
+--------------------+
8 rows in set (0.00 sec)

For the SST method I'm using Percona's default and recommended xtrabackup-v2.
So, what happens if I force an SST for one of the nodes in the cluster?

$ sudo service mysql stop
 * Stopping MySQL (Percona XtraDB Cluster) mysqld  [OK]

$ sudo rm /data/grastate.dat

$ sudo service mysql start
[sudo] password for marko:
 * Starting MySQL (Percona XtraDB Cluster) database server mysqld
* State transfer in progress, setting sleep higher mysqld
* The server quit without updating PID file (/data/galera2.pid).

It appears that SST failed with errors:

WSREP_SST: [ERROR] Cleanup after exit with status:1 (20151004 12:01:00.936)
2015-10-04 12:01:02 16136 [Note] WSREP: (cf98f684, 'tcp://0.0.0.0:4567') turning message relay requesting off
2015-10-04 12:01:12 16136 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.56.102' --datadir '/data/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '16136' --binlog 'percona-bin' : 1 (Operation not permitted)
2015-10-04 12:01:12 16136 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
2015-10-04 12:01:12 16136 [ERROR] WSREP: SST script aborted with error 1 (Operation not permitted)
2015-10-04 12:01:12 16136 [ERROR] WSREP: SST failed: 1 (Operation not permitted)
2015-10-04 12:01:12 16136 [ERROR] Aborting

2015-10-04 12:01:12 16136 [Warning] WSREP: 0.0 (galera3): State transfer to 1.0 (galera2) failed: -22 (Invalid argument)
2015-10-04 12:01:12 16136 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():731: Will never receive state. Need to abort.


The cause of the SST failure is the lost+found directory, although the error log never mentions it.

SST fails because xtrabackup ignores the ignore-db-dir option and tries to synchronise the lost+found directory, which is owned by the root user.


What happens if, as a test, I change the ownership of the lost+found directory on the donor nodes?

drwx------ 2 root  root      49152 Oct  4 11:50 lost+found

marko@galera3:/data# sudo chown -R mysql:mysql /data/lost+found
marko@galera1:/data$ sudo chown -R mysql:mysql /data/lost+found


marko@galera2:/data$ sudo service mysql start
 * Stale sst_in_progress file in datadir mysqld
* Starting MySQL (Percona XtraDB Cluster) database server mysqld
* State transfer in progress, setting sleep higher mysqld           [OK]

NODE2
...
drwxrwx--x  2 mysql mysql      4096 Oct  4 12:07 lost+found
...

SST succeeded and the node joined and synced with the cluster.
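To double-check a joiner after SST, the standard Galera status variables are handy; a healthy node reports Synced and the expected cluster size:

```sql
-- Run on the freshly joined node.
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
```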


To avoid these inconveniences, just move the databases out of the filesystem root directory.
Some of you will simply delete the lost+found directory, but be aware: fsck may recreate it, and your cluster synchronisation will fail when you least expect it ;)



How to Pass Arguments to OS Shell Script from Oracle Database

Sunday, May 10, 2015
Imagine you have several Oracle databases on the same host under the same OS user.

In your scripts directory you have a shell script that kills OS processes.
The idea is to call the OS script from a database procedure and kill the problematic process using the shell script.

The script runs a simple query to get the process id and kills that process.

But how do you ensure the script executes in the correct environment for the correct database?

One way is to create one script per database and set the environment inside each script; another is to create a single script that dynamically sets the correct environment for the instance calling it.

For the demo case I've created a simple script that spools query output to a file.

#!/bin/bash

# Avoid oraenv asking
ORAENV_ASK="NO"; export ORAENV_ASK

ORACLE_SID=$1; export ORACLE_SID

. oraenv ${ORACLE_SID}

$ORACLE_HOME/bin/sqlplus -s "/ as sysdba" <<EOF > /tmp/my_environment.txt
set heading off feedback off verify off
col instance_name for a10
col host_name for a10
col status for a10
select instance_name, host_name, status
 from v\$instance;
exit
EOF

$ chmod u+x simple_script.sh


What happens when we execute the script?

$ ./simple_script.sh testdb
The Oracle base for ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/dbhome_1 is /u01/app/oracle
$
$ cat /tmp/my_environment.txt

testdb     asterix    OPEN

$ ./simple_script.sh ora11gr2
The Oracle base for ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/dbhome_1 is /u01/app/oracle
$
$ cat /tmp/my_environment.txt

ora11gr2   asterix    OPEN

Notice how I specified the ORACLE_SID as a command line argument. The script sets the environment from the oratab file according to the specified SID and spools the output to the my_environment.txt file.

Next I will demonstrate how to pass the argument from the database layer.


To execute an external job I have to create credentials in both databases.

-- Session 1

system@ORA11GR2> begin
  2      dbms_scheduler.create_credential(
  3      credential_name => 'ORACLE_CRED',
  4      username => 'oracle',
  5      password => 'password');
  6      end;
  7  /

PL/SQL procedure successfully completed.


-- Session 2

system@TESTDB> begin
  2      dbms_scheduler.create_credential(
  3      credential_name => 'ORACLE_CRED',
  4      username => 'oracle',
  5      password => 'password');
  6      end;
  7  /

PL/SQL procedure successfully completed.


Use the SYS_CONTEXT function to get the instance name, then execute the script for that instance.

-- Session 1

system@ORA11GR2> DECLARE
  2    l_oracle_sid varchar2(20);
  3  BEGIN
  4    select sys_context('userenv','instance_name') into l_oracle_sid
  5    from dual;
  6        DBMS_SCHEDULER.CREATE_JOB (
  7              job_name => 'J_SIMPLE_SCRIPT',
  8              job_type => 'EXECUTABLE',
  9              job_action => '/home/oracle/skripte/simple_script.sh',
 10              number_of_arguments => 1,
 11              start_date => NULL,
 12              repeat_interval => NULL,
 13              end_date => NULL,
 14              enabled => FALSE,
 15              auto_drop => TRUE,
 16              comments => 'Set environment and execute query on v$instance view');
 17           dbms_scheduler.set_attribute('J_SIMPLE_SCRIPT','credential_name','ORACLE_CRED');
 18           DBMS_SCHEDULER.set_job_argument_value('J_SIMPLE_SCRIPT',1,l_oracle_sid);
 19          DBMS_SCHEDULER.enable('J_SIMPLE_SCRIPT');
 20          DBMS_SCHEDULER.run_job (job_name=> 'J_SIMPLE_SCRIPT', use_current_session => FALSE);
 21  END;
 22  /

PL/SQL procedure successfully completed.


system@ORA11GR2> host cat /tmp/my_environment.txt

ora11gr2   asterix    OPEN


I called the script from the "ora11gr2" database and the OS script executed for that database. A DBMS_SCHEDULER job passed the argument to the external OS script and ran it.
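If an external job fails instead, the scheduler run log is usually the quickest place to look (a standard dictionary view; the exact ADDITIONAL_INFO text varies by release):

```sql
-- STATUS, ERROR# and ADDITIONAL_INFO usually reveal credential
-- or script problems for EXECUTABLE jobs.
select log_date, status, error#, additional_info
  from dba_scheduler_job_run_details
 where job_name = 'J_SIMPLE_SCRIPT'
 order by log_date desc;
```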

From another session.

-- Session 2

system@TESTDB> DECLARE
  2    l_oracle_sid varchar2(20);
  3  BEGIN
  4    select sys_context('userenv','instance_name') into l_oracle_sid
  5    from dual;
  6        DBMS_SCHEDULER.CREATE_JOB (
  7              job_name => 'J_SIMPLE_SCRIPT',
  8              job_type => 'EXECUTABLE',
  9              job_action => '/home/oracle/skripte/simple_script.sh',
 10              number_of_arguments => 1,
 11              start_date => NULL,
 12              repeat_interval => NULL,
 13              end_date => NULL,
 14              enabled => FALSE,
 15              auto_drop => TRUE,
 16              comments => 'Set environment and execute query on v$instance view');
 17           dbms_scheduler.set_attribute('J_SIMPLE_SCRIPT','credential_name','ORACLE_CRED');
 18           DBMS_SCHEDULER.set_job_argument_value('J_SIMPLE_SCRIPT',1,l_oracle_sid);
 19          DBMS_SCHEDULER.enable('J_SIMPLE_SCRIPT');
 20          DBMS_SCHEDULER.run_job (job_name=> 'J_SIMPLE_SCRIPT', use_current_session => FALSE);
 21  END;
 22  /

PL/SQL procedure successfully completed.


SQL> host cat /tmp/my_environment.txt

testdb     asterix    OPEN

Notice how the "/tmp/my_environment.txt" file changed according to the specified database.


Using this method you can easily reuse a single OS script across multiple databases.



ASM not starting with ORA-00845 - how to fix ASM parameter file

Saturday, May 9, 2015
A few days ago I saw a great post from Norman Dunbar on how to fix a broken ASM spfile.

Since version 11gR2 the ASM spfile can be stored in an ASM diskgroup, and by default the Oracle Installer puts it there. So if you want to create a pfile from the spfile, your ASM instance has to be up and running.

If an incorrect parameter in the ASM spfile is preventing ASM from starting, you have a slight problem: you cannot simply create a pfile from the spfile, correct the parameter in the pfile and recreate the spfile, as you would for a database.

But don't worry, there are several well-explained options available on the net. I recommend practising all the scenarios in your test environment if you want to avoid big stress in production later.


When I had problems with a broken ASM parameter file (mostly in test/dev environments), I would always end up searching my notes or blog posts for the solution.

I knew the parameters were written directly in the ASM disk header and I could extract them from there, or check them in the ASM alert log, but in the back of my mind I always thought there must be a simpler way.

Thanks to Norman, I now know how to quickly change an incorrect parameter while keeping the other parameters intact.


I used this trick a few days ago and it worked perfectly. This blog post is just a reminder which I know will be useful to me in the future.



In my environment I have Oracle Restart with Oracle Database 12.1.0.2.0.

After starting my test server I noticed something was wrong because ASM was unable to start.
$ ./srvctl status asm
ASM is not running.

When I tried to start ASM manually I received an error:
$ ./srvctl start asm
PRCR-1079 : Failed to start resource ora.asm
CRS-5017: The resource action "ora.asm start" encountered the following error:
ORA-00845: MEMORY_TARGET not supported on this system
. For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/obelix/crs/trace/ohasd_oraagent_grid.trc".

CRS-2674: Start of 'ora.asm' on 'obelix' failed


Let's check the alert log.
alert+ASM.log
Fri May 01 19:40:16 2015
MEMORY_TARGET defaulting to 1128267776.
* instance_number obtained from CSS = 1, checking for the existence of node 0...
* node 0 does not exist. instance_number = 1
Starting ORACLE instance (normal) (OS id: 4136)
Fri May 01 19:40:16 2015
CLI notifier numLatches:3 maxDescs:222
Fri May 01 19:40:16 2015
WARNING: You are trying to use the MEMORY_TARGET feature. This feature requires the /dev/shm file system to be mounted for at least 1140850688 bytes. /dev/shm is either not mounted or is mounted with available space less than this size. Please fix this so that MEMORY_TARGET can work as expected. Current available is 1051975680 and used is 208896 bytes. Ensure that the mount point is /dev/shm for this directory.


So there is a problem with the MEMORY_TARGET parameter, but how can I disable AMM when my ASM instance is down?

First I had to find the location of the ASM parameter file. I don't have a GPnP profile as this is a single-instance setup, so I extracted the spfile location from the "ora.asm" resource information.
$ crsctl stat res ora.asm -p | egrep "ASM_DISKSTRING|SPFILE"
ASM_DISKSTRING=
SPFILE=+DATA/ASM/ASMPARAMETERFILE/registry.253.822856169


Create a new parameter file with the corrected MEMORY_TARGET parameter.
$ vi /tmp/initASM.ora
spfile="+DATA/asm/asmparameterfile/registry.253.862145335"
MEMORY_TARGET=0


Start the ASM instance using the new parameter file.

$ sqlplus / as sysasm

SQL*Plus: Release 12.1.0.2.0 Production on Fri May 1 20:04:39 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup pfile=/tmp/initASM.ora
ASM instance started

Total System Global Area  197132288 bytes
Fixed Size                  2922520 bytes
Variable Size             169043944 bytes
ASM Cache                  25165824 bytes
ASM diskgroups mounted

And voilà!
The new parameter was applied and I was able to start the ASM instance.


Change the parameter in the ASM spfile.
SQL> alter system set memory_target=0 scope=spfile;

System altered.

Restart ASM.
SQL> shutdown immediate;
ASM diskgroups dismounted
ASM instance shutdown

[grid@obelix bin]$ ./srvctl start asm
[grid@obelix bin]$ ./srvctl status asm
ASM is running on obelix


The ASM instance successfully started with the corrected parameter file.




Restore to Restore Point on Standard Edition (no Flashback technology)

Saturday, February 28, 2015 0 comments
Restore points and Flashback Database are nice features introduced in 10g that provide efficient point-in-time recovery to reverse unwanted data changes.

But what if you have a Standard Edition database?

SQL> shutdown immediate;

SQL> startup mount;

SQL> alter database flashback on;
alter database flashback on
*
ERROR at line 1:
ORA-00439: feature not enabled: Flashback Database

In Standard Edition you don't have the Flashback Database feature, but you can still create restore points and perform incomplete recovery to a restore point.


Create a test table and insert a status row.

SQL> create table admin.test_restore (datum date, komentar varchar2(100));
Table created.

SQL> insert into admin.test_restore values (sysdate, 'Before Restore Point');
1 row created.

SQL> commit;
Commit complete.


Create restore point here.

SQL> create restore point RP_UPGRADE;

Restore point created.


SQL> select scn, to_char(time,'dd.mm.yyyy hh24:mi:ss') time, name
  2  from v$restore_point;

       SCN TIME                NAME
---------- ------------------- ---------------------
    580752 27.02.2015 10:31:19 RP_UPGRADE

Notice how the restore point name is associated with an SCN of the database.


Now you can perform potentially dangerous operations such as database upgrades, table modifications, truncating data, and the like.

I will enter some status data for later checks.

SQL> insert into admin.test_restore values (sysdate, 'After Restore Point');
1 row created.

SQL> insert into admin.test_restore values (sysdate, 'Upgrade actions performed');
1 row created.

SQL> commit;
Commit complete.


Check table.

SQL> alter session set nls_date_format='dd.mm.yyyy hh24:mi:ss';
Session altered.

SQL> select datum, komentar from admin.test_restore order by datum;

DATUM               KOMENTAR
------------------- ------------------------------
27.02.2015 10:30:39 Before Restore Point
27.02.2015 10:31:45 After Restore Point
27.02.2015 10:31:55 Upgrade actions performed


Suppose we hit problems and want to "rewind" the database to the restore point. In EE we would flash back the database to the restore point, but in SE we will use a different approach.


Shut down the database and start it in mount mode.

RMAN> shutdown immediate;

using target database control file instead of recovery catalog
database closed
database dismounted
Oracle instance shut down

RMAN> startup mount;

connected to target database (not started)
Oracle instance started
database mounted

Total System Global Area     471830528 bytes

Fixed Size                     2254344 bytes
Variable Size                247466488 bytes
Database Buffers             213909504 bytes
Redo Buffers                   8200192 bytes


Restore and recover database until restore point RP_UPGRADE.

RMAN> restore database until restore point RP_UPGRADE;

Starting restore at 27.02.2015 10:36:26
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=247 device type=DISK

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00001 to +DATA1/ora11gr2/datafile/system.291.872722799
channel ORA_DISK_1: restoring datafile 00002 to +DATA1/ora11gr2/datafile/sysaux.292.872722847
channel ORA_DISK_1: restoring datafile 00003 to +DATA1/ora11gr2/datafile/undotbs1.278.872722879
channel ORA_DISK_1: restoring datafile 00004 to +DATA1/ora11gr2/datafile/users.296.872722925
channel ORA_DISK_1: reading from backup piece +FRA1/ora11gr2/backupset/2015_02_27/nnndf0_tag20150227t102559_0.1164.872763961
channel ORA_DISK_1: piece handle=+FRA1/ora11gr2/backupset/2015_02_27/nnndf0_tag20150227t102559_0.1164.872763961 tag=TAG20150227T102559
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:01:35
Finished restore at 27.02.2015 10:38:02

RMAN> recover database until restore point RP_UPGRADE;

Starting recover at 27.02.2015 10:38:45
using channel ORA_DISK_1

starting media recovery
media recovery complete, elapsed time: 00:00:01

Finished recover at 27.02.2015 10:38:49


Open the database with the RESETLOGS option.

RMAN> sql 'alter database open resetlogs';

sql statement: alter database open resetlogs


Final check.

SQL> alter session set nls_date_format='dd.mm.yyyy hh24:mi:ss';

Session altered.

SQL> select datum, komentar
  2  from admin.test_restore
  3  order by datum;

DATUM               KOMENTAR
------------------- --------------------------------------------------
27.02.2015 10:30:39 Before Restore Point


We "rewound" the database to the state that existed before the RP_UPGRADE restore point was created.
This was an incomplete recovery, and the RP_UPGRADE restore point was used just to mark a location in time.




12c Migrate Database from non-ASM to ASM using online relocation of data files

Wednesday, February 18, 2015
There are many articles explaining how to migrate a database from a file system into ASM. You could use RMAN to create an image copy of the database in ASM and switch to the copy, restore the database from backup sets into ASM, or create a duplicate database.

All of these RMAN features are available on Oracle versions before 12c.

In this post I will use a slightly different approach - online relocation of data files into ASM.

Online renaming and relocation of data files is available from 12c onward and enables us to rename or relocate a data file while the database is open and users are accessing it. This simplifies data file management and avoids downtime even when moving the SYSTEM or UNDO tablespaces.
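The feature boils down to a single statement. As a minimal sketch (the KEEP and REUSE clauses are noted from the 12c documentation and are not used later in this post):

```sql
-- Move a data file into ASM while it stays online (12c and later).
ALTER DATABASE MOVE DATAFILE '/u01/app/oracle12/oradata/cdb12c/users01.dbf'
  TO '+DATA';

-- Optional clauses:
--   KEEP  - retain the original file after the move
--   REUSE - overwrite an existing file at the destination
```

Oracle copies the file to the new location in the background and switches over to it transparently once the copy is complete.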


This is a 12.1 single-tenant pluggable database I want to migrate from the file system to ASM:

RMAN> report schema;

using target database control file instead of recovery catalog
Report of database schema for database with db_unique_name CDB12C

List of Permanent Datafiles
===========================
File Size(MB) Tablespace           RB segs Datafile Name
---- -------- -------------------- ------- ------------------------
1    800      SYSTEM               YES     /u01/app/oracle12/oradata/cdb12c/system01.dbf
2    260      PDB$SEED:SYSTEM      NO      /u01/app/oracle12/oradata/cdb12c/pdbseed/system01.dbf
3    740      SYSAUX               NO      /u01/app/oracle12/oradata/cdb12c/sysaux01.dbf
4    585      PDB$SEED:SYSAUX      NO      /u01/app/oracle12/oradata/cdb12c/pdbseed/sysaux01.dbf
5    740      UNDOTBS1             YES     /u01/app/oracle12/oradata/cdb12c/undotbs01.dbf
6    5        USERS                NO      /u01/app/oracle12/oradata/cdb12c/users01.dbf
7    270      PDB:SYSTEM           NO      /u01/app/oracle12/oradata/cdb12c/pdb/system01.dbf
8    605      PDB:SYSAUX           NO      /u01/app/oracle12/oradata/cdb12c/pdb/sysaux01.dbf
9    5        PDB:USERS            NO      /u01/app/oracle12/oradata/cdb12c/pdb/pdb_users01.dbf
10   50       PDB:MARKO            NO      /u01/app/oracle12/oradata/cdb12c/pdb/marko01.dbf

List of Temporary Files
=======================
File Size(MB) Tablespace           Maxsize(MB) Tempfile Name
---- -------- -------------------- ----------- --------------------
1    100      TEMP                 100         /u01/app/oracle12/oradata/cdb12c/temp01.dbf
2    62       PDB$SEED:TEMP        32767       /u01/app/oracle12/oradata/cdb12c/pdbseed/temp01.dbf
3    100      PDB:TEMP             100         /u01/app/oracle12/oradata/cdb12c/pdb/temp01.dbf



Create a script to move the data files:

$ sqlplus -s / as sysdba
set lines 200
set pages 50
set feed off
set head off
spool /tmp/move_dbfiles.sql
select 'ALTER DATABASE MOVE DATAFILE '''||name||''' TO ''+DATA'';' from v$datafile
order by con_id;

ALTER DATABASE MOVE DATAFILE '/u01/app/oracle12/oradata/cdb12c/system01.dbf' TO '+DATA';
ALTER DATABASE MOVE DATAFILE '/u01/app/oracle12/oradata/cdb12c/undotbs01.dbf' TO '+DATA';
ALTER DATABASE MOVE DATAFILE '/u01/app/oracle12/oradata/cdb12c/users01.dbf' TO '+DATA';
ALTER DATABASE MOVE DATAFILE '/u01/app/oracle12/oradata/cdb12c/sysaux01.dbf' TO '+DATA';
ALTER DATABASE MOVE DATAFILE '/u01/app/oracle12/oradata/cdb12c/pdbseed/sysaux01.dbf' TO '+DATA';
ALTER DATABASE MOVE DATAFILE '/u01/app/oracle12/oradata/cdb12c/pdbseed/system01.dbf' TO '+DATA';
ALTER DATABASE MOVE DATAFILE '/u01/app/oracle12/oradata/cdb12c/pdb/marko01.dbf' TO '+DATA';
ALTER DATABASE MOVE DATAFILE '/u01/app/oracle12/oradata/cdb12c/pdb/pdb_users01.dbf' TO '+DATA';
ALTER DATABASE MOVE DATAFILE '/u01/app/oracle12/oradata/cdb12c/pdb/system01.dbf' TO '+DATA';
ALTER DATABASE MOVE DATAFILE '/u01/app/oracle12/oradata/cdb12c/pdb/sysaux01.dbf' TO '+DATA';
spool off;
exit

Now, if you just run this script you will get errors, because you cannot move a PDB's files unless your session is in that PDB's container. Also, the PDB must be open.


After slightly editing the script, this is the final version:

$ cat /tmp/move_dbfiles.sql

ALTER DATABASE MOVE DATAFILE '/u01/app/oracle12/oradata/cdb12c/system01.dbf' TO '+DATA';
ALTER DATABASE MOVE DATAFILE '/u01/app/oracle12/oradata/cdb12c/undotbs01.dbf' TO '+DATA';
ALTER DATABASE MOVE DATAFILE '/u01/app/oracle12/oradata/cdb12c/users01.dbf' TO '+DATA';
ALTER DATABASE MOVE DATAFILE '/u01/app/oracle12/oradata/cdb12c/sysaux01.dbf' TO '+DATA';

ALTER SESSION SET CONTAINER=pdb$seed;

ALTER DATABASE MOVE DATAFILE '/u01/app/oracle12/oradata/cdb12c/pdbseed/sysaux01.dbf' TO '+DATA';
ALTER DATABASE MOVE DATAFILE '/u01/app/oracle12/oradata/cdb12c/pdbseed/system01.dbf' TO '+DATA';

ALTER SESSION SET CONTAINER=pdb;
ALTER DATABASE OPEN;

ALTER DATABASE MOVE DATAFILE '/u01/app/oracle12/oradata/cdb12c/pdb/marko01.dbf' TO '+DATA';
ALTER DATABASE MOVE DATAFILE '/u01/app/oracle12/oradata/cdb12c/pdb/pdb_users01.dbf' TO '+DATA';
ALTER DATABASE MOVE DATAFILE '/u01/app/oracle12/oradata/cdb12c/pdb/system01.dbf' TO '+DATA';
ALTER DATABASE MOVE DATAFILE '/u01/app/oracle12/oradata/cdb12c/pdb/sysaux01.dbf' TO '+DATA';


Let’s execute this script:

$ sqlplus / as sysdba

SQL> @/tmp/move_dbfiles.sql

Database altered.
Database altered.
Database altered.
Database altered.
Session altered.
Database altered.
Database altered.
Session altered.
Database altered.
Database altered.
Database altered.
Database altered.
Database altered.

Data files are now migrated to ASM without any downtime. Very simple.


Notice that my temporary files are still on the file system.

List of Temporary Files
=======================
File Size(MB) Tablespace           Maxsize(MB) Tempfile Name
---- -------- -------------------- ----------- --------------------
1    100      TEMP                 100         /u01/app/oracle12/oradata/cdb12c/temp01.dbf
2    62       PDB$SEED:TEMP        32767       /u01/app/oracle12/oradata/cdb12c/pdbseed/temp01.dbf
3    100      PDB:TEMP             100         /u01/app/oracle12/oradata/cdb12c/pdb/temp01.dbf


I cannot use the MOVE DATAFILE command above for temp files, so I will use a slightly different approach: add a new temp file in ASM and drop the old one.

SQL> alter tablespace TEMP add tempfile '+DATA';
Tablespace altered.

SQL> alter tablespace TEMP drop tempfile '/u01/app/oracle12/oradata/cdb12c/temp01.dbf';
Tablespace altered.


SQL> alter session set container=pdb;
Session altered.

SQL> alter tablespace TEMP add tempfile '+DATA';
Tablespace altered.

SQL> alter tablespace TEMP drop tempfile '/u01/app/oracle12/oradata/cdb12c/pdb/temp01.dbf';
Tablespace altered.


The tricky part is moving the temp file of the TEMP tablespace in PDB$SEED, which is used for creating pluggable databases. I found this solution in a blog post by Jože Senegačnik.

SQL> alter session set container=CDB$ROOT;
Session altered.

SQL> alter session set "_oracle_script"=TRUE;
Session altered.

SQL> alter pluggable database pdb$seed close;
Pluggable database altered.

SQL> alter pluggable database pdb$seed open read write;
Pluggable database altered.

SQL> alter session set container=pdb$seed;
Session altered.

SQL> alter tablespace temp add tempfile '+DATA';
Tablespace altered.

SQL>  alter tablespace temp drop tempfile '/u01/app/oracle12/oradata/cdb12c/pdbseed/temp01.dbf';
Tablespace altered.

SQL> alter session set container=CDB$ROOT;
Session altered.

SQL> alter pluggable database pdb$seed close;
Pluggable database altered.

SQL> alter pluggable database pdb$seed open read only;
Pluggable database altered.



All my data files and temporary files are now migrated into ASM.

RMAN> report schema;

using target database control file instead of recovery catalog
Report of database schema for database with db_unique_name CDB12C

List of Permanent Datafiles
===========================
File Size(MB) Tablespace           RB segs Datafile Name
---- -------- -------------------- ------- ------------------------
1    800      SYSTEM               YES     +DATA/CDB12C/DATAFILE/system.259.871936371
2    260      PDB$SEED:SYSTEM      NO      +DATA/CDB12C/0C366F6BD5E01C14E055000000000001/DATAFILE/system.265.871936527
3    740      SYSAUX               NO      +DATA/CDB12C/DATAFILE/sysaux.263.871936455
4    585      PDB$SEED:SYSAUX      NO      +DATA/CDB12C/0C366F6BD5E01C14E055000000000001/DATAFILE/sysaux.256.871936497
5    740      UNDOTBS1             YES     +DATA/CDB12C/DATAFILE/undotbs1.257.871936413
6    5        USERS                NO      +DATA/CDB12C/DATAFILE/users.258.871936451
7    270      PDB:SYSTEM           NO      +DATA/CDB12C/0C38EB3894542836E055000000000001/DATAFILE/system.269.871936549
8    605      PDB:SYSAUX           NO      +DATA/CDB12C/0C38EB3894542836E055000000000001/DATAFILE/sysaux.266.871936565
9    5        PDB:USERS            NO      +DATA/CDB12C/0C38EB3894542836E055000000000001/DATAFILE/users.270.871936547
10   50       PDB:MARKO            NO      +DATA/CDB12C/0C38EB3894542836E055000000000001/DATAFILE/marko.271.871936543

List of Temporary Files
=======================
File Size(MB) Tablespace           Maxsize(MB) Tempfile Name
---- -------- -------------------- ----------- --------------------
2    100      PDB$SEED:TEMP        32767       +DATA/CDB12C/0C366F6BD5E01C14E055000000000001/TEMPFILE/temp.264.871937389
4    100      TEMP                 32767       +DATA/CDB12C/TEMPFILE/temp.268.871936871
5    100      PDB:TEMP             32767       +DATA/CDB12C/0C38EB3894542836E055000000000001/TEMPFILE/temp.267.871936909
7    100      PDB$SEED:TEMP        32767       +DATA/CDB12C/0C366F6BD5E01C14E055000000000001/TEMPFILE/temp.262.871937843



Move redo log files into ASM:

SQL> select member from v$logfile;

MEMBER
--------------------------------------------------------------------------------
/u01/app/oracle12/oradata/cdb12c/redo01.log
/u01/app/oracle12/oradata/cdb12c/redo02.log
/u01/app/oracle12/oradata/cdb12c/redo03.log


SQL> select group#, status from v$log;

    GROUP# STATUS
---------- ----------------
         1 INACTIVE
         2 INACTIVE
         3 CURRENT

SQL> alter database drop logfile group 1;

Database altered.

SQL> alter database add logfile group 1 '+DATA';

Database altered.

SQL> alter database drop logfile group 2;

Database altered.

SQL> alter database add logfile group 2 '+DATA';

Database altered.

SQL> alter system switch logfile;

System altered.

SQL> alter system checkpoint;

System altered.

SQL> alter database drop logfile group 3;

Database altered.

SQL> alter database add logfile group 3 '+DATA';

Database altered.


SQL> select member from v$logfile;

MEMBER
--------------------------------------------------------------------------------
+DATA/CDB12C/ONLINELOG/group_1.261.871938327
+DATA/CDB12C/ONLINELOG/group_2.260.871938351
+DATA/CDB12C/ONLINELOG/group_3.272.871938381


All these operations were done online while the database was up and running.

Now comes the part where some downtime is needed - moving the control files from the file system into ASM.


$ srvctl stop database -d cdb12c


RMAN> startup nomount;

Oracle instance started

Total System Global Area     838860800 bytes

Fixed Size                     2929936 bytes
Variable Size                570428144 bytes
Database Buffers             260046848 bytes
Redo Buffers                   5455872 bytes

RMAN> restore controlfile to '+DATA' from '/u01/app/oracle12/oradata/cdb12c/control01.ctl';

Starting restore at 17-FEB-15
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=12 device type=DISK

channel ORA_DISK_1: copied control file copy
Finished restore at 17-FEB-15

RMAN> restore controlfile to '+FRA' from '/u01/app/oracle12/oradata/cdb12c/control01.ctl';

Starting restore at 17-FEB-15
using channel ORA_DISK_1

channel ORA_DISK_1: copied control file copy
Finished restore at 17-FEB-15




ASMCMD [+] > find --type CONTROLFILE +DATA *
+DATA/CDB12C/CONTROLFILE/current.273.871940441
ASMCMD [+] > find --type CONTROLFILE +FRA *
+FRA/CDB12C/CONTROLFILE/current.256.871940457



SQL> alter system set control_files='+DATA/CDB12C/CONTROLFILE/current.273.871940441','+FRA/CDB12C/CONTROLFILE/current.256.871940457' scope=spfile;

System altered.

SQL> shutdown abort;
ORACLE instance shut down.



Put the parameter file into ASM as well and start the database.

$ srvctl start database -d cdb12c -o mount


RMAN> run
2> {
3> BACKUP AS BACKUPSET SPFILE;
4> RESTORE SPFILE TO "+DATA/CDB12C/spfilecdb12c.ora";
5> }

Starting backup at 17-FEB-15
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=14 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current SPFILE in backup set
channel ORA_DISK_1: starting piece 1 at 17-FEB-15
channel ORA_DISK_1: finished piece 1 at 17-FEB-15
piece handle=/u01/app/oracle12/fast_recovery_area/CDB12C/backupset/2015_02_17/o1_mf_nnsnf_TAG20150217T214744_bg7b7jkg_.bkp tag=TAG20150217T214744 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 17-FEB-15

Starting Control File and SPFILE Autobackup at 17-FEB-15
piece handle=/u01/app/oracle12/fast_recovery_area/CDB12C/autobackup/2015_02_17/o1_mf_s_871939236_bg7b7m1p_.bkp comment=NONE
Finished Control File and SPFILE Autobackup at 17-FEB-15

Starting restore at 17-FEB-15
using channel ORA_DISK_1

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: restoring SPFILE
output file name=+DATA/CDB12C/spfilecdb12c.ora
channel ORA_DISK_1: reading from backup piece /u01/app/oracle12/fast_recovery_area/CDB12C/autobackup/2015_02_17/o1_mf_s_871939236_bg7b7m1p_.bkp
channel ORA_DISK_1: piece handle=/u01/app/oracle12/fast_recovery_area/CDB12C/autobackup/2015_02_17/o1_mf_s_871939236_bg7b7m1p_.bkp tag=TAG20150217T214745
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
Finished restore at 17-FEB-15



$ srvctl modify database -db cdb12c -spfile '+DATA/CDB12C/spfilecdb12c.ora';



$ srvctl stop database -db cdb12c
$ srvctl start database -db cdb12c



Final checks:

SQL> show parameter spfile

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      +DATA/CDB12C/spfilecdb12c.ora

SQL> set lines 200
SQL> set pages 999
SQL> select name from v$datafile union all
  2  select name from v$tempfile union all
  3  select member from v$logfile union all
  4  select name from v$controlfile;

NAME
-----------------------------------------------------------------------------
+DATA/CDB12C/DATAFILE/system.259.871936371
+DATA/CDB12C/0C366F6BD5E01C14E055000000000001/DATAFILE/system.265.871936527
+DATA/CDB12C/DATAFILE/sysaux.263.871936455
+DATA/CDB12C/0C366F6BD5E01C14E055000000000001/DATAFILE/sysaux.256.871936497
+DATA/CDB12C/DATAFILE/undotbs1.257.871936413
+DATA/CDB12C/DATAFILE/users.258.871936451
+DATA/CDB12C/0C38EB3894542836E055000000000001/DATAFILE/system.269.871936549
+DATA/CDB12C/0C38EB3894542836E055000000000001/DATAFILE/sysaux.266.871936565
+DATA/CDB12C/0C38EB3894542836E055000000000001/DATAFILE/users.270.871936547
+DATA/CDB12C/0C38EB3894542836E055000000000001/DATAFILE/marko.271.871936543
+DATA/CDB12C/0C366F6BD5E01C14E055000000000001/TEMPFILE/temp.264.871937389
+DATA/CDB12C/TEMPFILE/temp.268.871936871
+DATA/CDB12C/0C38EB3894542836E055000000000001/TEMPFILE/temp.267.871936909
+DATA/CDB12C/0C366F6BD5E01C14E055000000000001/TEMPFILE/temp.262.871937843
+DATA/CDB12C/ONLINELOG/group_1.261.871938327
+DATA/CDB12C/ONLINELOG/group_2.260.871938351
+DATA/CDB12C/ONLINELOG/group_3.272.871938381
+DATA/CDB12C/CONTROLFILE/current.273.871940441
+FRA/CDB12C/CONTROLFILE/current.256.871940457

19 rows selected.




REFERENCES
http://docs.oracle.com/database/121/ADMIN/dfiles.htm#ADMIN012
http://joze-senegacnik.blogspot.com/2015/02/maintaining-tempfile-in-temp-tablespace.html



MariaDB - Measure Replicaton Lag and Check / Fix Replication Inconsistencies using Percona tools

Thursday, February 5, 2015
Percona Toolkit is a collection of command-line tools for many MySQL tasks, such as creating backups, finding duplicate indexes, and managing replication.

In this post I will show how to measure replication lag and check/fix replication inconsistencies with these tools:
pt-heartbeat
pt-table-checksum
pt-table-sync


I am using the environment from a previous blog post: master-master replication with MariaDB 10.0.16 on Debian 7.


Install Percona Toolkit on both nodes:

$ sudo wget percona.com/get/percona-toolkit.deb

$ sudo apt-get install libterm-readkey-perl
$ sudo dpkg -i percona-toolkit.deb  


I will create a percona database to store the tables needed for the various checks, and a percona user to be used with the Percona tools.



MASTER1

MariaDB [(none)]> create database percona;

MariaDB [(none)]> grant all privileges on *.* to 'percona'@'master1.localdomain' identified by 'percona';

MariaDB [(none)]> grant all privileges on *.* to 'percona'@'localhost' identified by 'percona';

MariaDB [(none)]> flush privileges;


MASTER2

MariaDB [(none)]> grant all privileges on *.* to 'percona'@'master2.localdomain' identified by 'percona';

MariaDB [(none)]> grant all privileges on *.* to 'percona'@'localhost' identified by 'percona';

MariaDB [(none)]> flush privileges;




MONITOR REPLICATION LAG


So, I have replication running and I want to be sure everything is working fine.
The typical method to monitor replication lag is to run SHOW SLAVE STATUS and look at Seconds_Behind_Master, but Seconds_Behind_Master is not always accurate.

Percona Toolkit has a tool to monitor replication delay called pt-heartbeat.

We must create the heartbeat table on the master, either manually or using the --create-table option, and the heartbeat table must contain one heartbeat row. pt-heartbeat updates this table at the interval we specify; on the slave, the tool actively checks the table and calculates the time delay.
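Conceptually, the slave-side calculation boils down to comparing the last timestamp the master wrote into the heartbeat table with the current time on the observer. As a rough manual equivalent (column names per the default heartbeat table; this is only an approximation and assumes the servers' clocks are in sync):

```sql
-- Approximate, manual version of what pt-heartbeat --monitor reports:
-- age of the most recent heartbeat row relative to the current UTC time.
SELECT server_id,
       TIMESTAMPDIFF(SECOND, ts, UTC_TIMESTAMP()) AS lag_seconds
FROM percona.heartbeat
ORDER BY ts DESC
LIMIT 1;
```

Because the lag is derived from an actual replicated write rather than from the I/O and SQL thread positions, it stays meaningful even in cases where Seconds_Behind_Master misleads.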


Create the heartbeat table and start a daemonized process to update percona.heartbeat.


MASTER1

$ pt-heartbeat -upercona -ppercona -D percona --update master1 --daemonize --create-table


MASTER2
Start pt-heartbeat.

$ pt-heartbeat -upercona -ppercona --update --database percona


MASTER1
Monitor replication slave lag.

$ pt-heartbeat -upercona -ppercona -D percona --monitor -h master2
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]
0.01s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]





CHECK REPLICATION INCONSISTENCIES


If we want to check replication integrity we can use the pt-table-checksum tool.

Run the tool on the master server. It automatically detects slave servers and connects to them to perform some safety checks. It then checksums the tables on the master and stores the results in the checksum table; these results are compared with the checksums computed on the slave to determine whether the data differs.
You can inspect that table at any time - in this example, the percona.checksums table.

If there are no differing rows between the master and slave tables, the DIFFS column will show 0.


$ pt-table-checksum -upercona -ppercona --create-replicate-table --replicate percona.checksums --databases testdb -h master2
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
02-02T20:58:15      0      0        5       1       0   1.134 testdb.users


MASTER2

MariaDB [testdb]> create table address (id int auto_increment primary key, city varchar(30));
Query OK, 0 rows affected (0.06 sec)

MariaDB [testdb]> insert into address (city) values ('New York');
Query OK, 1 row affected (0.07 sec)

MariaDB [testdb]> insert into address (city) values ('LA');
Query OK, 1 row affected (0.06 sec)

MariaDB [testdb]> insert into address (city) values ('Zagreb');
Query OK, 1 row affected (0.13 sec)


MASTER1

$ pt-table-checksum -upercona -ppercona --replicate percona.checksums --databases testdb -h master2
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
02-02T20:59:16      0      0        3       1       0   1.032 testdb.address
02-02T20:59:17      0      0        5       1       0   1.120 testdb.users


$ pt-table-checksum -upercona -ppercona --replicate=percona.checksums --replicate-check-only --databases=testdb master1


No output was returned, which means the testdb database is in sync with the slave.
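Besides --replicate-check-only, you can query the checksum table directly on a slave once a checksum run has populated it. This query is adapted from the Percona Toolkit documentation and lists tables whose slave data differs from the master:

```sql
-- Tables where the slave's row count or checksum differs from the master's.
-- Run on the slave: the master_* columns arrive there through replication.
SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks
FROM percona.checksums
WHERE master_cnt <> this_cnt
   OR master_crc <> this_crc
   OR ISNULL(master_crc) <> ISNULL(this_crc)
GROUP BY db, tbl;
```

An empty result set here means the same thing as empty --replicate-check-only output: the checked databases are in sync.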


Insert some test data:

MariaDB [testdb]> create table animals (id int not null auto_increment,
    -> name char(30) not null,
    -> primary key(id));
Query OK, 0 rows affected (0.04 sec)

MariaDB [testdb]> insert into animals (name) values ('dog'),('cat'),('whale');
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0


MariaDB [testdb]> create table countries (id int not null auto_increment,
    -> name varchar(30),
    -> primary key(id));
Query OK, 0 rows affected (0.09 sec)

MariaDB [testdb]> insert into countries(name) values ('Croatia'),('England'),('USA'),('Island');
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

MariaDB [testdb]> select * from animals;
+----+-------+
| id | name  |
+----+-------+
|  1 | dog   |
|  2 | cat   |
|  3 | whale |
+----+-------+
3 rows in set (0.00 sec)

MariaDB [testdb]>  select * from countries;
+----+---------+
| id | name    |
+----+---------+
|  1 | Croatia |
|  2 | England |
|  3 | USA     |
|  4 | Island  |
+----+---------+
4 rows in set (0.00 sec)


Check if database is in sync:

$ pt-table-checksum -upercona -ppercona --create-replicate-table --replicate percona.checksums --databases testdb -h master1
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
02-02T21:03:49      0      0        3       1       0   0.177 testdb.address
02-02T21:03:49      0      0        3       1       0   0.045 testdb.animals
02-02T21:03:49      0      0        4       1       0   0.049 testdb.countries
02-02T21:03:49      0      0        5       1       0   0.037 testdb.users





RESYNC REPLICA FROM THE MASTER


Let's make the database on MASTER2 out of sync by creating some differences between the databases.


MASTER2

Instead of stopping the replication process, I will temporarily disable binary logging on the MASTER2 server so its changes are not replicated.

MariaDB [testdb]> SET SQL_LOG_BIN=0;
Query OK, 0 rows affected (0.00 sec)


Make some data modifications.

MariaDB [testdb]> insert into animals (name) values ('Ostrich'),('Penguin');
Query OK, 2 rows affected (0.04 sec)
Records: 2  Duplicates: 0  Warnings: 0

MariaDB [testdb]> delete from countries where id=2;
Query OK, 1 row affected (0.01 sec)


MariaDB [testdb]> create table colors (name varchar(30));
Query OK, 0 rows affected (0.10 sec)

MariaDB [testdb]> insert into colors(name) values ('Red'),('Blue');
Query OK, 2 rows affected (0.02 sec)
Records: 2  Duplicates: 0  Warnings: 0


Enable binary logging again.

MariaDB [testdb]> SET SQL_LOG_BIN=1;
Query OK, 0 rows affected (0.00 sec)



MASTER1

MariaDB [testdb]> select * from animals;
+----+-------+
| id | name  |
+----+-------+
|  1 | dog   |
|  2 | cat   |
|  3 | whale |
+----+-------+
3 rows in set (0.00 sec)

MariaDB [testdb]> select * from countries;
+----+---------+
| id | name    |
+----+---------+
|  1 | Croatia |
|  2 | England |
|  3 | USA     |
|  4 | Island  |
+----+---------+
4 rows in set (0.00 sec)

MariaDB [testdb]> show tables;
+------------------+
| Tables_in_testdb |
+------------------+
| address          |
| animals          |
| countries        |
| users            |
+------------------+
4 rows in set (0.00 sec)


MASTER2

MariaDB [testdb]> select * from animals;
+----+---------+
| id | name    |
+----+---------+
|  1 | dog     |
|  2 | cat     |
|  3 | whale   |
|  4 | Ostrich |
|  5 | Penguin |
+----+---------+
5 rows in set (0.00 sec)

MariaDB [testdb]> select * from countries;
+----+---------+
| id | name    |
+----+---------+
|  1 | Croatia |
|  3 | USA     |
|  4 | Island  |
+----+---------+
3 rows in set (0.00 sec)

MariaDB [testdb]> show tables;
+------------------+
| Tables_in_testdb |
+------------------+
| address          |
| animals          |
| colors           |
| countries        |
| users            |
+------------------+
5 rows in set (0.00 sec)


Notice that there are now inconsistencies between the databases, and no built-in tool will notify us about them. Replication keeps working fine even though the replica has different data than the master.

With pt-table-checksum we will check for data differences between the databases.


MASTER1

$ pt-table-checksum -upercona -ppercona --create-replicate-table --replicate percona.checksums --databases testdb -h master1
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
02-02T21:11:23      0      0        3       1       0   0.106 testdb.address
02-02T21:11:23      0      1        3       1       0   0.053 testdb.animals
02-02T21:11:24      0      1        4       1       0   0.046 testdb.countries
02-02T21:11:24      0      0        5       1       0   0.042 testdb.users


$ pt-table-checksum -upercona -ppercona --replicate=percona.checksums --replicate-check-only --databases=testdb master1
Differences on master2
TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY
testdb.animals 1 2 1
testdb.countries 1 -1 1

Notice how the tool reported the differences in the DIFFS column.


Synchronizing data between servers in a master-master configuration is not a trivial task. You have to think about which process changes data where, and be very careful to avoid data corruption.

In a master-master configuration data changes are replicated between nodes, so statements executed on the "slave" node are replicated back to the master.

Maybe the best approach would be to stop replication, restore the replica from a backup or re-clone the whole server, and start replication again. You can also dump only the affected data with mysqldump and reload it.
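As a sketch of that last option, a targeted resync with mysqldump might look like the script below. The credentials, hosts, and table names are taken from this demo setup, so treat them as placeholders; with DRY_RUN=1 (the default) the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: resync just the affected tables from MASTER1 to MASTER2
# with mysqldump. Hosts, credentials, and table names are placeholders
# from this demo. DRY_RUN=1 prints the commands instead of executing them.
DRY_RUN=${DRY_RUN:-1}

DUMP="mysqldump -upercona -ppercona -h master1 --single-transaction testdb animals countries"
LOAD="mysql -upercona -ppercona -h master2 testdb"

if [ "$DRY_RUN" -eq 1 ]; then
    echo "$DUMP > /tmp/affected_tables.sql"
    echo "$LOAD < /tmp/affected_tables.sql"
else
    $DUMP > /tmp/affected_tables.sql
    $LOAD < /tmp/affected_tables.sql
fi
```

In a master-master setup you would also have to decide whether these load statements should be written to the binary log on MASTER2, since otherwise they would replicate back to MASTER1 (see SET SQL_LOG_BIN above).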


As this is my testing environment, I will try to resolve the differences using the pt-table-sync tool from Percona Toolkit.


First I will use the tool with the --print option, which only displays the queries that would resolve the differences. I will inspect those queries before executing them on the slave server.
These queries could also be executed manually.

$ pt-table-sync -upercona -ppercona --sync-to-master --databases testdb --transaction --lock=1 --verbose master2 --print

# Syncing h=master2,p=...,u=percona
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      0       0      0      0 Chunk     22:13:17 22:13:17 0    testdb.address
DELETE FROM `testdb`.`animals` WHERE `id`='4' LIMIT 1 /*percona-toolkit src_db:testdb src_tbl:animals src_dsn:P=3306,h=master1,p=...,u=percona dst_db:testdb dst_tbl:animals dst_dsn:h=master2,p=...,u=percona lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:7723 user:msutic host:master1*/;
DELETE FROM `testdb`.`animals` WHERE `id`='5' LIMIT 1 /*percona-toolkit src_db:testdb src_tbl:animals src_dsn:P=3306,h=master1,p=...,u=percona dst_db:testdb dst_tbl:animals dst_dsn:h=master2,p=...,u=percona lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:7723 user:msutic host:master1*/;
#      2       0      0      0 Chunk     22:13:17 22:13:17 2    testdb.animals
REPLACE INTO `testdb`.`countries`(`id`, `name`) VALUES ('2', 'England') /*percona-toolkit src_db:testdb src_tbl:countries src_dsn:P=3306,h=master1,p=...,u=percona dst_db:testdb dst_tbl:countries dst_dsn:h=master2,p=...,u=percona lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:7723 user:msutic host:master1*/;
#      0       1      0      0 Chunk     22:13:17 22:13:17 2    testdb.countries
#      0       0      0      0 Chunk     22:13:17 22:13:17 0    testdb.users


Set the --execute option to execute those queries.
With the --sync-to-master option, the tool treats the MASTER2 server as a slave.


$ pt-table-sync -upercona -ppercona --sync-to-master --databases testdb --transaction --lock=1 --verbose master2 --execute

# Syncing h=master2,p=...,u=percona
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      0       0      0      0 Chunk     22:19:51 22:19:51 0    testdb.address
#      2       0      0      0 Chunk     22:19:51 22:19:51 2    testdb.animals
#      0       1      0      0 Chunk     22:19:51 22:19:51 2    testdb.countries
#      0       0      0      0 Chunk     22:19:51 22:19:51 0    testdb.users


The output shows that the differences were successfully resolved with two DELETE operations and one REPLACE operation on the specified tables.

Let’s run another check to verify whether any differences remain.

$ pt-table-checksum -upercona -ppercona --create-replicate-table --replicate percona.checksums --databases testdb -h master1

            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
02-02T22:21:30      0      0        3       1       0   0.549 testdb.address
02-02T22:21:30      0      0        3       1       0   0.048 testdb.animals
02-02T22:21:30      0      0        4       1       0   0.043 testdb.countries
02-02T22:21:30      0      0        5       1       0   0.049 testdb.users

The DIFFS column shows only zeros, which means the tables are in sync.




What if I run the checksums on the MASTER2 server?


MASTER2

$ pt-table-checksum -upercona -ppercona --create-replicate-table --replicate percona.checksums --databases testdb -h master2

            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
02-02T22:24:16      0      0        3       1       0   0.072 testdb.address
02-02T22:24:16      0      0        3       1       0   0.048 testdb.animals
02-02T22:24:16 Skipping table testdb.colors because it has problems on these replicas:
Table testdb.colors does not exist on replica master1
This can break replication.  If you understand the risks, specify --no-check-slave-tables to disable this check.
02-02T22:24:16 Error checksumming table testdb.colors: DBD::mysql::db selectrow_hashref failed: Table 'testdb.colors' doesn't exist [for Statement "EXPLAIN SELECT * FROM `testdb`.`colors` WHERE 1=1"] at /usr/bin/pt-table-checksum line 6595.

02-02T22:24:16      1      0        0       0       0   0.003 testdb.colors
02-02T22:24:16      0      0        4       1       0   0.044 testdb.countries
02-02T22:24:16      0      0        5       1       0   0.043 testdb.users


The output shows an error because the table testdb.colors exists on MASTER2 but not on MASTER1.

I know that MASTER1 has the "correct" data, so I will just drop the testdb.colors table on the MASTER2 node.

MariaDB [testdb]> drop table if exists testdb.colors;
Query OK, 0 rows affected (0.05 sec)


Run the check again:

$ pt-table-checksum -upercona -ppercona --create-replicate-table --replicate percona.checksums --databases testdb -h master2
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
02-02T22:26:43      0      0        3       1       0   0.322 testdb.address
02-02T22:26:43      0      0        3       1       0   0.056 testdb.animals
02-02T22:26:43      0      0        4       1       0   0.050 testdb.countries
02-02T22:26:43      0      0        5       1       0   0.045 testdb.users


Now the databases are in sync.



With the --quiet option the tool prints a row per table only if there are differences. This makes it convenient to run from a cron job and send mail only when the exit status is non-zero.

$ pt-table-checksum -upercona -ppercona --create-replicate-table --replicate percona.checksums --databases testdb -h master1 --quiet
(no rows)




REFERENCES
http://www.percona.com/doc/percona-toolkit/2.2/pt-table-sync.html
http://www.percona.com/doc/percona-toolkit/2.2/pt-table-checksum.html
http://www.percona.com/software/percona-toolkit

