Running an active-passive solution using Oracle Clusterware was possible even before the introduction of RAC One Node, but RAC One Node brought the configuration and management of an active-passive cluster to another level.
In my test environment I've created an active-passive cluster using 12cR1 Grid Infrastructure with a 12cR1 RAC One Node database.
I've tested failover for the database and noticed interesting behavior with instance placement.
Configuration of my database:
$ srvctl config database -d one12
Database unique name: one12
Database name: one12
Oracle home: /u01/app/orcl12/product/12.1.0/dbhome_1
Oracle user: orcl12
Spfile: +DATA/one12/spfileone12.ora
Password file: +DATA/one12/orapwone12
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: one12
Database instances:
Disk Groups: DATA,FRA
Mount point paths:
Services: one12app
Type: RACOneNode
Online relocation timeout: 30
Instance name prefix: one12
Candidate servers: cluster1,cluster2
Database is administrator managed
Notice that the type is RACOneNode, as expected for the RAC One Node database "one12".
Check the status of the database.
$ srvctl status database -d one12
Instance one12_1 is running on node cluster1
Online relocation: INACTIVE

col instance_name for a20
col host_name for a20
SQL> select host_name, instance_name from v$instance;

HOST_NAME            INSTANCE_NAME
-------------------- --------------------
cluster1.localdomain one12_1
Instance one12_1 is running on node cluster1.
Let's relocate the database to another node.
$ srvctl relocate database -d one12 -n cluster2 -w 5 -v
Configuration updated to two instances
Instance one12_2 started
Services relocated
Waiting for up to 5 minutes for instance one12_1 to stop ...
Instance one12_1 stopped
Configuration updated to one instance
Check status.
$ srvctl status database -d one12
Instance one12_2 is running on node cluster2
Online relocation: INACTIVE

col instance_name for a20
col host_name for a20
SQL> select host_name, instance_name from v$instance;

HOST_NAME            INSTANCE_NAME
-------------------- --------------------
cluster2.localdomain one12_2
After the relocation the active node became cluster2. Instance one12_2 was brought up on the cluster2 node while one12_1 was shut down on the cluster1 node.
Now I will simulate an instance crash by killing the PMON process on the cluster2 node. Clusterware will restart the failed instance a few times on the same node and then relocate the database to the cluster1 node.
You can check in the resource parameters how many restart attempts Clusterware will perform before failing over to the other node.
$ ./crsctl stat res ora.one12.db -p|grep "RESTART_ATTEMPTS"
RESTART_ATTEMPTS=2
In my case Clusterware will try twice to restart the failed instance before relocating it to the other node.
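Besides RESTART_ATTEMPTS, a few other standard Clusterware resource attributes influence restart and failover decisions: UPTIME_THRESHOLD controls how long the resource must stay up before the restart counter is reset, while FAILURE_THRESHOLD and FAILURE_INTERVAL control how many failures within what interval are tolerated. A quick, hedged way to look them up on the same resource (these are standard attribute names, though the values may simply be defaults on your system):

$ ./crsctl stat res ora.one12.db -p |\
> grep -w 'RESTART_ATTEMPTS\|UPTIME_THRESHOLD\|FAILURE_INTERVAL\|FAILURE_THRESHOLD'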
Check the status of the resource.
$ ./crsctl stat res ora.one12.db -v |\
> grep -w 'STATE\|RESTART_COUNT\|FAILURE_COUNT'
STATE=ONLINE on cluster2
RESTART_COUNT=0
FAILURE_COUNT=0
Kill PMON on the cluster2 node.
$ ps -ef|grep pmon_one12|grep -v grep
orcl12    4503     1  0 12:19 ?        00:00:00 ora_pmon_one12_2
$ kill -9 4503
Wait a short time for Clusterware to restart the instance and check the status.
$ ./crsctl stat res ora.one12.db -v |\
> grep -w 'STATE\|RESTART_COUNT\|FAILURE_COUNT'
STATE=ONLINE on cluster2
RESTART_COUNT=1
FAILURE_COUNT=0
Clusterware restarted the instance on the same node. Notice that RESTART_COUNT is now 1.
Kill PMON again.
$ ps -ef|grep pmon_one12|grep -v grep
orcl12    5262     1  0 12:37 ?        00:00:00 ora_pmon_one12_2
$ kill -9 5262
Check status.
$ ./crsctl stat res ora.one12.db -v |\
> grep -w 'STATE\|RESTART_COUNT\|FAILURE_COUNT'
STATE=ONLINE on cluster2
RESTART_COUNT=2
FAILURE_COUNT=0
RESTART_COUNT is now 2. The next time I kill the PMON process, Clusterware should relocate the instance to the cluster1 node.
$ ps -ef|grep pmon_one12|grep -v grep
orcl12    5588     1  0 12:40 ?        00:00:00 ora_pmon_one12_2
$ kill -9 5588
Check status.
$ ./crsctl stat res ora.one12.db -v |\
> grep -w 'STATE\|RESTART_COUNT\|FAILURE_COUNT'
STATE=ONLINE on cluster1
RESTART_COUNT=0
FAILURE_COUNT=1
Clusterware started the instance on the cluster1 node as expected. STATE is now ONLINE on cluster1.
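Why cluster1? For an administrator-managed RAC One Node database the failover target is constrained by the candidate servers recorded in the srvctl configuration (cluster1,cluster2 above). A hedged way to inspect the placement-related attributes of the resource, assuming the standard PLACEMENT, HOSTING_MEMBERS and SERVER_POOLS attributes are exposed for ora.one12.db:

$ ./crsctl stat res ora.one12.db -p |\
> grep -w 'PLACEMENT\|HOSTING_MEMBERS\|SERVER_POOLS'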
Now the interesting part.
$ srvctl status database -d one12
Instance one12_2 is running on node cluster1
Online relocation: INACTIVE

col instance_name for a20
col host_name for a20
SQL> select host_name, instance_name from v$instance;

HOST_NAME            INSTANCE_NAME
-------------------- --------------------
cluster1.localdomain one12_2
Notice the instance placement.
Instance one12_2 is started on the cluster1 node. Clusterware kept the instance name one12_2 after the failover to the cluster1 node.
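Out of curiosity, you can also check which instance name Clusterware currently has recorded for the resource. A minimal sketch, assuming the generated *_INST_NAME attributes (for example GEN_USR_ORA_INST_NAME) are exposed in this Clusterware version:

$ ./crsctl stat res ora.one12.db -p|grep -i "INST_NAME"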
Can we start instance one12_1 on the cluster1 node manually?
$ srvctl stop database -d one12
$ srvctl start instance -db one12 -n cluster1 -instance one12_1
PRKO-2136 : 'srvctl start/stop/enable/disable/modify/status/setenv/getenv/unsetenv instance' commands are not supported with RAC One Node databases
This operation is not supported with RAC One Node.
Start the database on the cluster2 node.
$ srvctl start database -d one12 -n cluster2
$ srvctl status database -d one12
Instance one12_2 is running on node cluster2
Online relocation: INACTIVE

col instance_name for a20
col host_name for a20
SQL> select host_name, instance_name from v$instance;

HOST_NAME            INSTANCE_NAME
-------------------- --------------------
cluster2.localdomain one12_2
Instance one12_2 is started on the cluster2 host.
Relocate the database back to the cluster1 node.
$ srvctl relocate database -d one12 -n cluster1 -w 5 -v
Configuration updated to two instances
Instance one12_1 started
Services relocated
Waiting for up to 5 minutes for instance one12_2 to stop ...
Instance one12_2 stopped
Configuration updated to one instance
Check status.
$ srvctl status database -d one12
Instance one12_1 is running on node cluster1
Online relocation: INACTIVE

col instance_name for a20
col host_name for a20
SQL> select host_name, instance_name from v$instance;

HOST_NAME            INSTANCE_NAME
-------------------- --------------------
cluster1.localdomain one12_1
I have the "one12_1" instance running on the "cluster1" node again.
From the documentation:
Using the Single Client Access Name (SCAN) to connect to the database, clients can locate the service independently of the node on which it is running.
As clients should connect to the database using the service name (per Oracle's recommendation), instance placement shouldn't be important.
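As a sketch, a client alias for the one12app service resolved through the SCAN could look like the tnsnames.ora entry below. The SCAN name cluster-scan.localdomain and port 1521 are my assumptions for illustration, not values taken from this cluster:

# Assumed SCAN name and default port 1521; adjust for your environment
ONE12APP =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = cluster-scan.localdomain)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = one12app)
    )
  )

Whichever node and instance (one12_1 or one12_2) happens to be active, the connection follows the one12app service through the SCAN listeners.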
"As clients should connect to database using service name (Oracle recommendation) instance placement shouldn't be important."
ReplyDelete...except in the case where you are monitoring via Cloud Control, where an instance that was discovered is suddenly not running. Cloud Control still doesn't seem to understand that in the context of One Node, a "missing" instance is not necessarily a sign of fault.