<span style="font-size: x-large;"><b>PostgreSQL HA - Patroni, ETCD, HAProxy, Keepalived - Test failure scenarios</b></span><br>
<br>
I will test a few failover/maintenance scenarios and show the results in this blog post. <br>
<br>
Just to mention, this is <b>not</b> a proper production test. Before considering this setup for production it would be good to put the cluster under proper load, simulate slow IO response times, memory crashes, etc. and check the cluster behavior.<br>
<br>
In these tests I am only checking starting/stopping resources in various scenarios.<br>
<br>
<span id="fullpost">
<b>Standby tests</b><br>
<br>
<span style="font-size: small;">
<table>
<thead>
<tr>
<th>No.</th>
<th>Test Scenario</th>
<th>Downtime</th>
<th>Observation</th>
</tr>
</thead>
<tbody>
<tr>
<td>1.</td>
<td>Kill PostgreSQL process</td>
<td>10 secs (1 RO node)</td>
<td>- No problems for the writer process. <br> - Patroni brought the PostgreSQL instance back automatically.</td>
</tr>
<tr>
<td>2.</td>
<td>Stop the PostgreSQL process</td>
<td>27 secs (1 RO node)</td>
<td>- No problems for the read-write process. <br> - Patroni brought the PostgreSQL instance back automatically.</td>
</tr>
<tr>
<td>3.</td>
<td>Reboot the server</td>
<td>27 secs (1 RO node)</td>
<td>- No problems for the write process. <br> - Patroni started on boot and brought the PostgreSQL instance up automatically.</td>
</tr>
<tr>
<td>4.</td>
<td>Stop the Patroni process</td>
<td>25 secs (1 RO node)</td>
<td>- No problem for the write process. <br> - Stopping Patroni stopped the PostgreSQL process and excluded the 192.168.56.53 node from the cluster. <br> - After Patroni was started again, it brought the PostgreSQL instance up and rejoined it to the cluster automatically.</td>
</tr>
</tbody>
</table>
</span>
<b>Master tests</b><br>
<br>
<span style="font-size: small;">
<table>
<thead>
<tr>
<th>No.</th>
<th>Test Scenario</th>
<th>Downtime</th>
<th>Observation</th>
</tr>
</thead>
<tbody>
<tr>
<td>1.</td>
<td>Kill PostgreSQL process</td>
<td>10 secs RW</td>
<td>- After killing the PostgreSQL process, Patroni brought the service back to the running state. <br> - No disruption for the read-only requests.</td>
</tr>
<tr>
<td>2.</td>
<td>Stop the PostgreSQL process</td>
<td>7 secs RW</td>
<td>- Patroni brought PostgreSQL back to the running state. An election was not triggered.</td>
</tr>
<tr>
<td>3.</td>
<td>Reboot the server</td>
<td>17 secs RW</td>
<td>- Failover happened and one of the slave servers was elected as the new master. <br> - On the old master server, Patroni brought PostgreSQL up and performed pg_rewind to create a replica.</td>
</tr>
<tr>
<td>4.</td>
<td>Stop Patroni process</td>
<td>10 secs RW</td>
<td>- Patroni stopped the PostgreSQL instance and a new master node was elected. <br> - After starting Patroni, the old master server was rewound using pg_rewind and joined the cluster as a new replica.</td>
</tr>
</tbody>
</table>
</span>
<b>Network isolation tests</b><br>
<br>
<span style="font-size: small;">
<table>
<thead>
<tr>
<th>No.</th>
<th>Test Scenario</th>
<th>Downtime</th>
<th>Observation</th>
</tr>
</thead>
<tbody>
<tr>
<td>1.</td>
<td>Network isolate master server from the configuration</td>
<td>31 secs RW</td>
<td>- A new master was elected. <br> - Bringing communication back to the old master server did not return it to the cluster as a replica automatically. <br> - Restarting Patroni brought the PostgreSQL instance on 192.168.56.53 back as a replica.</td>
</tr>
<tr>
<td>2.</td>
<td>Network isolate slave server from the configuration</td>
<td>0 secs RW</td>
<td>- The isolated standby server was excluded from the cluster configuration. <br> - After communication was brought back, the standby node rejoined the cluster automatically.</td>
</tr>
</tbody>
</table>
</span>
<br>
<br>
Pinging the cluster on the <b>read write</b> interface (port 5000): <br>
<br>
<pre class="brush: text">while true; do echo "select inet_server_addr(),now()::timestamp" | psql -Upostgres -h192.168.56.100 -p5000 -t; sleep 1; done
</pre>
Pinging the cluster on the <b>read only</b> interface (port 5001): <br>
<br>
<pre class="brush: text">while true; do echo "select inet_server_addr(),now()::timestamp" | psql -Upostgres -h192.168.56.100 -p5001 -t; sleep 1; done
</pre>
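Here 192.168.56.100 is the floating virtual IP that keepalived keeps on one of the nodes. A minimal keepalived.conf sketch for such a VIP is shown below; the interface name (eth1) and the priority values are assumptions and must match your environment:<br>
<br>
<pre class="brush: text">vrrp_instance VI_1 {
    state MASTER               # BACKUP on the other nodes
    interface eth1             # assumed interface on the 192.168.56.0/24 network
    virtual_router_id 51
    priority 100               # use a lower priority on the BACKUP nodes
    advert_int 1
    virtual_ipaddress {
        192.168.56.100
    }
}
</pre>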
<br>
<span style="font-size: x-large;">Standby Tests</span><br>
<br>
<span style="font-size: large;">1. Kill PostgreSQL process</span><br>
<br>
<b>Read Write</b><br>
<br>
<pre class="brush: text">192.168.56.52 | 2021-11-11 19:54:45.462582
192.168.56.52 | 2021-11-11 19:54:46.483056
192.168.56.52 | 2021-11-11 19:54:47.502918
192.168.56.52 | 2021-11-11 19:54:48.522746
192.168.56.52 | 2021-11-11 19:54:49.544109
192.168.56.52 | 2021-11-11 19:54:50.564185
192.168.56.52 | 2021-11-11 19:54:51.585437
192.168.56.52 | 2021-11-11 19:54:52.607154
192.168.56.52 | 2021-11-11 19:54:53.628248
192.168.56.52 | 2021-11-11 19:54:54.649941
192.168.56.52 | 2021-11-11 19:54:55.671482
</pre>
No problems for the writer process.<br>
<br>
<b>Read Only</b><br>
<br>
<pre class="brush: text">
192.168.56.53 | 2021-11-11 19:54:45.505932 <<-- KILL POSTGRES PROCESS ON 192.168.56.53
192.168.56.51 | 2021-11-11 19:54:46.562742
psql: error: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
192.168.56.51 | 2021-11-11 19:54:50.598035
psql: error: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
192.168.56.51 | 2021-11-11 19:54:54.633922
192.168.56.51 | 2021-11-11 19:54:55.655928
192.168.56.51 | 2021-11-11 19:54:56.679855
192.168.56.51 | 2021-11-11 19:54:57.70306
192.168.56.51 | 2021-11-11 19:54:58.725866
192.168.56.51 | 2021-11-11 19:54:59.749008
192.168.56.51 | 2021-11-11 19:55:00.770238
192.168.56.51 | 2021-11-11 19:55:01.791585
192.168.56.53 | 2021-11-11 19:55:02.779865 <<-- PATRONI BROUGHT POSTGRESQL PROCESS
192.168.56.51 | 2021-11-11 19:55:03.835348
192.168.56.53 | 2021-11-11 19:55:04.825825
192.168.56.51 | 2021-11-11 19:55:05.890109
</pre>
After 10 secs, Patroni brought PostgreSQL back automatically.<br>
<br>
<span style="font-size: large;">2. Stop the PostgreSQL process</span><br>
<br>
<b>Read Write</b><br>
<br>
<pre class="brush: text">192.168.56.52 | 2021-11-11 20:05:18.785093
192.168.56.52 | 2021-11-11 20:05:19.806449
192.168.56.52 | 2021-11-11 20:05:20.82694
192.168.56.52 | 2021-11-11 20:05:21.847219
192.168.56.52 | 2021-11-11 20:05:22.868177
192.168.56.52 | 2021-11-11 20:05:23.888856
192.168.56.52 | 2021-11-11 20:05:24.90578
</pre>
No problems for the read-write process.<br>
<br>
<b>Read Only</b><br>
<br>
<pre class="brush: text">
192.168.56.53 | 2021-11-11 20:05:18.990093 <<-- STOP POSTGRESQL PROCESS
192.168.56.51 | 2021-11-11 20:05:20.04388
psql: error: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
192.168.56.51 | 2021-11-11 20:05:24.08155
192.168.56.53 | 2021-11-11 20:05:25.073322 <<-- PATRONI BROUGHT POSTGRESQL PROCESS
</pre>
Patroni brought the PostgreSQL instance back in 6 seconds.<br>
<br>
<span style="font-size: large;">3. Reboot the server</span><br>
<br>
<b>Read Write</b><br>
<br>
<pre class="brush: text">192.168.56.52 | 2021-11-11 20:10:13.171874
192.168.56.52 | 2021-11-11 20:10:14.193623
192.168.56.52 | 2021-11-11 20:10:15.217776
192.168.56.52 | 2021-11-11 20:10:16.239323
192.168.56.52 | 2021-11-11 20:10:17.257308
192.168.56.52 | 2021-11-11 20:10:18.27552
192.168.56.52 | 2021-11-11 20:10:19.292373
192.168.56.52 | 2021-11-11 20:10:20.310198
192.168.56.52 | 2021-11-11 20:10:21.32735
192.168.56.52 | 2021-11-11 20:10:22.343773
192.168.56.52 | 2021-11-11 20:10:23.361844
192.168.56.52 | 2021-11-11 20:10:24.38691
192.168.56.52 | 2021-11-11 20:10:25.407598
192.168.56.52 | 2021-11-11 20:10:26.429343
192.168.56.52 | 2021-11-11 20:10:27.450577
192.168.56.52 | 2021-11-11 20:10:28.471854
192.168.56.52 | 2021-11-11 20:10:29.492637
192.168.56.52 | 2021-11-11 20:10:30.512336
192.168.56.52 | 2021-11-11 20:10:31.533257
192.168.56.52 | 2021-11-11 20:10:32.554038
192.168.56.52 | 2021-11-11 20:10:33.574338
192.168.56.52 | 2021-11-11 20:10:34.596119
192.168.56.52 | 2021-11-11 20:10:35.615495
192.168.56.52 | 2021-11-11 20:10:36.637819
192.168.56.52 | 2021-11-11 20:10:37.659621
192.168.56.52 | 2021-11-11 20:10:38.682478
192.168.56.52 | 2021-11-11 20:10:39.703187
192.168.56.52 | 2021-11-11 20:10:40.727444
</pre>
No problems for the write process.<br>
<br>
<b>Read Only</b><br>
<br>
<pre class="brush: text">
192.168.56.53 | 2021-11-11 20:10:12.314665 <<-- REBOOT THE 192.168.56.53 SERVER
192.168.56.51 | 2021-11-11 20:10:13.304627
psql: error: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
192.168.56.51 | 2021-11-11 20:10:24.340825
psql: error: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
192.168.56.51 | 2021-11-11 20:10:29.42999
192.168.56.51 | 2021-11-11 20:10:30.44846
192.168.56.51 | 2021-11-11 20:10:31.470978
192.168.56.51 | 2021-11-11 20:10:32.49244
192.168.56.51 | 2021-11-11 20:10:33.515443
192.168.56.51 | 2021-11-11 20:10:34.53563
192.168.56.51 | 2021-11-11 20:10:35.553104
192.168.56.51 | 2021-11-11 20:10:36.572375
192.168.56.51 | 2021-11-11 20:10:37.595694
192.168.56.51 | 2021-11-11 20:10:38.620022
192.168.56.53 | 2021-11-11 20:10:39.644502 <<-- PATRONI STARTED ON THE BOOT AND STARTED POSTGRESQL PROCESS
</pre>
Patroni started on boot and brought the PostgreSQL instance up automatically within 27 secs.<br>
<br>
<span style="font-size: large;">4. Stop the Patroni process</span><br>
<br>
<b>Read Write</b><br>
<br>
<pre class="brush: text">192.168.56.52 | 2021-11-11 20:25:01.931924
192.168.56.52 | 2021-11-11 20:25:02.954774
192.168.56.52 | 2021-11-11 20:25:03.975514
192.168.56.52 | 2021-11-11 20:25:04.99868
192.168.56.52 | 2021-11-11 20:25:06.021456
192.168.56.52 | 2021-11-11 20:25:07.048917
192.168.56.52 | 2021-11-11 20:25:08.071156
192.168.56.52 | 2021-11-11 20:25:09.093902
192.168.56.52 | 2021-11-11 20:25:10.117138
192.168.56.52 | 2021-11-11 20:25:11.138296
192.168.56.52 | 2021-11-11 20:25:12.159975
192.168.56.52 | 2021-11-11 20:25:13.186149
192.168.56.52 | 2021-11-11 20:25:14.20717
192.168.56.52 | 2021-11-11 20:25:15.2286
</pre>
No problem for the write process.<br>
<br>
<b>Read Only</b><br>
<br>
Stopping Patroni stopped the PostgreSQL process and excluded the 192.168.56.53 node from the cluster.<br>
<br>
<pre class="brush: text">+-----------+---------------+---------+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+ Cluster: postgres (7027353509639501631) ------+----+-----------+
| psql13n51 | 192.168.56.51 | Replica | running | 9 | 0 |
| psql13n52 | 192.168.56.52 | Leader | running | 9 | |
+-----------+---------------+---------+---------+----+-----------+
192.168.56.51 | 2021-11-11 20:24:52.731887
psql: error: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
192.168.56.51 | 2021-11-11 20:24:56.772703
psql: error: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
192.168.56.51 | 2021-11-11 20:25:00.811616
192.168.56.51 | 2021-11-11 20:25:01.837298
192.168.56.51 | 2021-11-11 20:25:02.860275
192.168.56.51 | 2021-11-11 20:25:03.8829
192.168.56.51 | 2021-11-11 20:25:04.906505
192.168.56.51 | 2021-11-11 20:25:05.932158
</pre>
Start Patroni.<br>
<br>
After starting, Patroni brought the PostgreSQL instance up and it rejoined the cluster automatically.<br>
<br>
<pre class="brush: text">+-----------+---------------+---------+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+ Cluster: postgres (7027353509639501631) ------+----+-----------+
| psql13n51 | 192.168.56.51 | Replica | running | 9 | 0 |
| psql13n52 | 192.168.56.52 | Leader | running | 9 | |
| psql13n53 | 192.168.56.53 | Replica | running | 9 | 0 |
+-----------+---------------+---------+---------+----+-----------+
192.168.56.51 | 2021-11-11 20:28:18.473041
192.168.56.51 | 2021-11-11 20:28:19.495974
192.168.56.51 | 2021-11-11 20:28:20.518773
192.168.56.51 | 2021-11-11 20:28:21.541587
192.168.56.51 | 2021-11-11 20:28:22.563967
192.168.56.51 | 2021-11-11 20:28:23.586971
192.168.56.51 | 2021-11-11 20:28:24.608738
192.168.56.53 | 2021-11-11 20:28:25.63165
</pre>
It took 7 seconds to route traffic to the standby node after starting the Patroni process.<br>
<br>
<br>
<span style="font-size: x-large;">Master Tests</span><br>
<br>
<span style="font-size: large;">1. Kill PostgreSQL process</span><br>
<br>
<b>Read Write</b><br>
<br>
<pre class="brush: text">
192.168.56.52 | 2021-11-11 20:40:55.246602
192.168.56.52 | 2021-11-11 20:40:56.270163 <<-- KILL POSTGRESQL PROCESS.
psql: error: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
psql: error: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
192.168.56.52 | 2021-11-11 20:41:05.318191 <<-- PATRONI BROUGHT POSTGRESQL SERVICE BACK
192.168.56.52 | 2021-11-11 20:41:06.341719
</pre>
After killing the PostgreSQL process, Patroni brought the service back to the running state. We had 10 secs of downtime for the writer process.<br>
<br>
<b>Read Only</b><br>
<br>
<pre class="brush: text">192.168.56.51 | 2021-11-11 20:40:56.774198
192.168.56.53 | 2021-11-11 20:40:57.797533
192.168.56.51 | 2021-11-11 20:40:58.821054
192.168.56.53 | 2021-11-11 20:40:59.843738
192.168.56.51 | 2021-11-11 20:41:00.86877
192.168.56.53 | 2021-11-11 20:41:01.889666
192.168.56.51 | 2021-11-11 20:41:02.912988
192.168.56.53 | 2021-11-11 20:41:03.933952
192.168.56.51 | 2021-11-11 20:41:05.045196
192.168.56.53 | 2021-11-11 20:41:06.078416
</pre>
No disruption for the read-only requests.<br>
<br>
<span style="font-size: large;">2. Stop the PostgreSQL process</span><br>
<br>
<b>Read Write</b><br>
<br>
<pre class="brush: text">
192.168.56.52 | 2021-11-11 20:52:01.251009
psql: error: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
psql: error: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
192.168.56.52 | 2021-11-11 20:52:08.301596
</pre>
Patroni brought PostgreSQL back to the running state. An election was not triggered. There were 7 secs of downtime for the writer process.<br>
<br>
<b>Read Only</b><br>
<br>
<pre class="brush: text">
192.168.56.53 | 2021-11-11 20:52:01.53767
192.168.56.51 | 2021-11-11 20:52:02.561452
192.168.56.53 | 2021-11-11 20:52:03.583391
192.168.56.51 | 2021-11-11 20:52:04.609092
192.168.56.53 | 2021-11-11 20:52:05.631433
192.168.56.51 | 2021-11-11 20:52:06.656341
192.168.56.53 | 2021-11-11 20:52:07.677131
192.168.56.51 | 2021-11-11 20:52:08.701682
192.168.56.53 | 2021-11-11 20:52:09.730157
</pre>
<span style="font-size: large;">3. Reboot the server</span><br>
<br>
<b>Read Write</b><br>
<br>
<pre class="brush: text">
192.168.56.52 | 2021-11-11 20:59:31.49515
psql: error: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
psql: error: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
192.168.56.51 | 2021-11-11 20:59:48.650256 <<-- SERVER 192.168.56.51 ELECTED AS THE NEW MASTER:
192.168.56.51 | 2021-11-11 20:59:49.669785
192.168.56.51 | 2021-11-11 20:59:50.687517
</pre>
Failover happened and one of the slave servers was elected as the new master. We had 17 seconds of downtime for the writer process. On the old master server, Patroni brought PostgreSQL up and performed pg_rewind to create a replica.<br>
<br>
<pre class="brush: text">
+ Cluster: postgres (7027353509639501631) ------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-----------+---------------+---------+---------+----+-----------+
| psql13n51 | 192.168.56.51 | Leader | running | 14 | |
| psql13n52 | 192.168.56.52 | Replica | running | 14 | 0 |
| psql13n53 | 192.168.56.53 | Replica | running | 14 | 0 |
+-----------+---------------+---------+---------+----+-----------+
</pre>
<b>Read Only</b><br>
<br>
<pre class="brush: text">192.168.56.51 | 2021-11-11 20:59:29.053858
192.168.56.53 | 2021-11-11 20:59:30.07594
192.168.56.51 | 2021-11-11 20:59:31.105134
192.168.56.53 | 2021-11-11 20:59:32.123691
192.168.56.51 | 2021-11-11 20:59:33.152343
192.168.56.53 | 2021-11-11 20:59:34.170016
192.168.56.51 | 2021-11-11 20:59:35.199209
192.168.56.53 | 2021-11-11 20:59:36.21726
192.168.56.51 | 2021-11-11 20:59:37.238567
192.168.56.53 | 2021-11-11 20:59:38.251579
192.168.56.51 | 2021-11-11 20:59:39.273968
192.168.56.53 | 2021-11-11 20:59:40.288168
192.168.56.51 | 2021-11-11 20:59:41.308803
192.168.56.53 | 2021-11-11 20:59:42.32304
192.168.56.53 | 2021-11-11 20:59:43.339712
192.168.56.53 | 2021-11-11 20:59:44.357711
192.168.56.53 | 2021-11-11 20:59:45.375188
192.168.56.53 | 2021-11-11 20:59:46.395121
192.168.56.53 | 2021-11-11 20:59:47.411711
192.168.56.53 | 2021-11-11 20:59:48.428075
192.168.56.53 | 2021-11-11 20:59:49.445494
192.168.56.53 | 2021-11-11 20:59:50.462092
</pre>
There were no disruptions for the read-only requests.<br>
<br>
<span style="font-size: large;">3. Stop Patroni process</span><br>
<br>
<b>Read Write</b><br>
<br>
<pre class="brush: text">192.168.56.51 | 2021-11-11 21:08:25.526132
psql: error: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
psql: error: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
psql: error: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
192.168.56.53 | 2021-11-11 21:08:35.60005
192.168.56.53 | 2021-11-11 21:08:36.62634
192.168.56.53 | 2021-11-11 21:08:37.651523
</pre>
<pre class="brush: text">
+ Cluster: postgres (7027353509639501631) ------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-----------+---------------+---------+---------+----+-----------+
| psql13n52 | 192.168.56.52 | Replica | running | 15 | 0 |
| psql13n53 | 192.168.56.53 | Leader | running | 15 | |
+-----------+---------------+---------+---------+----+-----------+
</pre>
Patroni stopped the PostgreSQL instance and a new master node was elected. We had 10 secs of downtime for the writer process.<br>
<br>
After starting Patroni, the old master server was rewound using pg_rewind and joined the cluster as a new replica.<br>
<br>
<pre class="brush: text">+ Cluster: postgres (7027353509639501631) ------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-----------+---------------+---------+---------+----+-----------+
| psql13n51 | 192.168.56.51 | Replica | running | 15 | 0 |
| psql13n52 | 192.168.56.52 | Replica | running | 15 | 0 |
| psql13n53 | 192.168.56.53 | Leader | running | 15 | |
+-----------+---------------+---------+---------+----+-----------+
</pre>
<b>Read Only</b><br>
<br>
<pre class="brush: text">192.168.56.53 | 2021-11-11 21:08:25.3516
192.168.56.52 | 2021-11-11 21:08:26.374974
192.168.56.53 | 2021-11-11 21:08:27.397898
192.168.56.52 | 2021-11-11 21:08:28.432293
192.168.56.53 | 2021-11-11 21:08:29.455458
192.168.56.52 | 2021-11-11 21:08:30.479256
192.168.56.53 | 2021-11-11 21:08:31.500499
192.168.56.52 | 2021-11-11 21:08:32.525148
192.168.56.53 | 2021-11-11 21:08:33.54793
192.168.56.52 | 2021-11-11 21:08:34.571675
192.168.56.52 | 2021-11-11 21:08:35.610965
192.168.56.52 | 2021-11-11 21:08:36.639712
</pre>
There were no disruptions for the read-only process.<br>
<br>
<br>
<span style="font-size: x-large;">Network Isolation Tests</span><br>
<br>
<span style="font-size: large;">1. Network isolate master server from the configuration</span><br>
<br>
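One way to simulate such isolation is to drop all cluster traffic on the node with iptables, as in the sketch below (run on the node being isolated; flush the rules afterwards to restore connectivity):<br>
<br>
<pre class="brush: text"># drop all traffic to/from the other cluster members
iptables -A INPUT -s 192.168.56.0/24 -j DROP
iptables -A OUTPUT -d 192.168.56.0/24 -j DROP

# later, restore connectivity
iptables -F
</pre>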
<b>Read Write</b><br>
<br>
<pre class="brush: text">
192.168.56.53 | 2021-11-11 22:15:37.376374 <<-- COMMUNICATION BLOCKED
192.168.56.52 | 2021-11-11 22:16:08.169172
192.168.56.52 | 2021-11-11 22:16:09.190167
192.168.56.52 | 2021-11-11 22:16:10.211688
192.168.56.52 | 2021-11-11 22:16:11.232966
192.168.56.52 | 2021-11-11 22:16:12.254794
192.168.56.52 | 2021-11-11 22:16:13.276149
192.168.56.52 | 2021-11-11 22:16:14.29847
192.168.56.52 | 2021-11-11 22:16:15.319335
192.168.56.52 | 2021-11-11 22:16:16.343936
</pre>
Communication was blocked on the master (read/write) node. A new master was elected. We had 31 secs of downtime for the writer application.<br>
<br>
<pre class="brush: text">
+ Cluster: postgres (7027353509639501631) ------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-----------+---------------+---------+---------+----+-----------+
| psql13n51 | 192.168.56.51 | Replica | running | 16 | 0 |
| psql13n52 | 192.168.56.52 | Leader | running | 16 | |
+-----------+---------------+---------+---------+----+-----------+
</pre>
Bringing communication back to the old master server did not return it to the cluster as a replica automatically. Restarting Patroni brought the PostgreSQL instance on 192.168.56.53 back as a replica.<br>
<br>
<pre class="brush: text">+-----------+---------------+---------+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+ Cluster: postgres (7027353509639501631) ------+----+-----------+
| psql13n51 | 192.168.56.51 | Replica | running | 16 | 0 |
| psql13n52 | 192.168.56.52 | Leader | running | 16 | |
| psql13n53 | 192.168.56.53 | Replica | running | 16 | 0 |
+-----------+---------------+---------+---------+----+-----------+
</pre>
<b>Read Only</b><br>
<br>
<pre class="brush: text">192.168.56.51 | 2021-11-11 22:15:11.676438
192.168.56.51 | 2021-11-11 22:15:12.699285
192.168.56.51 | 2021-11-11 22:15:13.722465
192.168.56.51 | 2021-11-11 22:15:14.74705
192.168.56.51 | 2021-11-11 22:15:15.77105
192.168.56.51 | 2021-11-11 22:15:16.794407
192.168.56.51 | 2021-11-11 22:15:17.816547
192.168.56.51 | 2021-11-11 22:15:18.838761
192.168.56.53 | 2021-11-11 22:25:57.360616
192.168.56.51 | 2021-11-11 22:25:58.390982
192.168.56.53 | 2021-11-11 22:25:59.42245
192.168.56.51 | 2021-11-11 22:26:00.450804
192.168.56.53 | 2021-11-11 22:26:01.480687
192.168.56.51 | 2021-11-11 22:26:02.510569
192.168.56.53 | 2021-11-11 22:26:03.540663
192.168.56.51 | 2021-11-11 22:26:04.574112
192.168.56.53 | 2021-11-11 22:26:05.606363
192.168.56.51 | 2021-11-11 22:26:06.635608
</pre>
After adding the old master server back to the cluster configuration as a replica, it started to accept read-only requests.<br>
<br>
<span style="font-size: large;">2. Network isolate slave server from the configuration</span><br>
<br>
<b>Read Write</b><br>
<br>
<pre class="brush: text">192.168.56.52 | 2021-11-11 22:28:22.539789
192.168.56.52 | 2021-11-11 22:28:23.559629
192.168.56.52 | 2021-11-11 22:28:24.580749
192.168.56.52 | 2021-11-11 22:28:25.925264
192.168.56.52 | 2021-11-11 22:28:26.946179
192.168.56.52 | 2021-11-11 22:28:27.969459
192.168.56.52 | 2021-11-11 22:28:28.991379
192.168.56.52 | 2021-11-11 22:28:30.013173
192.168.56.52 | 2021-11-11 22:28:31.032617
192.168.56.52 | 2021-11-11 22:28:32.053455
192.168.56.52 | 2021-11-11 22:28:33.074863
192.168.56.52 | 2021-11-11 22:28:34.096192
192.168.56.52 | 2021-11-11 22:28:35.116744
</pre>
There was no problem for the writer applications.<br>
<br>
<b>Read Only</b><br>
<br>
<pre class="brush: text">192.168.56.51 | 2021-11-11 22:28:03.186052
192.168.56.53 | 2021-11-11 22:28:04.208455
192.168.56.51 | 2021-11-11 22:28:05.23119
192.168.56.51 | 2021-11-11 22:28:16.665654
psql: error: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
192.168.56.51 | 2021-11-11 22:28:30.700107
192.168.56.51 | 2021-11-11 22:28:31.721761
192.168.56.51 | 2021-11-11 22:28:32.744021
192.168.56.51 | 2021-11-11 22:28:33.766453
192.168.56.51 | 2021-11-11 22:28:34.789146
192.168.56.51 | 2021-11-11 22:28:35.811602
</pre>
<pre class="brush: text">+ Cluster: postgres (7027353509639501631) ------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-----------+---------------+---------+---------+----+-----------+
| psql13n51 | 192.168.56.51 | Replica | running | 16 | 0 |
| psql13n52 | 192.168.56.52 | Leader | running | 16 | |
+-----------+---------------+---------+---------+----+-----------+
</pre>
The isolated standby server was excluded from the cluster configuration.<br>
<br>
After communication was brought back, the standby node rejoined the cluster automatically.<br>
<br>
<pre class="brush: text">+-----------+---------------+---------+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+ Cluster: postgres (7027353509639501631) ------+----+-----------+
| psql13n51 | 192.168.56.51 | Replica | running | 16 | 0 |
| psql13n52 | 192.168.56.52 | Leader | running | 16 | |
| psql13n53 | 192.168.56.53 | Replica | running | 16 | 0 |
+-----------+---------------+---------+---------+----+-----------+
</pre>
<br>
<span style="font-size: x-large;">Switchover</span> <br>
<br>
Manually trigger a switchover of the primary node to one of the replicas and bring the old primary back into the cluster as a new replica.<br>
<br>
<pre class="brush: text">$ patronictl -c /opt/app/patroni/etc/postgresql.yml list
+ Cluster: postgres (7027353509639501631) ------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-----------+---------------+---------+---------+----+-----------+
| psql13n51 | 192.168.56.51 | Leader | running | 17 | |
| psql13n52 | 192.168.56.52 | Replica | running | 17 | 0 |
| psql13n53 | 192.168.56.53 | Replica | running | 17 | 0 |
+-----------+---------------+---------+---------+----+-----------+
</pre>
<pre class="brush: text">
$ patronictl -c /opt/app/patroni/etc/postgresql.yml switchover
Master [psql13n51]:
Candidate ['psql13n52', 'psql13n53'] []: psql13n52
When should the switchover take place (e.g. 2021-11-15T22:08 ) [now]:
Current cluster topology
+ Cluster: postgres (7027353509639501631) ------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-----------+---------------+---------+---------+----+-----------+
| psql13n51 | 192.168.56.51 | Leader | running | 17 | |
| psql13n52 | 192.168.56.52 | Replica | running | 17 | 0 |
| psql13n53 | 192.168.56.53 | Replica | running | 17 | 0 |
+-----------+---------------+---------+---------+----+-----------+
Are you sure you want to switchover cluster postgres, demoting current master psql13n51? [y/N]: y
2021-11-15 21:10:28.05685 Successfully switched over to "psql13n52"
+ Cluster: postgres (7027353509639501631) ------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-----------+---------------+---------+---------+----+-----------+
| psql13n51 | 192.168.56.51 | Replica | stopped | | unknown |
| psql13n52 | 192.168.56.52 | Leader | running | 17 | |
| psql13n53 | 192.168.56.53 | Replica | running | 17 | 0 |
+-----------+---------------+---------+---------+----+-----------+
</pre>
<pre class="brush: text">192.168.56.51 | 2021-11-15 21:10:21.190417
192.168.56.51 | 2021-11-15 21:10:22.223856
192.168.56.51 | 2021-11-15 21:10:23.259458
192.168.56.51 | 2021-11-15 21:10:24.293523
192.168.56.51 | 2021-11-15 21:10:25.329155
psql: error: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
192.168.56.51 | 2021-11-15 21:10:30.379076
192.168.56.51 | 2021-11-15 21:10:31.40607
192.168.56.52 | 2021-11-15 21:10:32.417283
192.168.56.51 | 2021-11-15 21:10:33.450491
192.168.56.52 | 2021-11-15 21:10:34.468676
192.168.56.52 | 2021-11-15 21:10:35.494665
192.168.56.52 | 2021-11-15 21:10:36.517738
192.168.56.52 | 2021-11-15 21:10:37.541415
192.168.56.52 | 2021-11-15 21:10:38.567083
</pre>
Node 192.168.56.52 became the new primary node and 192.168.56.51 joined the cluster as the new replica. Downtime for the read-write node was 7 secs.<br>
<br>
<pre class="brush: text">+ Cluster: postgres (7027353509639501631) ------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-----------+---------------+---------+---------+----+-----------+
| psql13n51 | 192.168.56.51 | Replica | running | 18 | 0 |
| psql13n52 | 192.168.56.52 | Leader | running | 18 | |
| psql13n53 | 192.168.56.53 | Replica | running | 18 | 0 |
+-----------+---------------+---------+---------+----+-----------+
</pre>
<br>
<span style="font-size: x-large;">Failover</span><br>
<br>
Although a failover can also be triggered manually, it is mostly executed automatically, when the leader node is unavailable for an unplanned reason. We have seen automatic failovers in the previous tests.<br>
<br>
For this test I will trigger a failover manually.<br>
<br>
<pre class="brush: text">patronictl -c /opt/app/patroni/etc/postgresql.yml failover
Candidate ['psql13n51', 'psql13n53'] []: psql13n51
Current cluster topology
+ Cluster: postgres (7027353509639501631) ------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-----------+---------------+---------+---------+----+-----------+
| psql13n51 | 192.168.56.51 | Replica | running | 18 | 0 |
| psql13n52 | 192.168.56.52 | Leader | running | 18 | |
| psql13n53 | 192.168.56.53 | Replica | running | 18 | 0 |
+-----------+---------------+---------+---------+----+-----------+
Are you sure you want to failover cluster postgres, demoting current master psql13n52? [y/N]: y
2021-11-15 21:25:04.85489 Successfully failed over to "psql13n51"
+ Cluster: postgres (7027353509639501631) ------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-----------+---------------+---------+---------+----+-----------+
| psql13n51 | 192.168.56.51 | Leader | running | 18 | |
| psql13n52 | 192.168.56.52 | Replica | stopped | | unknown |
| psql13n53 | 192.168.56.53 | Replica | running | 18 | 0 |
+-----------+---------------+---------+---------+----+-----------+
</pre>
Node 192.168.56.51 became the new master and node 192.168.56.52 joined the cluster as a replica. Downtime was 7 secs.<br>
<br>
<pre class="brush: text">+ Cluster: postgres (7027353509639501631) ------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-----------+---------------+---------+---------+----+-----------+
| psql13n51 | 192.168.56.51 | Leader | running | 19 | |
| psql13n52 | 192.168.56.52 | Replica | running | 19 | 0 |
| psql13n53 | 192.168.56.53 | Replica | running | 19 | 0 |
+-----------+---------------+---------+---------+----+-----------+
</pre>
<br>
<span style="font-size: x-large;">Maintenance mode</span><br>
<br>
Sometimes it is necessary to do maintenance on a single node and you do not want Patroni to manage the cluster, for example when performing a PostgreSQL upgrade.<br>
<br>
When Patroni is paused, it won't change the state of PostgreSQL - for example, it will not try to automatically start the PostgreSQL instance when it is stopped.<br>
<br>
For the test we will stop the replica and check whether Patroni starts the database automatically as in the previous tests.<br>
<br>
<pre class="brush: text">[postgres@psql13n52 ~]$ patronictl -c /opt/app/patroni/etc/postgresql.yml pause
Success: cluster management is paused
</pre>
<pre class="brush: text">
[postgres@psql13n51 ~]$ patronictl -c /opt/app/patroni/etc/postgresql.yml list
+ Cluster: postgres (7027353509639501631) ------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-----------+---------------+---------+---------+----+-----------+
| psql13n51 | 192.168.56.51 | Leader | running | 19 | |
| psql13n52 | 192.168.56.52 | Replica | running | 19 | 0 |
| psql13n53 | 192.168.56.53 | Replica | running | 19 | 0 |
+-----------+---------------+---------+---------+----+-----------+
Maintenance mode: on
</pre>
Notice - "Maintenance mode: on".<br>
<br>
Replica is stopped:<br>
<br>
<pre class="brush: text">$ pg_ctl -D /var/lib/pgsql/14/data stop
waiting for server to shut down.... done
server stopped
</pre>
Patroni didn't bring the database back up.<br>
<br>
<pre class="brush: text">$ patronictl -c /opt/app/patroni/etc/postgresql.yml list
+ Cluster: postgres (7027353509639501631) ------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-----------+---------------+---------+---------+----+-----------+
| psql13n51 | 192.168.56.51 | Leader | running | 19 | |
| psql13n52 | 192.168.56.52 | Replica | stopped | | unknown |
| psql13n53 | 192.168.56.53 | Replica | running | 19 | 0 |
+-----------+---------------+---------+---------+----+-----------+
Maintenance mode: on
</pre>
Resume Patroni.<br>
<br>
<pre class="brush: text">$ patronictl -c /opt/app/patroni/etc/postgresql.yml resume
Success: cluster management is resumed
</pre>
The node rejoined the cluster after a few seconds.<br>
<br>
<pre class="brush: text">$ patronictl -c /opt/app/patroni/etc/postgresql.yml list
+ Cluster: postgres (7027353509639501631) ------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-----------+---------------+---------+---------+----+-----------+
| psql13n51 | 192.168.56.51 | Leader | running | 19 | |
| psql13n52 | 192.168.56.52 | Replica | running | 19 | 0 |
| psql13n53 | 192.168.56.53 | Replica | running | 19 | 0 |
+-----------+---------------+---------+---------+----+-----------+
</pre>
<br>
<br>
<br>
<br>
<b>References</b>:<br>
<a href="http://highscalability.com/blog/2019/9/16/managing-high-availability-in-postgresql-part-iii-patroni.html">http://highscalability.com/blog/2019/9/16/managing-high-availability-in-postgresql-part-iii-patroni.html</a>
</span>
<span style="font-size: x-large;"><b>Deploying PostgreSQL 14.0 for High Availability using Patroni, etcd, HAProxy and keepalived on CentOS 8</b></span>
<div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiKuDz5yvw2rtzj8o9_sT3hIe4cILa534IvqjK7H7fcBHPFmKaRIDm_r_GL9SUKF_bzxJiVlaavVEs2YhvkMbgVv6Yz4bbPPgk60ZCwcwps7Ct5V8nEbZpWF-bq-BRJDbPsSVs9RvvR-IhW/s0/Screenshot+2021-11-07+at+09.45.57.JPG" style="display: block; padding: 1em 0; text-align: center; "><img alt="" border="0" data-original-height="1146" data-original-width="2048" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiKuDz5yvw2rtzj8o9_sT3hIe4cILa534IvqjK7H7fcBHPFmKaRIDm_r_GL9SUKF_bzxJiVlaavVEs2YhvkMbgVv6Yz4bbPPgk60ZCwcwps7Ct5V8nEbZpWF-bq-BRJDbPsSVs9RvvR-IhW/s0/Screenshot+2021-11-07+at+09.45.57.JPG"/></a></div>
</br>
Patroni is an automatic failover system for PostgreSQL. It provides automatic and manual failover and keeps all vital data in a distributed configuration store (DCS). Database connections do not happen directly to the database nodes but are routed via a connection proxy like HAProxy. The proxy determines the active/master node.</br>
</br>
By using a proxy to route connections, the risk of a split-brain scenario is very limited.</br>
</br>
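As an illustration, here is a minimal HAProxy sketch of this routing, assuming the port layout used later in this post (5000 for the primary, 5001 for replicas, Patroni REST API on 8008) and the node addresses configured below. The health checks rely on Patroni's REST API returning HTTP 200 on /master only on the primary, and on /replica only on replicas:</br>
</br>
<pre class="brush: text">
listen primary
    bind *:5000
    option httpchk OPTIONS /master
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server psql13n51 192.168.56.51:5432 maxconn 100 check port 8008
    server psql13n52 192.168.56.52:5432 maxconn 100 check port 8008
    server psql13n53 192.168.56.53:5432 maxconn 100 check port 8008

listen standbys
    balance roundrobin
    bind *:5001
    option httpchk OPTIONS /replica
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server psql13n51 192.168.56.51:5432 maxconn 100 check port 8008
    server psql13n52 192.168.56.52:5432 maxconn 100 check port 8008
    server psql13n53 192.168.56.53:5432 maxconn 100 check port 8008
</pre>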
By using Patroni, all the dynamic settings are stored in the DCS in order to have complete consistency on the participating nodes.</br>
</br>
In this blog post I will focus on building a Patroni cluster on top of CentOS 8, using etcd for clustering and HAProxy for routing database connections to the primary server.</br>
</br>
</br>
<span id="fullpost">
<span style="font-size: x-large;">OS setup</span></br>
</br>
Firewalld and SELinux need to be adjusted before configuring the Patroni cluster.</br>
</br>
</br>
<span style="font-size: large;">Firewalld</span></br>
</br>
The ports required for operating patroni/etcd/haproxy/postgresql are the following:</br>
</br>
<b>5432</b> - PostgreSQL standard port on which the database instances listen</br>
<b>5000</b> - HAProxy listening port that routes database connections to the write node</br>
<b>5001</b> - HAProxy listening port that routes database connections to the read nodes</br>
<b>2380</b> - etcd peer URLs port, required for communication between the etcd members</br>
<b>2379</b> - etcd client port, required by any client, including Patroni, to communicate with etcd</br>
<b>8008</b> - Patroni REST API port, required by HAProxy to check the nodes' status</br>
<b>7000</b> - HAProxy port that exposes the proxy's statistics</br>
</br>
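With firewalld, these ports can be opened with something like the following sketch (run on each node, opening only the ports for the services that actually run there):</br>
</br>
<pre class="brush: text">
sudo firewall-cmd --add-port={5432,5000,5001,2380,2379,8008,7000}/tcp --permanent
sudo firewall-cmd --reload
</pre>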
<span style="font-size: large;">selinux</span> </br>
</br>
By default, SELinux prevents new services from binding to all the IP addresses.</br>
</br>
In order to allow HAProxy to bind to the ports required for its functionality, we need to run this command:</br>
</br>
<pre class="brush: text">
sudo setsebool -P haproxy_connect_any=1
</pre>
</br>
<span style="font-size: x-large;">Initial setup</span></br>
</br>
<pre class="brush: text">
$ cat /etc/hosts
192.168.56.51 psql13n51
192.168.56.52 psql13n52
192.168.56.53 psql13n53
</pre>
</br>
<span style="font-size: x-large;">ETCD</span></br>
</br>
Etcd is a fault-tolerant, distributed key-value store used to store the state of the Postgres cluster. Via Patroni, all of the Postgres nodes make use of etcd to keep the Postgres cluster up and running.</br>
</br>
In production, it may be best to use a larger etcd cluster so that if one etcd node fails, it doesn't affect the Postgres servers.</br>
</br>
<span style="font-size: large;">Download and Install the etcd Binaries (All nodes)</span> </br>
</br>
Install etcd on all three nodes.</br>
</br>
<pre class="brush: text">
ETCD_VER=v3.5.1
# choose either URL
GOOGLE_URL=https://storage.googleapis.com/etcd
GITHUB_URL=https://github.com/etcd-io/etcd/releases/download
DOWNLOAD_URL=${GOOGLE_URL}
rm -f /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
rm -rf /tmp/etcd-download-test && mkdir -p /tmp/etcd-download-test
curl -L ${DOWNLOAD_URL}/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz -o /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
tar xzvf /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz -C /tmp/etcd-download-test --strip-components=1
rm -f /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
/tmp/etcd-download-test/etcd --version
/tmp/etcd-download-test/etcdctl version
/tmp/etcd-download-test/etcdutl version
</pre>
Move the binaries to the <i>/usr/local/bin</i> directory.</br>
</br>
<pre class="brush: text">
mv /tmp/etcd-download-test/etcd* /usr/local/bin/
</pre>
Check etcd and etcdctl version.</br>
</br>
<pre class="brush: text">
$ etcd --version
etcd Version: 3.5.1
Git SHA: e8732fb5f
Go Version: go1.16.3
Go OS/Arch: linux/amd64
$ etcdctl version
etcdctl version: 3.5.1
API version: 3.5
</pre>
<span style="font-size: large;">Configure Etcd Systemd service:</span> </br>
</br>
<b>Create etcd directories and user (All nodes)</b></br>
</br>
Create etcd system user:</br>
</br>
<pre class="brush: text">
sudo groupadd --system etcd
sudo useradd -s /sbin/nologin --system -g etcd etcd
</pre>
Set /var/lib/etcd/ directory ownership to etcd user:</br>
</br>
<pre class="brush: text">
sudo mkdir -p /var/lib/etcd/
sudo mkdir /etc/etcd
sudo chown -R etcd:etcd /var/lib/etcd/
sudo chmod -R 700 /var/lib/etcd/
</pre>
<b>Configure the etcd on all nodes.</b></br>
</br>
On each server, save these variables by running the commands below.</br>
</br>
<pre class="brush: text">
INT_NAME="eth1"
#INT_NAME="ens3"
ETCD_HOST_IP=$(ip addr show $INT_NAME | grep "inet\b" | awk '{print $2}' | cut -d/ -f1)
ETCD_NAME=$(hostname -s)
</pre>
Where:</br>
<b>INT_NAME</b> - The name of your network interface to be used for cluster traffic. Change it to match your server configuration.</br>
<b>ETCD_HOST_IP</b> - The internal IP address of the specified network interface. This is used to serve client requests and communicate with etcd cluster peers.</br>
<b>ETCD_NAME</b> – Each etcd member must have a unique name within an etcd cluster. The command used here sets the etcd name to match the hostname of the current compute instance.</br>
</br>
Check variables to confirm they have correct values:</br>
</br>
<pre class="brush: text">
echo $INT_NAME
echo $ETCD_HOST_IP
echo $ETCD_NAME
</pre>
Once all variables are set, create the etcd.service systemd unit file. Replace --listen-client-urls with your server IPs.</br>
</br>
For etcd 3.5 the default API is v3, but Patroni doesn't currently support the v3 API, so it is important to set the parameter <b>enable-v2=true</b>.</br>
</br>
<pre class="brush: text">
cat << EOF > /lib/systemd/system/etcd.service
[Unit]
Description=etcd service
Documentation=https://github.com/coreos/etcd
[Service]
User=etcd
Type=notify
ExecStart=/usr/local/bin/etcd \\
--name ${ETCD_NAME} \\
--enable-v2=true \\
--data-dir /var/lib/etcd \\
--initial-advertise-peer-urls http://${ETCD_HOST_IP}:2380 \\
--listen-peer-urls http://${ETCD_HOST_IP}:2380 \\
--listen-client-urls http://${ETCD_HOST_IP}:2379,http://127.0.0.1:2379 \\
--advertise-client-urls http://${ETCD_HOST_IP}:2379 \\
--initial-cluster-token etcd-cluster-1 \\
--initial-cluster psql13n51=http://192.168.56.51:2380,psql13n52=http://192.168.56.52:2380,psql13n53=http://192.168.56.53:2380 \\
--initial-cluster-state new \\
--heartbeat-interval 1000 \\
--election-timeout 5000
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
</pre>
For CentOS / RHEL Linux distributions, set SELinux mode to permissive.</br>
</br>
<pre class="brush: text">
sudo setenforce 0
sudo sed -i 's/^SELINUX=.*/SELINUX=permissive/g' /etc/selinux/config
</pre>
If you have an active firewall service, allow ports 2379 and 2380.</br>
</br>
<pre class="brush: text">
# RHEL / CentOS / Fedora firewalld
sudo firewall-cmd --add-port={2379,2380}/tcp --permanent
sudo firewall-cmd --reload
# Ubuntu/Debian
sudo ufw allow proto tcp from any to any port 2379,2380
</pre>
<span style="font-size: large;">Bootstrap The etcd Cluster</span> </br>
</br>
Once all the configurations are applied on the three servers, start and enable the newly created etcd service on all the nodes. The first server will act as a bootstrap node. One node will be automatically elected as the leader once the service is started on all three nodes.</br>
</br>
<pre class="brush: text">
# systemctl daemon-reload
# systemctl enable etcd
# systemctl start etcd.service
# systemctl status -l etcd.service
</pre>
<span style="font-size: large;">Test Etcd Cluster installation</span></br>
</br>
Test your setup by listing the etcd cluster members:</br>
</br>
<pre class="brush: text">
# etcdctl member list
</pre>
Check the leader on the host:</br>
</br>
<pre class="brush: text">
# etcdctl endpoint status --write-out=table
</pre>
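The command above only asks the local member; to see the status of every member at once, including which one is the leader, all endpoints can be queried together:</br>
</br>
<pre class="brush: text">
# etcdctl --endpoints=192.168.56.51:2379,192.168.56.52:2379,192.168.56.53:2379 endpoint status --write-out=table
</pre>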
Also check cluster health by running the command:</br>
</br>
<pre class="brush: text">
# etcdctl endpoint health
127.0.0.1:2379 is healthy: successfully committed proposal: took = 4.383594ms
</pre>
Let’s also try writing to etcd.</br>
</br>
<pre class="brush: text">
# etcdctl put /message "Hello World"
</pre>
Read the value of the message back – it should work on all nodes.</br>
</br>
<pre class="brush: text">
# etcdctl get /message
Hello World
</pre>
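Since Patroni will talk to etcd over the v2 API, it is also worth confirming that the v2 gateway responds; a quick check, assuming enable-v2=true was set in the unit file above:</br>
</br>
<pre class="brush: text">
# curl -s http://127.0.0.1:2379/v2/members
</pre>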
<span style="font-size: x-large;">Watchdog</span></br>
</br>
Watchdog devices reset the whole system when they do not get a keepalive heartbeat within a specified timeframe. This adds an additional layer of fail-safe protection in case the usual Patroni split-brain protection mechanisms fail.
It is recommended to deploy a watchdog mechanism when running a PostgreSQL HA configuration in production.</br>
</br>
Install on all nodes.</br>
</br>
<pre class="brush: text">
yum -y install watchdog
</pre>
</br>
<pre class="brush: text">
/sbin/modprobe softdog
</pre>
Patroni will be the component interacting with the watchdog device. Since Patroni is run by the postgres user, we need to either set the permissions of the watchdog device open enough so the postgres user can write to it or make the device owned by postgres itself, which we consider a safer approach (as it is more restrictive):</br>
</br>
Next, configure the softdog kernel module to load on CentOS boot. It is better not to load the module via /etc/rc.local; instead, use the default CentOS method of loading modules from /etc/rc.modules:</br>
</br>
<pre class="brush: text">
echo modprobe softdog >> /etc/rc.modules
chmod +x /etc/rc.modules
</pre>
<pre class="brush: text">
sudo sh -c 'echo "KERNEL==\"watchdog\", OWNER=\"postgres\", GROUP=\"postgres\"" >> /etc/udev/rules.d/61-watchdog.rules'
</pre>
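For the new udev rule to take effect without a reboot, reload and re-trigger the rules (a sketch):</br>
</br>
<pre class="brush: text">
sudo udevadm control --reload-rules
sudo udevadm trigger
</pre>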
Check whether the module is blacklisted by default or a stray file with such a directive is still lingering around:</br>
</br>
<pre class="brush: text">
$ grep blacklist /lib/modprobe.d/* /etc/modprobe.d/* |grep softdog
</pre>
If the module is blacklisted, edit that file on each of the nodes to remove the line and restart the servers. Then confirm the module is loaded:</br>
</br>
<pre class="brush: text">
$ lsmod | grep softdog
softdog 16384 0
</pre>
</br>
<pre class="brush: text">
[root@localhost ~]# ls -l /dev/watchdog*
crw-------. 1 root root 10, 130 Nov 5 11:13 /dev/watchdog
crw-------. 1 root root 248, 0 Nov 5 11:13 /dev/watchdog0
</pre>
<span style="font-size: x-large;">PostgreSQL</span></br>
</br>
Install PostgreSQL on all nodes.</br>
</br>
By default, the postgresql module has an older version of PostgreSQL enabled, and the current module stream does not include PostgreSQL 14. Confirm with this command:</br>
</br>
<pre class="brush: text">
sudo dnf module list postgresql
</pre>
Let us install the repository RPM using this command:</br>
</br>
<pre class="brush: text">
sudo dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-8-x86_64/pgdg-redhat-repo-latest.noarch.rpm
</pre>
Then to avoid conflicts, let us disable the built-in PostgreSQL module:</br>
</br>
<pre class="brush: text">
sudo dnf -qy module disable postgresql
</pre>
Finally install PostgreSQL 14 server:</br>
</br>
<pre class="brush: text">
sudo dnf install -y postgresql14-server
</pre>
Let’s also install the Contrib package which provides several additional features for the PostgreSQL database system:</br>
</br>
<pre class="brush: text">
sudo dnf install -y postgresql14-contrib
</pre>
An important concept to understand in a PostgreSQL HA environment like this one is that PostgreSQL should not be started automatically by systemd during the server initialization: we should leave it to Patroni to fully manage it, including the process of starting and stopping the server. Thus, we should disable the service:</br>
</br>
<pre class="brush: text">
sudo systemctl disable postgresql-14
</pre>
Start with a fresh new PostgreSQL setup and let Patroni bootstrap the cluster. Remove the data directory that has been created as part of the PostgreSQL installation:</br>
</br>
<pre class="brush: text">
sudo systemctl stop postgresql-14
sudo rm -fr /var/lib/pgsql/14/data
</pre>
<span style="font-size: x-large;">Patroni</span></br>
</br>
Patroni is a cluster manager used to customize and automate deployment and maintenance of PostgreSQL HA (High Availability) clusters. You should check the latest available release on the GitHub page.</br>
</br>
</br>
Install Patroni and the Python client for etcd on all 3 nodes:</br>
</br>
<pre class="brush: text">
# yum install patroni-etcd
# yum install pyhton3-etcd
</pre>
If you have an active firewall service, allow port 5432 on all nodes.</br>
</br>
<pre class="brush: text">
# RHEL / CentOS / Fedora firewalld
sudo firewall-cmd --add-port=5432/tcp --permanent
sudo firewall-cmd --reload
# Ubuntu/Debian
sudo ufw allow proto tcp from any to any port 5432
</pre>
</br>
<pre class="brush: text">
pip install python-etcd
</pre>
Here’s the configuration file we have used for psql13n51:</br>
</br>
<pre class="brush: text">
cat /opt/app/patroni/etc/postgresql.yml
scope: postgres
name: psql13n51
restapi:
listen: 0.0.0.0:8008
connect_address: 192.168.56.51:8008
etcd:
host: psql13n51:2379
bootstrap:
dcs:
ttl: 30
loop_wait: 10
retry_timeout: 10
maximum_lag_on_failover: 1048576
postgresql:
use_pg_rewind: true
use_slots: true
parameters:
wal_level: replica
hot_standby: "on"
logging_collector: "on"
max_wal_senders: 5
max_replication_slots: 5
initdb:
- encoding: UTF8
- data-checksums
pg_hba:
- host replication replicator 127.0.0.1/32 trust
- host replication replicator 192.168.56.1/24 md5
- host all all 192.168.56.1/24 md5
- host all all 0.0.0.0/0 md5
users:
admin:
password: admin
options:
- createrole
- createdb
postgresql:
listen: 0.0.0.0:5432
connect_address: 192.168.56.51:5432
data_dir: "/var/lib/pgsql/14/data"
bin_dir: "/usr/pgsql-14/bin"
pgpass: /tmp/pgpass
authentication:
replication:
username: replicator
password: vagrant
superuser:
username: postgres
password: vagrant
parameters:
unix_socket_directories: '/var/run/postgresql'
watchdog:
mode: required
device: /dev/watchdog
safety_margin: 5
tags:
nofailover: false
noloadbalance: false
clonefrom: false
nosync: false
</pre>
Validate configuration:</br>
</br>
<pre class="brush: text">
patroni --validate-config /opt/app/patroni/etc/postgresql.yml
</pre>
Let's bootstrap the cluster as the postgres user, using the parameters from the yml file:</br>
</br>
<pre class="brush: text">
# sudo su - postgres
patroni /opt/app/patroni/etc/postgresql.yml
</pre>
<pre class="brush: text">
2021-11-06 07:20:41,692 INFO: postmaster pid=1863
2021-11-06 07:20:41.704 UTC [1863] LOG: redirecting log output to logging collector process
2021-11-06 07:20:41.704 UTC [1863] HINT: Future log output will appear in directory "log".
localhost:5432 - rejecting connections
localhost:5432 - accepting connections
2021-11-06 07:20:41,807 INFO: establishing a new patroni connection to the postgres cluster
2021-11-06 07:20:41,820 INFO: running post_bootstrap
2021-11-06 07:20:41,844 INFO: Software Watchdog activated with 25 second timeout, timing slack 15 seconds
2021-11-06 07:20:41,864 INFO: initialized a new cluster
2021-11-06 07:20:51,859 INFO: no action. I am (psql13n51) the leader with the lock
2021-11-06 07:20:51,880 INFO: no action. I am (psql13n51) the leader with the lock
2021-11-06 07:21:01,883 INFO: no action. I am (psql13n51) the leader with the lock
2021-11-06 07:21:11,877 INFO: no action. I am (psql13n51) the leader with the lock
2021-11-06 07:21:21,878 INFO: no action. I am (psql13n51) the leader with the lock
2021-11-06 07:21:31,877 INFO: no action. I am (psql13n51) the leader with the lock
2021-11-06 07:21:41,880 INFO: no action. I am (psql13n51) the leader with the lock
</pre>
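At this point the REST API configured on port 8008 can also be used to check the node state; for example (a sketch using the endpoints that HAProxy health checks rely on):</br>
</br>
<pre class="brush: text">
# full node status as JSON
curl -s http://192.168.56.51:8008/patroni

# returns HTTP 200 only when this node is the leader
curl -s -o /dev/null -w '%{http_code}\n' http://192.168.56.51:8008/master
</pre>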
Next, edit the postgresql.yml file on the psql13n52 node and add the following configuration parameters.</br>
Make sure you change the name, etcd host name, listen and connect_address values:</br>
</br>
<pre class="brush: text">
cat /opt/app/patroni/etc/postgresql.yml
scope: postgres
name: psql13n52
restapi:
listen: 0.0.0.0:8008
connect_address: 192.168.56.52:8008
etcd:
host: psql13n52:2379
bootstrap:
dcs:
ttl: 30
loop_wait: 10
retry_timeout: 10
maximum_lag_on_failover: 1048576
postgresql:
use_pg_rewind: true
use_slots: true
parameters:
wal_level: replica
hot_standby: "on"
logging_collector: "on"
max_wal_senders: 5
max_replication_slots: 5
initdb:
- encoding: UTF8
- data-checksums
pg_hba:
- host replication replicator 127.0.0.1/32 trust
- host replication replicator 192.168.56.1/24 md5
- host all all 192.168.56.1/24 md5
- host all all 0.0.0.0/0 md5
users:
admin:
password: admin
options:
- createrole
- createdb
postgresql:
listen: 0.0.0.0:5432
connect_address: 192.168.56.52:5432
data_dir: "/var/lib/pgsql/14/data"
bin_dir: "/usr/pgsql-14/bin"
pgpass: /tmp/pgpass
authentication:
replication:
username: replicator
password: vagrant
superuser:
username: postgres
password: vagrant
parameters:
unix_socket_directories: '/var/run/postgresql'
watchdog:
mode: required
device: /dev/watchdog
safety_margin: 5
tags:
nofailover: false
noloadbalance: false
clonefrom: false
nosync: false
</pre>
Validate configuration:</br>
</br>
<pre class="brush: text">
patroni --validate-config /opt/app/patroni/etc/postgresql.yml
</pre>
Run as postgres user on psql13n52 node:</br>
</br>
<pre class="brush: text">
# sudo su - postgres
patroni /opt/app/patroni/etc/postgresql.yml
</pre>
<pre class="brush: text">
2021-11-06 07:23:25,827 INFO: Selected new etcd server http://192.168.56.53:2379
2021-11-06 07:23:25,831 INFO: No PostgreSQL configuration items changed, nothing to reload.
2021-11-06 07:23:25,839 INFO: Lock owner: psql13n51; I am psql13n52
2021-11-06 07:23:25,843 INFO: trying to bootstrap from leader 'psql13n51'
2021-11-06 07:23:26,237 INFO: replica has been created using basebackup
2021-11-06 07:23:26,238 INFO: bootstrapped from leader 'psql13n51'
2021-11-06 07:23:26,398 INFO: postmaster pid=1506
localhost:5432 - no response
2021-11-06 07:23:26.435 UTC [1506] LOG: redirecting log output to logging collector process
2021-11-06 07:23:26.435 UTC [1506] HINT: Future log output will appear in directory "log".
localhost:5432 - accepting connections
localhost:5432 - accepting connections
2021-11-06 07:23:27,449 INFO: Lock owner: psql13n51; I am psql13n52
2021-11-06 07:23:27,449 INFO: establishing a new patroni connection to the postgres cluster
2021-11-06 07:23:27,478 INFO: no action. I am a secondary (psql13n52) and following a leader (psql13n51)
2021-11-06 07:23:31,874 INFO: no action. I am a secondary (psql13n52) and following a leader (psql13n51)
2021-11-06 07:23:41,879 INFO: no action. I am a secondary (psql13n52) and following a leader (psql13n51)
</pre>
Next, edit the postgresql.yml file on psql13n53:</br>
</br>
<pre class="brush: text">
cat /opt/app/patroni/etc/postgresql.yml
scope: postgres
name: psql13n53
restapi:
listen: 0.0.0.0:8008
connect_address: 192.168.56.53:8008
etcd:
host: psql13n53:2379
bootstrap:
dcs:
ttl: 30
loop_wait: 10
retry_timeout: 10
maximum_lag_on_failover: 1048576
postgresql:
use_pg_rewind: true
use_slots: true
parameters:
wal_level: replica
hot_standby: "on"
logging_collector: "on"
max_wal_senders: 5
max_replication_slots: 5
initdb:
- encoding: UTF8
- data-checksums
pg_hba:
- host replication replicator 127.0.0.1/32 trust
- host replication replicator 192.168.56.1/24 md5
- host all all 192.168.56.1/24 md5
- host all all 0.0.0.0/0 md5
users:
admin:
password: admin
options:
- createrole
- createdb
postgresql:
listen: 0.0.0.0:5432
connect_address: 192.168.56.53:5432
data_dir: "/var/lib/pgsql/14/data"
bin_dir: "/usr/pgsql-14/bin"
pgpass: /tmp/pgpass
authentication:
replication:
username: replicator
password: vagrant
superuser:
username: postgres
password: vagrant
parameters:
unix_socket_directories: '/var/run/postgresql'
watchdog:
mode: required
device: /dev/watchdog
safety_margin: 5
tags:
nofailover: false
noloadbalance: false
clonefrom: false
nosync: false
</pre>
Validate configuration:</br>
</br>
<pre class="brush: text">
patroni --validate-config /opt/app/patroni/etc/postgresql.yml
</pre>
Run as postgres user:</br>
</br>
<pre class="brush: text">
# sudo su - postgres
patroni /opt/app/patroni/etc/postgresql.yml
</pre>
<pre class="brush: text">
2021-11-06 07:25:26,664 INFO: Selected new etcd server http://192.168.56.53:2379
2021-11-06 07:25:26,667 INFO: No PostgreSQL configuration items changed, nothing to reload.
2021-11-06 07:25:26,673 INFO: Lock owner: psql13n51; I am psql13n53
2021-11-06 07:25:26,676 INFO: trying to bootstrap from leader 'psql13n51'
2021-11-06 07:25:27,102 INFO: replica has been created using basebackup
2021-11-06 07:25:27,102 INFO: bootstrapped from leader 'psql13n51'
2021-11-06 07:25:27,262 INFO: postmaster pid=1597
localhost:5432 - no response
2021-11-06 07:25:27.299 UTC [1597] LOG: redirecting log output to logging collector process
2021-11-06 07:25:27.299 UTC [1597] HINT: Future log output will appear in directory "log".
localhost:5432 - accepting connections
localhost:5432 - accepting connections
2021-11-06 07:25:28,312 INFO: Lock owner: psql13n51; I am psql13n53
2021-11-06 07:25:28,313 INFO: establishing a new patroni connection to the postgres cluster
2021-11-06 07:25:28,340 INFO: no action. I am a secondary (psql13n53) and following a leader (psql13n51)
2021-11-06 07:25:31,877 INFO: no action. I am a secondary (psql13n53) and following a leader (psql13n51)
</pre>
Check the state of the Patroni cluster:</br>
</br>
<pre class="brush: text">
# patronictl -c /opt/app/patroni/etc/postgresql.yml list
+ Cluster: postgres (7027353509639501631) ------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-----------+---------------+---------+---------+----+-----------+
| psql13n51 | 192.168.56.51 | Leader | running | 1 | |
| psql13n52 | 192.168.56.52 | Replica | running | 1 | 0 |
| psql13n53 | 192.168.56.53 | Replica | running | 1 | 0 |
+-----------+---------------+---------+---------+----+-----------+
</pre>
Psql13n51 started the Patroni cluster so it was automatically made the leader – and thus the primary/master PostgreSQL server. Nodes psql13n52 and psql13n53 are configured as read replicas (as the hot_standby option was enabled in Patroni’s configuration file).</br>
</br>
Check PostgreSQL configuration parameters:</br>
</br>
<pre class="brush: text">
patronictl -c /opt/app/patroni/etc/postgresql.yml show-config postgres
loop_wait: 10
maximum_lag_on_failover: 1048576
postgresql:
parameters:
hot_standby: 'on'
logging_collector: 'on'
max_replication_slots: 5
max_wal_senders: 5
wal_level: replica
use_pg_rewind: true
use_slots: true
retry_timeout: 10
ttl: 30
</pre>
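These dynamic settings live in the DCS (etcd) and apply cluster-wide. If you later need to change one of them you can use patronictl edit-config, which propagates the change to all nodes, instead of editing each node's YAML file (a hedged example; which parameters you edit will vary):</br>
</br>
<pre class="brush: text">
# Opens the dynamic configuration in an editor;
# Patroni applies the change cluster-wide once saved
patronictl -c /opt/app/patroni/etc/postgresql.yml edit-config
</pre>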
With the configuration file in place, and the etcd cluster already up, all that remains is to run Patroni as a systemd service:</br>
</br>
Configure the Patroni service on every node:</br>
</br>
<pre class="brush: text">
# vi /etc/systemd/system/patroni.service
[Unit]
Description=Runners to orchestrate a high-availability PostgreSQL
After=syslog.target network.target etcd.target
[Service]
Type=simple
User=postgres
Group=postgres
ExecStart=/usr/bin/patroni /opt/app/patroni/etc/postgresql.yml
KillMode=process
TimeoutSec=30
Restart=no
[Install]
WantedBy=multi-user.target
</pre>
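After creating or changing the unit file, reload systemd so it picks up the new service definition:</br>
</br>
<pre class="brush: text">
# systemctl daemon-reload
</pre>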
<pre class="brush: text">
# systemctl status patroni
# systemctl start patroni
# systemctl enable patroni
# systemctl status etcd
# systemctl enable etcd
</pre>
Reboot all 3 nodes:</br>
</br>
<pre class="brush: text">
# reboot
</pre>
Check the status of the Patroni service after the reboot. The service should be up and running.</br>
</br>
<pre class="brush: text">
# systemctl status patroni
● patroni.service - Runners to orchestrate a high-availability PostgreSQL
Loaded: loaded (/etc/systemd/system/patroni.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2021-11-05 23:33:22 UTC; 8h ago
Main PID: 705 (patroni)
Tasks: 14 (limit: 11401)
Memory: 146.1M
CGroup: /system.slice/patroni.service
├─ 705 /usr/bin/python3 /usr/bin/patroni /opt/app/patroni/etc/postgresql.yml
├─1278 /usr/pgsql-14/bin/postgres -D /var/lib/pgsql/14/data --config-file=/var/lib/pgsql/14/data/postgresql.conf --listen_addresses=0.0.0.0 --port=5432 --cluster_name=postgres --wal_level=replica --hot_standby=on --max_conn>
├─1280 postgres: postgres: logger
├─1282 postgres: postgres: checkpointer
├─1283 postgres: postgres: background writer
├─1284 postgres: postgres: stats collector
├─1359 postgres: postgres: postgres postgres 127.0.0.1(37558) idle
├─1371 postgres: postgres: walwriter
├─1372 postgres: postgres: autovacuum launcher
└─1373 postgres: postgres: logical replication launcher
Nov 06 07:43:00 psql13n51 patroni[705]: 2021-11-06 07:43:00,048 INFO: Software Watchdog activated with 25 second timeout, timing slack 15 seconds
Nov 06 07:43:00 psql13n51 patroni[705]: 2021-11-06 07:43:00,058 INFO: promoted self to leader by acquiring session lock
Nov 06 07:43:00 psql13n51 patroni[705]: server promoting
Nov 06 07:43:00 psql13n51 patroni[705]: 2021-11-06 07:43:00,064 INFO: cleared rewind state after becoming the leader
</pre>
Check again the state of the Patroni cluster:</br>
</br>
<pre class="brush: text">
# patronictl -c /opt/app/patroni/etc/postgresql.yml list
+ Cluster: postgres (7027353509639501631) ------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-----------+---------------+---------+---------+----+-----------+
| psql13n51 | 192.168.56.51 | Leader | running | 5 | |
| psql13n52 | 192.168.56.52 | Replica | running | 5 | 0 |
| psql13n53 | 192.168.56.53 | Replica | running | 5 | 0 |
+-----------+---------------+---------+---------+----+-----------+
</pre>
<pre class="brush: text">
$ sudo systemctl enable patroni
$ sudo systemctl start patroni
$ sudo systemctl status patroni
</pre>
PostgreSQL itself is prohibited from auto-starting on boot, because the instance is managed by Patroni.</br>
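To make sure of that, the packaged PostgreSQL unit should stay disabled; with the PGDG packages used here the unit would be named postgresql-14 (adjust to your packaging):</br>
</br>
<pre class="brush: text">
# systemctl disable postgresql-14
</pre>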
</br>
<span style="font-size: x-large;">Keepalived</span></br>
</br>
Keepalived is used for IP failover between multiple servers.</br>
Download the latest Keepalived source from <a href="https://www.keepalived.org/download.html">https://www.keepalived.org/download.html</a>.</br>
</br>
Run this on all 3 nodes:</br>
</br>
<pre class="brush: text">
wget https://www.keepalived.org/software/keepalived-2.2.4.tar.gz
</pre>
Unpack archive and configure:</br>
</br>
<pre class="brush: text">
# tar xvfz keepalived-2.2.4.tar.gz
# cd keepalived-2.2.4
# ./configure
</pre>
Fix any errors before running the make command.</br>
</br>
In my case I needed to install the openssl and libnl3 packages.</br>
</br>
<pre class="brush: text">
configure: error:
!!! OpenSSL is not properly installed on your system. !!!
!!! Can not include OpenSSL headers files. !!!
</pre>
</br>
<pre class="brush: text">
yum -y install openssl openssl-devel
</pre>
</br>
<pre class="brush: text">
# ./configure
*** WARNING - this build will not support IPVS with IPv6. Please install libnl/libnl-3 dev libraries to support IPv6 with IPVS.
</pre>
</br>
<pre class="brush: text">
yum -y install libnl3 libnl3-devel
</pre>
</br>
<pre class="brush: text">
# ./configure
</pre>
When configure completes without errors, run make and make install:</br>
</br>
<pre class="brush: text">
# make && make install
</pre>
For Keepalived startup, the same systemd service is created on all servers.</br>
</br>
<pre class="brush: text">
# cat /etc/systemd/system/keepalived.service
[Unit]
Description=LVS and VRRP High Availability Monitor
After=network-online.target syslog.target
Wants=network-online.target
[Service]
Type=forking
PIDFile=/run/keepalived.pid
KillMode=process
EnvironmentFile=-/usr/local/etc/sysconfig/keepalived
ExecStart=/usr/local/sbin/keepalived $KEEPALIVED_OPTIONS
ExecReload=/bin/kill -HUP $MAINPID
[Install]
WantedBy=multi-user.target
</pre>
Before starting Keepalived, we have to create its configuration file on all servers.</br>
</br>
psql13n51:</br>
</br>
<pre class="brush: text">
# cat /etc/keepalived/keepalived.conf
global_defs {
}
vrrp_script chk_haproxy { # Requires keepalived-1.1.13
script "killall -0 haproxy" # widely used idiom
interval 2 # check every 2 seconds
weight 2 # add 2 points of prio if OK
}
vrrp_instance VI_1 {
interface eth0
state MASTER # or "BACKUP" on backup
priority 101 # 101 on master, 100 on backup
virtual_router_id 51
authentication {
auth_type PASS
auth_pass 1234
}
virtual_ipaddress {
192.168.56.100
}
track_script {
chk_haproxy
}
}
</pre>
psql13n52 & psql13n53:</br>
</br>
<pre class="brush: text">
# cat /etc/keepalived/keepalived.conf
global_defs {
}
vrrp_script chk_haproxy { # Requires keepalived-1.1.13
script "killall -0 haproxy" # widely used idiom
interval 2 # check every 2 seconds
weight 2 # add 2 points of prio if OK
}
vrrp_instance VI_1 {
interface eth0
state BACKUP # "MASTER" on the master node
priority 100 # 101 on master, 100 on backup
virtual_router_id 51
authentication {
auth_type PASS
auth_pass 1234
}
virtual_ipaddress {
192.168.56.100
}
track_script {
chk_haproxy
}
}
</pre>
Now start the Keepalived service on all servers:</br>
</br>
<pre class="brush: text">
# systemctl start keepalived
# systemctl status keepalived
# systemctl enable keepalived
</pre>
The VIP <b>192.168.56.100</b> runs on one server at a time and will automatically fail over to another server if there is any issue with the active one.</br>
</br>
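To verify which node currently holds the VIP, list the addresses on the interface; the VIP should appear on exactly one node at a time:</br>
</br>
<pre class="brush: text">
# ip addr show eth0 | grep 192.168.56.100
</pre>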
<span style="font-size: x-large;">HAProxy</span></br>
</br>
Instead of connecting directly to the database server, the application will be connecting to the proxy instead, which will forward the request to PostgreSQL. When HAproxy is used for this, it is also possible to route read requests to one or more replicas, for load balancing. With HAproxy, this is done by providing two different ports for the application to connect. We opted for the following setup:</br>
</br>
Writes → 5000</br>
Reads → 5001</br>
</br>
HAProxy is a lightweight service and it can be installed as an independent server or, as in our case, on the database servers.</br>
</br>
<pre class="brush: text">
sudo yum -y install haproxy
</pre>
Set configuration on all nodes:</br>
</br>
<pre class="brush: text">
$ cat /etc/haproxy/haproxy.cfg
global
maxconn 100
defaults
log global
mode tcp
retries 2
timeout client 30m
timeout connect 4s
timeout server 30m
timeout check 5s
listen stats
mode http
bind *:7000
stats enable
stats uri /
listen primary
bind *:5000
option httpchk OPTIONS /master
http-check expect status 200
default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
server psql13n51 psql13n51:5432 maxconn 100 check port 8008
server psql13n52 psql13n52:5432 maxconn 100 check port 8008
server psql13n53 psql13n53:5432 maxconn 100 check port 8008
listen standbys
balance roundrobin
bind *:5001
option httpchk OPTIONS /replica
http-check expect status 200
default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
server psql13n51 psql13n51:5432 maxconn 100 check port 8008
server psql13n52 psql13n52:5432 maxconn 100 check port 8008
server psql13n53 psql13n53:5432 maxconn 100 check port 8008
</pre>
Note there are two sections: primary, using port 5000, and standbys, using port 5001. All three nodes are included in both sections: that’s because they are all potential candidates to be either primary or secondary. </br>
</br>
For HAProxy to know which role each node currently has, it sends an HTTP request to port 8008 of the node, and Patroni answers. Patroni provides a built-in REST API for health check monitoring that integrates perfectly with HAProxy:</br>
</br>
<pre class="brush: text">
$ curl -s http://psql13n51:8008
{"state": "running", "postmaster_start_time": "2021-11-06 15:01:56.197081+00:00", "role": "replica", "server_version": 140000, "cluster_unlocked": false, "xlog": {"received_location": 83888920, "replayed_location": 83888920, "replayed_timestamp": null, "paused": false}, "timeline": 6, "database_system_identifier": "7027353509639501631", "patroni": {"version": "2.1.1", "scope": "postgres"}}
</pre>
Notice how we received <b>"role": "replica"</b> from Patroni when we sent the HTTP request to a standby server. HAProxy uses this information for query routing.</br>
</br>
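HAProxy's httpchk only cares about the returned HTTP status code. You can emulate the health checks manually; per the Patroni REST API, a replica should answer 200 on /replica and 503 on /master (example against the node above, which is currently a replica):</br>
</br>
<pre class="brush: text">
$ curl -s -o /dev/null -w "%{http_code}\n" http://psql13n51:8008/replica
200
$ curl -s -o /dev/null -w "%{http_code}\n" http://psql13n51:8008/master
503
</pre>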
We configured the standbys group to balance read-requests in a round-robin fashion, so each connection request (or reconnection) will alternate between the available replicas.</br>
</br>
Let’s start HAProxy on all three nodes:</br>
</br>
<pre class="brush: text">
# systemctl enable haproxy.service
# systemctl start haproxy.service
# systemctl status haproxy.service
</pre>
Set up a .pgpass file so we can test both master and standby connections:</br>
</br>
<pre class="brush: text">
sudo su - postgres
echo "localhost:5000:postgres:postgres:vagrant" > ~/.pgpass
echo "localhost:5001:postgres:postgres:vagrant" >> ~/.pgpass
chmod 0600 ~/.pgpass
</pre>
We can then execute two read-requests to verify the round-robin mechanism is working as intended:</br>
</br>
<pre class="brush: text">
$ psql -Upostgres -hlocalhost -p5001 -t -c "select inet_server_addr()"
192.168.56.51
$ psql -Upostgres -hlocalhost -p5001 -t -c "select inet_server_addr()"
192.168.56.52
</pre>
Master (read/write) connection:</br>
</br>
<pre class="brush: text">
$ psql -Upostgres -hlocalhost -p5000 -t -c "select inet_server_addr()"
192.168.56.53
</pre>
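With Keepalived in front of HAProxy, applications do not need to know any node address at all; they can connect through the VIP (add matching .pgpass entries for 192.168.56.100 if you test this way):</br>
</br>
<pre class="brush: text">
$ psql -Upostgres -h192.168.56.100 -p5000 -t -c "select pg_is_in_recovery()"
$ psql -Upostgres -h192.168.56.100 -p5001 -t -c "select pg_is_in_recovery()"
</pre>
The first connection should land on the primary (pg_is_in_recovery() returns f), the second on one of the replicas (returns t).</br>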
You can also check the state of HAproxy by visiting <a href="http://192.168.56.51:7000/">http://192.168.56.51:7000/</a> on your browser.</br>
</br>
<div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgBKxrH2IdhrVPsS3tGmQzQXZERjqrEXXNJDkFEuZsjXtH_yuBGy-h_5FAg_zvQiZEannW1XaCr7yIcNG092SASU8FfPCn0C-JP-CRjjyU56EuFnP84oL3KBNoZvr03Rp_i0q-BeXwQXU27/s0/Screenshot+2021-11-07+at+10.28.21.JPG" style="display: block; padding: 1em 0; text-align: center; "><img alt="" border="0" data-original-height="714" data-original-width="1426" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgBKxrH2IdhrVPsS3tGmQzQXZERjqrEXXNJDkFEuZsjXtH_yuBGy-h_5FAg_zvQiZEannW1XaCr7yIcNG092SASU8FfPCn0C-JP-CRjjyU56EuFnP84oL3KBNoZvr03Rp_i0q-BeXwQXU27/s0/Screenshot+2021-11-07+at+10.28.21.JPG"/></a></div>
<pre class="brush: text">
# patronictl -c /opt/app/patroni/etc/postgresql.yml list
+ Cluster: postgres (7027353509639501631) ------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-----------+---------------+---------+---------+----+-----------+
| psql13n51 | 192.168.56.51 | Replica | running | 7 | 0 |
| psql13n52 | 192.168.56.52 | Replica | running | 7 | 0 |
| psql13n53 | 192.168.56.53 | Leader | running | 7 | |
+-----------+---------------+---------+---------+----+-----------+
</pre>
</br>
We have set up a three-node Patroni cluster with no single point of failure (SPOF). In the next part we will test disaster and failover scenarios on this configuration with an active read/write workload.</br>
</br>
</br>
<b>Reference</b>:</br>
<a href="https://github.com/zalando/patroni">https://github.com/zalando/patroni</a></br>
<a href="https://patroni.readthedocs.io/en/latest/">https://patroni.readthedocs.io/en/latest/</a></br>
<a href="https://www.percona.com/blog/2021/06/11/postgresql-ha-with-patroni-your-turn-to-test-failure-scenarios/">https://www.percona.com/blog/2021/06/11/postgresql-ha-with-patroni-your-turn-to-test-failure-scenarios/</a></br>
<a href="https://blog.dbi-services.com/postgresql-high-availabilty-patroni-ectd-haproxy-keepalived/">https://blog.dbi-services.com/postgresql-high-availabilty-patroni-ectd-haproxy-keepalived/</a></br>
<a href="https://digitalis.io/blog/technology/part1-postgresql-ha-patroni-etcd-haproxy/">https://digitalis.io/blog/technology/part1-postgresql-ha-patroni-etcd-haproxy/</a></br>
</br>
</br>
</span>Marko Sutichttp://www.blogger.com/profile/08926232581329666732noreply@blogger.com8tag:blogger.com,1999:blog-2530682427657016426.post-15047267011344357482020-11-11T13:21:00.003+01:002020-11-11T14:31:54.172+01:00ProxySQL - Throttle for MySQL queriesProxySQL is a great high availability and load balancing solution and it is mostly used for such purposes.
But ProxySQL offers much more.<br />
<br />
One of the nice features is the throttling mechanism for queries to the backends.<br />
<br />
Imagine you have a very active system and applications are executing queries at a very high rate, which is not so unusual nowadays.
If just one of those queries slows down, you could easily end up with many active sessions running the same query.
Just one problematic query could cause high resource usage and general slowness.<br />
<br />
Usually, the DBA is called, but the DBA cannot modify a query, disable the problematic application, or change the database model without a detailed analysis.<br />
<br />
But ProxySQL could help.<br />
Using ProxySQL we can delay execution of the problematic queries.<br />
Yes, the specific application request would still have a problem, but we would avoid a <b>general</b> problem/downtime and "buy" some time for the fix.<br />
<br />
<br />
Let's simulate such a situation in the test environment.<br />
<br />
<span id="fullpost">
Run benchmark test using sysbench.<br />
<br />
<pre class="brush: plain">
NUM_THREADS=1
TEST_DIR=/usr/share/sysbench/tests/include/oltp_legacy
sysbench \
--test=${TEST_DIR}/oltp_simple.lua \
--oltp-table-size=2000000 \
--time=300 \
--max-requests=0 \
--mysql-table-engine=InnoDB \
--mysql-user=sbtest \
--mysql-password=sbtest \
--mysql-port=3307 \
--mysql-host=192.168.56.25 \
--mysql-engine-trx=yes \
--num-threads=$NUM_THREADS \
prepare
sysbench \
--test=${TEST_DIR}/oltp_simple.lua \
--oltp-table-size=2000000 \
--time=180 \
--max-requests=0 \
--mysql-table-engine=InnoDB \
--mysql-user=sbtest \
--mysql-password=sbtest \
--mysql-port=3307 \
--mysql-host=192.168.56.25 \
--mysql-engine-trx=yes \
--num-threads=$NUM_THREADS \
run
</pre>
<br />
Enable the throttling mechanism and delay execution of all queries globally by setting "<i>mysql-default_query_delay=100</i>" (the value is in milliseconds).<br />
<br />
<pre class="brush: plain">
ProxySQLServer> set mysql-default_query_delay=100;
ProxySQLServer> LOAD MYSQL VARIABLES TO RUNTIME;
ProxySQLServer> SAVE MYSQL VARIABLES TO DISK;
</pre>
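To confirm the runtime value, query the global_variables table in the ProxySQL admin interface:<br />
<br />
<pre class="brush: plain">
ProxySQLServer> select variable_name, variable_value from global_variables where variable_name='mysql-default_query_delay';
</pre>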
<br />
Run test again and Check latency(ms).<br />
<br />
<pre class="brush: plain">
Running the test with following options:
Number of threads: 1
Initializing random number generator from current time
Initializing worker threads...
Threads started!
SQL statistics:
queries performed:
read: 1774
write: 0
other: 0
total: 1774
transactions: 1774 (9.85 per sec.)
queries: 1774 (9.85 per sec.)
ignored errors: 0 (0.00 per sec.)
reconnects: 0 (0.00 per sec.)
General statistics:
total time: 180.0942s
total number of events: 1774
Latency (ms):
min: 100.76 <<<<<<<<<<<<<<<<<
avg: 101.51 <<<<< Throttling
max: 129.17 <<<<<<<<<<<<<<<<<
95th percentile: 102.97
sum: 180083.66
Threads fairness:
events (avg/stddev): 1774.0000/0.00
execution time (avg/stddev): 180.0837/0.00
</pre>
<br />
Disable throttling and reset ProxySQL counters.<br />
<br />
<pre class="brush: plain">
ProxySQLServer> set mysql-default_query_delay=0;
ProxySQLServer> LOAD MYSQL VARIABLES TO RUNTIME; SAVE MYSQL VARIABLES TO DISK;
ProxySQLServer> select * from stats_mysql_query_digest_reset;
</pre>
<br />
Check latency(ms).<br />
<br />
<pre class="brush: plain">
Initializing worker threads...
Threads started!
SQL statistics:
queries performed:
read: 641413
write: 0
other: 0
total: 641413
transactions: 641413 (3563.38 per sec.)
queries: 641413 (3563.38 per sec.)
ignored errors: 0 (0.00 per sec.)
reconnects: 0 (0.00 per sec.)
General statistics:
total time: 180.0004s
total number of events: 641413
Latency (ms):
min: 0.19
avg: 0.28
max: 44.45
95th percentile: 0.43
sum: 179252.76
Threads fairness:
events (avg/stddev): 641413.0000/0.00
execution time (avg/stddev): 179.2528/0.00
</pre>
<br />
Enable throttling for just a specific query using ProxySQL mysql query rules.<br />
<br />
<pre class="brush: sql">
-- Find problematic query
ProxySQLServer> select hostgroup,username,count_star,
(count_star/(select Variable_Value from stats_mysql_global where Variable_Name='ProxySQL_Uptime'))
as avg_per_sec, digest, digest_text from stats_mysql_query_digest order by count_star desc limit 10;
+-----------+----------+------------+-------------+--------------------+----------------------------------+
| hostgroup | username | count_star | avg_per_sec | digest | digest_text |
+-----------+----------+------------+-------------+--------------------+----------------------------------+
| 2 | sbtest | 641413 | 78 | 0xBF001A0C13781C1D | SELECT c FROM sbtest1 WHERE id=? |
+-----------+----------+------------+-------------+--------------------+----------------------------------+
1 row in set (0.00 sec)
-- Reset counters
ProxySQLServer> select * from stats_mysql_query_digest_reset;
+-----------+------------+----------+----------------+--------------------+----------------------------------+------------+------------+------------+-----------+----------+----------+-------------------+---------------+
| hostgroup | schemaname | username | client_address | digest | digest_text | count_star | first_seen | last_seen | sum_time | min_time | max_time | sum_rows_affected | sum_rows_sent |
+-----------+------------+----------+----------------+--------------------+----------------------------------+------------+------------+------------+-----------+----------+----------+-------------------+---------------+
| 2 | sbtest | sbtest | | 0xBF001A0C13781C1D | SELECT c FROM sbtest1 WHERE id=? | 641413 | 1601934890 | 1601935070 | 153023170 | 159 | 44349 | 0 | 214399 |
+-----------+------------+----------+----------------+--------------------+----------------------------------+------------+------------+------------+-----------+----------+----------+-------------------+---------------+
1 row in set (0.00 sec)
</pre>
<br />
Insert mysql query rule and enable throttling just for a specific query.<br />
<br />
<pre class="brush: plain">
ProxySQLServer> insert into mysql_query_rules(rule_id,active,digest,delay,apply) values (1,1,'0xBF001A0C13781C1D',100,1);
Query OK, 1 row affected (0.00 sec)
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
</pre>
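To verify the rule is actually matching traffic, check its hit counter in stats_mysql_query_rules:<br />
<br />
<pre class="brush: sql">
ProxySQLServer> select rule_id, hits from stats_mysql_query_rules;
</pre>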
<br />
Compare "min_time" between executions.<br />
<br />
<pre class="brush: plain">
Initializing worker threads...
Threads started!
SQL statistics:
queries performed:
read: 1773
write: 0
other: 0
total: 1773
transactions: 1773 (9.85 per sec.)
queries: 1773 (9.85 per sec.)
ignored errors: 0 (0.00 per sec.)
reconnects: 0 (0.00 per sec.)
General statistics:
total time: 180.0325s
total number of events: 1773
Latency (ms):
min: 100.78 <<<<<<<<<<<<<<<<<
avg: 101.53 <<<<< Throttling
max: 104.77 <<<<<<<<<<<<<<<<<
95th percentile: 102.97
sum: 180021.34
Threads fairness:
events (avg/stddev): 1773.0000/0.00
execution time (avg/stddev): 180.0213/0.00
</pre>
<br />
<pre class="brush: sql">
ProxySQLServer> select * from stats_mysql_query_digest_reset;
+-----------+------------+----------+----------------+--------------------+----------------------------------+------------+------------+------------+-----------+----------+----------+-------------------+---------------+
| hostgroup | schemaname | username | client_address | digest | digest_text | count_star | first_seen | last_seen | sum_time | min_time | max_time | sum_rows_affected | sum_rows_sent |
+-----------+------------+----------+----------------+--------------------+----------------------------------+------------+------------+------------+-----------+----------+----------+-------------------+---------------+
| 2 | sbtest | sbtest | | 0xBF001A0C13781C1D | SELECT c FROM sbtest1 WHERE id=? | 1773 | 1601935408 | 1601935588 | 179697522 | 100681 | 104195 | 0 | 594 |
+-----------+------------+----------+----------------+--------------------+----------------------------------+------------+------------+------------+-----------+----------+----------+-------------------+---------------+
1 row in set (0.01 sec)
</pre>
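Once the problematic query is fixed, the throttling rule should be removed (or deactivated) the same way it was added:<br />
<br />
<pre class="brush: plain">
ProxySQLServer> delete from mysql_query_rules where rule_id=1;
ProxySQLServer> LOAD MYSQL QUERY RULES TO RUNTIME;
ProxySQLServer> SAVE MYSQL QUERY RULES TO DISK;
</pre>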
</span>Marko Sutichttp://www.blogger.com/profile/08926232581329666732noreply@blogger.com0tag:blogger.com,1999:blog-2530682427657016426.post-62277863706845646962018-01-24T10:45:00.003+01:002020-11-10T07:47:45.216+01:00Galera Cluster Schema Changes, Row Based Replication and Data InconsistencyGalera Cluster is a virtually synchronous multi-master replication plug-in. When using Galera Cluster an application can write to any node, and transactions are then applied to all servers via row-based replication events.<br />
<br />
This is built-in MySQL row-based replication, which supports replication with differing table definitions between master and slave.<br />
So, when using row-based replication the source and target table do not have to be identical. A table on the master can have more or fewer columns or use different data types.<br />
<br />
But there are limitations you must watch for, depending on the MySQL version you are running:<br />
- The database and table names must be the same on both Master and Slave<br />
- Columns must be in the same order before any additional column<br />
- Each extra column must have default value<br />
- ...<br />
<br />
Newer MySQL versions may tolerate more differences between source and target table - check documentation for your version.<br />
<br />
<br />
I want to show you what could happen to your data if you do not pay attention to these limitations.<br />
<br />
<span id="fullpost">
<br />
Suppose I have a 3-node MariaDB Galera Cluster with a table t.<br />
I want to add several columns to the table while the database is used by an application.<br />
<br />
For this task I will use the built-in Rolling Schema Upgrade (RSU) method, which enables me to perform schema changes on a node without impacting the rest of the cluster.<br />
<br />
Add column c4 to the table t, following the rules above for row-based replication.<br />
<br />
Table t has three columns and one row inserted.<br />
<pre class="brush: sql">
NODE1
MariaDB [testdb]> create table t (c1 varchar(10), c2 varchar(10), c3 varchar(10));
Query OK, 0 rows affected (0.37 sec)
MariaDB [testdb]> insert into t values ('n1-1','n1-1','n1-1');
Query OK, 1 row affected (0.00 sec)
NODE2
MariaDB [testdb]> select * from t;
+------+------+------+
| c1 | c2 | c3 |
+------+------+------+
| n1-1 | n1-1 | n1-1 |
+------+------+------+
1 row in set (0.00 sec)
NODE3
MariaDB [testdb]> select * from t;
+------+------+------+
| c1 | c2 | c3 |
+------+------+------+
| n1-1 | n1-1 | n1-1 |
+------+------+------+
1 row in set (0.01 sec)
</pre>
<br />
I will enable RSU mode which ensures that this server will not impact the rest of the cluster during ALTER command execution.<br />
<br />
Add column c4 and INSERT a row, simulating application activity.<br />
<pre class="brush: sql">
MariaDB [testdb]> set session wsrep_OSU_method='RSU';
Query OK, 0 rows affected (0.00 sec)
MariaDB [testdb]> alter table t add column c4 varchar(10);
Query OK, 0 rows affected (0.03 sec)
Records: 0 Duplicates: 0 Warnings: 0
MariaDB [testdb]> set session wsrep_OSU_method='TOI';
Query OK, 0 rows affected (0.00 sec)
MariaDB [testdb]> insert into t(c1,c2,c3) values ('n1-1','n1-1','n1-1');
Query OK, 1 row affected (0.13 sec)
</pre>
<br />
While the table definition differs between Node1 and the rest of the cluster, INSERT a few more rows on the other nodes.<br />
<br />
<pre class="brush: sql">
NODE2
insert into t(c1,c2,c3) values ('n2-1','n2-1','n2-1');
NODE3
insert into t(c1,c2,c3) values ('n3-1','n3-1','n3-1');
</pre>
<br />
Check rows from table t.<br />
<pre class="brush: sql">
NODE1
MariaDB [testdb]> select * from t;
+------+------+------+------+
| c1 | c2 | c3 | c4 |
+------+------+------+------+
| n1-1 | n1-1 | n1-1 | NULL |
| n1-1 | n1-1 | n1-1 | NULL |
| n2-1 | n2-1 | n2-1 | NULL |
| n3-1 | n3-1 | n3-1 | NULL |
+------+------+------+------+
4 rows in set (0.00 sec)
NODE2
MariaDB [testdb]> select * from t;
+------+------+------+
| c1 | c2 | c3 |
+------+------+------+
| n1-1 | n1-1 | n1-1 |
| n1-1 | n1-1 | n1-1 |
| n2-1 | n2-1 | n2-1 |
| n3-1 | n3-1 | n3-1 |
+------+------+------+
4 rows in set (0.00 sec)
NODE3
MariaDB [testdb]> select * from t;
+------+------+------+
| c1 | c2 | c3 |
+------+------+------+
| n1-1 | n1-1 | n1-1 |
| n1-1 | n1-1 | n1-1 |
| n2-1 | n2-1 | n2-1 |
| n3-1 | n3-1 | n3-1 |
+------+------+------+
4 rows in set (0.01 sec)
</pre>
<br />
As you can notice everything is OK with my data.<br />
<br />
Add new column to Node2 and Node3 following the same steps as for Node1.<br />
<br />
<pre class="brush: sql">
NODE2
MariaDB [testdb]> set session wsrep_OSU_method='RSU';
Query OK, 0 rows affected (0.00 sec)
MariaDB [testdb]> alter table t add column c4 varchar(10);
Query OK, 0 rows affected (0.03 sec)
Records: 0 Duplicates: 0 Warnings: 0
MariaDB [testdb]> set session wsrep_OSU_method='TOI';
Query OK, 0 rows affected (0.00 sec)
NODE3
MariaDB [testdb]> set session wsrep_OSU_method='RSU';
Query OK, 0 rows affected (0.00 sec)
MariaDB [testdb]> alter table t add column c4 varchar(10);
Query OK, 0 rows affected (0.02 sec)
Records: 0 Duplicates: 0 Warnings: 0
MariaDB [testdb]> set session wsrep_OSU_method='TOI';
Query OK, 0 rows affected (0.00 sec)
</pre>
<br />
And my task is completed. I have successfully changed the model of the table.<br />
<br />
<br />
But what can happen if I add a new column between existing columns?<br />
Remember, this is not permitted for row-based replication and can cause replication to break, or something <b>even worse</b>.<br />
<br />
Enable RSU mode on Node1 and add a new column c11 after the c1 column.<br />
INSERT a row, simulating an active application during the schema change.<br />
<br />
<pre class="brush: sql">
NODE1
MariaDB [testdb]> set session wsrep_OSU_method='RSU';
Query OK, 0 rows affected (0.00 sec)
MariaDB [testdb]>
MariaDB [testdb]> alter table t add column c11 varchar(10) after c1;
Query OK, 0 rows affected (0.03 sec)
Records: 0 Duplicates: 0 Warnings: 0
MariaDB [testdb]> set session wsrep_OSU_method='TOI';
Query OK, 0 rows affected (0.00 sec)
MariaDB [testdb]> insert into t(c1,c2,c3) values ('n1-1','n1-1','n1-1');
Query OK, 1 row affected (0.01 sec)
MariaDB [testdb]> select * from t;
+------+------+------+------+------+
| c1 | c11 | c2 | c3 | c4 |
+------+------+------+------+------+
| n1-1 | NULL | n1-1 | n1-1 | NULL |
| n1-1 | NULL | n1-1 | n1-1 | NULL |
| n2-1 | NULL | n2-1 | n2-1 | NULL |
| n3-1 | NULL | n3-1 | n3-1 | NULL |
| n1-1 | NULL | n1-1 | n1-1 | NULL |
+------+------+------+------+------+
5 rows in set (0.00 sec)
</pre>
<br />
INSERT a row on the other nodes, because Galera Cluster allows us to write to any node in the cluster configuration.<br />
<br />
<pre class="brush: sql">
NODE2
MariaDB [testdb]> insert into t(c1,c2,c3) values ('n2-1','n2-1','n2-1');
Query OK, 1 row affected (0.01 sec)
MariaDB [testdb]> select * from t;
+------+------+------+------+
| c1 | c2 | c3 | c4 |
+------+------+------+------+
| n1-1 | n1-1 | n1-1 | NULL |
| n1-1 | n1-1 | n1-1 | NULL |
| n2-1 | n2-1 | n2-1 | NULL |
| n3-1 | n3-1 | n3-1 | NULL |
| n1-1 | NULL | n1-1 | n1-1 |
| n2-1 | n2-1 | n2-1 | NULL |
+------+------+------+------+
6 rows in set (0.00 sec)
NODE3
MariaDB [testdb]> insert into t(c1,c2,c3) values ('n3-1','n3-1','n3-1');
Query OK, 1 row affected (0.01 sec)
MariaDB [testdb]> select * from t;
+------+------+------+------+
| c1 | c2 | c3 | c4 |
+------+------+------+------+
| n1-1 | n1-1 | n1-1 | NULL |
| n1-1 | n1-1 | n1-1 | NULL |
| n2-1 | n2-1 | n2-1 | NULL |
| n3-1 | n3-1 | n3-1 | NULL |
| n1-1 | NULL | n1-1 | n1-1 |
| n2-1 | n2-1 | n2-1 | NULL |
| n3-1 | n3-1 | n3-1 | NULL |
+------+------+------+------+
7 rows in set (0.00 sec)
</pre>
<br />
The INSERT commands were successfully executed and everything seems OK with my replication.<br />
I don't have any errors in error.log that suggest a problem.<br />
<br />
But check the content of table t on the first node, where the new column was added.<br />
<br />
<pre class="brush: sql">
NODE1
MariaDB [testdb]> select * from t;
+------+------+------+------+------+
| c1 | c11 | c2 | c3 | c4 |
+------+------+------+------+------+
| n1-1 | NULL | n1-1 | n1-1 | NULL |
| n1-1 | NULL | n1-1 | n1-1 | NULL |
| n2-1 | NULL | n2-1 | n2-1 | NULL |
| n3-1 | NULL | n3-1 | n3-1 | NULL |
| n1-1 | NULL | n1-1 | n1-1 | NULL |
| n2-1 | n2-1 | n2-1 | NULL | NULL |
| n3-1 | n3-1 | n3-1 | NULL | NULL |
+------+------+------+------+------+
7 rows in set (0.00 sec)
</pre>
<br />
Notice how the rows differ between the nodes; we should have exactly the same data on all three nodes.<br />
<br />
<br />
Let's complete the schema change on the other two nodes.<br />
<pre class="brush: sql">
NODE2
MariaDB [testdb]> set session wsrep_OSU_method='RSU';
Query OK, 0 rows affected (0.00 sec)
MariaDB [testdb]> alter table t add column c11 varchar(10) after c1;
Query OK, 0 rows affected (0.03 sec)
Records: 0 Duplicates: 0 Warnings: 0
MariaDB [testdb]> set session wsrep_OSU_method='TOI';
Query OK, 0 rows affected (0.00 sec)
NODE3
MariaDB [testdb]> set session wsrep_OSU_method='RSU';
Query OK, 0 rows affected (0.00 sec)
MariaDB [testdb]> alter table t add column c11 varchar(10) after c1;
Query OK, 0 rows affected (0.34 sec)
Records: 0 Duplicates: 0 Warnings: 0
MariaDB [testdb]> set session wsrep_OSU_method='TOI';
Query OK, 0 rows affected (0.00 sec)
</pre>
<br />
I have successfully added the new column without breaking replication, and everything seems OK, but my data is not consistent between the nodes.<br />
<br />
<pre class="brush: sql">
NODE1
MariaDB [testdb]> select * from t;
+------+------+------+------+------+
| c1 | c11 | c2 | c3 | c4 |
+------+------+------+------+------+
| n1-1 | NULL | n1-1 | n1-1 | NULL |
| n1-1 | NULL | n1-1 | n1-1 | NULL |
| n2-1 | NULL | n2-1 | n2-1 | NULL |
| n3-1 | NULL | n3-1 | n3-1 | NULL |
| n1-1 | NULL | n1-1 | n1-1 | NULL |
| n2-1 | n2-1 | n2-1 | NULL | NULL |
| n3-1 | n3-1 | n3-1 | NULL | NULL |
+------+------+------+------+------+
7 rows in set (0.00 sec)
NODE2
MariaDB [testdb]> select * from t;
+------+------+------+------+------+
| c1 | c11 | c2 | c3 | c4 |
+------+------+------+------+------+
| n1-1 | NULL | n1-1 | n1-1 | NULL |
| n1-1 | NULL | n1-1 | n1-1 | NULL |
| n2-1 | NULL | n2-1 | n2-1 | NULL |
| n3-1 | NULL | n3-1 | n3-1 | NULL |
| n1-1 | NULL | NULL | n1-1 | n1-1 |
| n2-1 | NULL | n2-1 | n2-1 | NULL |
| n3-1 | NULL | n3-1 | n3-1 | NULL |
+------+------+------+------+------+
7 rows in set (0.00 sec)
NODE3
MariaDB [testdb]> select * from t;
+------+------+------+------+------+
| c1 | c11 | c2 | c3 | c4 |
+------+------+------+------+------+
| n1-1 | NULL | n1-1 | n1-1 | NULL |
| n1-1 | NULL | n1-1 | n1-1 | NULL |
| n2-1 | NULL | n2-1 | n2-1 | NULL |
| n3-1 | NULL | n3-1 | n3-1 | NULL |
| n1-1 | NULL | NULL | n1-1 | n1-1 |
| n2-1 | NULL | n2-1 | n2-1 | NULL |
| n3-1 | NULL | n3-1 | n3-1 | NULL |
+------+------+------+------+------+
7 rows in set (0.00 sec)
</pre>
<br />
<br />
<b>Data inconsistency</b> is the worst problem that could happen in a synchronous cluster configuration.<br />
It can happen without any notice, but sooner or later it will stop the replication apply process and the failing node will be excluded from the cluster.<br />
<br />
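A quick way to catch this kind of silent divergence is to compare table checksums on every node; if the results differ, the nodes are inconsistent (a simple manual check - note the checksum is only comparable between servers running the same version and row format, and for whole databases a tool like pt-table-checksum is more practical):<br />
<br />
<pre class="brush: sql">
-- Run on every node and compare the results
MariaDB [testdb]> CHECKSUM TABLE t;
</pre>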
<br />
<br />
<br />
<br />
<b>REFERENCE</b><br />
<a href="https://dev.mysql.com/doc/refman/5.7/en/replication-features-differing-tables.html">https://dev.mysql.com/doc/refman/5.7/en/replication-features-differing-tables.html</a><br />
<br />
<br />
</span>Marko Sutichttp://www.blogger.com/profile/08926232581329666732noreply@blogger.com2tag:blogger.com,1999:blog-2530682427657016426.post-5378508285145483602017-11-13T15:26:00.001+01:002020-11-10T07:48:55.250+01:00HASH GROUP BY not used when using more than 354 aggregate functionsA few days ago we had a performance problem with one of our main application views. This was a complex view that used a lot of aggregate functions. The functions were used to transpose rows into columns.<br />
<br />
When a developer added a few more aggregate functions for new columns, query performance changed significantly and we had a performance problem.<br />
<br />
After a quick analysis I noticed one change in the execution plan.<br />
<br />
The HASH GROUP BY aggregation was replaced with the less performant SORT GROUP BY. I tried to force HASH GROUP BY using hints but nothing helped.<br />
<br />
<span id="fullpost">
<br />
We tried to reproduce the problem using dummy tables, and then a colleague found what was triggering the plan change.<br />
<br />
In this example I have a query with 354 unique aggregate functions which uses HASH GROUP BY.<br />
<br />
<pre class="brush: sql">
SELECT
*
FROM (SELECT LEVEL ID
FROM DUAL CONNECT BY LEVEL < 1000) VANJSKI,
( SELECT
123 UNUTARNJI_ID,
sum(1) kolona0,
sum(1) kolona1,
sum(2) kolona2,
...
...
...
sum(350) kolona350 ,
sum(351) kolona351 ,
sum(352) kolona352 ,
sum(353) kolona353 ,
sum(354) kolona354
FROM DUAL
GROUP BY 123) UNUTARNJI
WHERE VANJSKI.ID = UNUTARNJI.UNUTARNJI_ID(+);
Plan hash value: 2294628051
---------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time | OMem | 1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 5 (100)| 999 |00:00:00.01 | | | |
|* 1 | HASH JOIN OUTER | | 1 | 1 | 4631 | 5 (20)| 999 |00:00:00.01 | 2293K| 2293K| 1549K (0)|
| 2 | VIEW | | 1 | 1 | 13 | 2 (0)| 999 |00:00:00.01 | | | |
| 3 | CONNECT BY WITHOUT FILTERING| | 1 | | | | 999 |00:00:00.01 | | | |
| 4 | FAST DUAL | | 1 | 1 | | 2 (0)| 1 |00:00:00.01 | | | |
| 5 | VIEW | | 1 | 1 | 4618 | 2 (0)| 1 |00:00:00.01 | | | |
| 6 | HASH GROUP BY | | 1 | 1 | | 2 (0)| 1 |00:00:00.01 | 677K| 677K| 723K (0)|
| 7 | FAST DUAL | | 1 | 1 | | 2 (0)| 1 |00:00:00.01 | | | |
---------------------------------------------------------------------------------------------------------------------------------------
</pre>
<br />
Notice what happens if I change the "sum(1) kolona0" function and add one more unique function, bringing the total to 355.<br />
<br />
<pre class="brush: sql">
SELECT
*
FROM (SELECT LEVEL ID
FROM DUAL CONNECT BY LEVEL < 1000) VANJSKI,
( SELECT
123 UNUTARNJI_ID,
sum(355) kolona0,
sum(1) kolona1,
sum(2) kolona2,
...
...
...
sum(350) kolona350 ,
sum(351) kolona351 ,
sum(352) kolona352 ,
sum(353) kolona353 ,
sum(354) kolona354
FROM DUAL
GROUP BY 123) UNUTARNJI
WHERE VANJSKI.ID = UNUTARNJI.UNUTARNJI_ID(+);
Plan hash value: 2326946862
---------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time | OMem | 1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 5 (100)| 999 |00:00:00.01 | | | |
|* 1 | HASH JOIN OUTER | | 1 | 1 | 4631 | 5 (20)| 999 |00:00:00.01 | 2293K| 2293K| 1645K (0)|
| 2 | VIEW | | 1 | 1 | 13 | 2 (0)| 999 |00:00:00.01 | | | |
| 3 | CONNECT BY WITHOUT FILTERING| | 1 | | | | 999 |00:00:00.01 | | | |
| 4 | FAST DUAL | | 1 | 1 | | 2 (0)| 1 |00:00:00.01 | | | |
| 5 | VIEW | | 1 | 1 | 4618 | 2 (0)| 1 |00:00:00.01 | | | |
| 6 | SORT GROUP BY | | 1 | 1 | | 2 (0)| 1 |00:00:00.01 | 20480 | 20480 |18432 (0)|
| 7 | FAST DUAL | | 1 | 1 | | 2 (0)| 1 |00:00:00.01 | | | |
---------------------------------------------------------------------------------------------------------------------------------------
</pre>
<br />
The query execution plan changed - HASH GROUP BY was replaced with SORT GROUP BY.<br />
<br />
<br />
This was obviously a limitation of HASH GROUP BY, but I couldn't find more information in the Oracle docs or on Google, so I asked Oracle Support for confirmation.<br />
<br />
From Oracle Support I received the answer that a similar case was a bug closed as "not a bug", without a workaround. With the default DB_BLOCK_SIZE, the limit is 354 aggregate functions.<br />
<br />
To solve the performance problem we changed the view to avoid the HASH GROUP BY limitation.<br />
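For illustration, here is a hedged sketch of the idea on dummy data: instead of one inline view carrying all the aggregates, they are split across two inline views, each below the limit, joined on the grouping key so each can still use HASH GROUP BY (this is not the exact production view):<br />
<br />
<pre class="brush: sql">
SELECT a.grp, a.kolona1, a.kolona2, b.kolona3, b.kolona4
FROM (SELECT 123 grp, sum(1) kolona1, sum(2) kolona2 FROM dual GROUP BY 123) a,
     (SELECT 123 grp, sum(3) kolona3, sum(4) kolona4 FROM dual GROUP BY 123) b
WHERE a.grp = b.grp;
</pre>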
<br />
Testing environment - Oracle Database 12c Enterprise Edition Release 12.1.0.2.0<br />
<br />
<br />
<br />
</span>Marko Sutichttp://www.blogger.com/profile/08926232581329666732noreply@blogger.com2tag:blogger.com,1999:blog-2530682427657016426.post-12473100828258283582017-10-21T21:01:00.001+02:002017-10-23T10:40:49.644+02:00Beware of intensive slow query logging when using - log_queries_not_using_indexesThe MySQL slow query log is great for identifying slow queries that are good candidates for optimisation. Slow query logging is disabled by default, but it is activated by DBAs or developers in most environments.<br />
<br />
You can use the slow query log to record all traffic, but be careful with this action as logging all traffic could be very I/O intensive and could have a negative impact on general performance. It is recommended to record all traffic only for specific time periods.<br />
<br />
This is why slow query logging is controlled with the <i>long_query_time</i> parameter, to log only slow queries.<br />
But there is another parameter to think about - <i>log_queries_not_using_indexes</i>.<br />
<span id="fullpost"><br />
By default <i>log_queries_not_using_indexes</i> is disabled. If you have this parameter turned on you will log queries that don’t use an index, or that perform a full index scan where the index doesn't limit the number of rows - <b>regardless of time taken</b>.<br />
<br />
If you have <i>long_query_time</i> configured to a reasonable time, and still notice that queries are being intensively logged to the slow query log file, then you probably have <i>log_queries_not_using_indexes</i> enabled.<br />
<br />
By enabling this parameter you're practically saying that full scans are "evil" and should be considered for optimisation. But a full scan doesn't always mean that a query is slow. In some situations the query optimizer chooses a full table scan as a better option than an index, or you are querying a very small table.<br />
<br />
<br />
For instance, on several occasions I've noticed slow query logs flooded with queries like this:<br />
<br />
<pre class="brush: bash">
# Time: 171021 17:51:45
# User@Host: monitor[monitor] @ localhost []
# Thread_id: 1492974 Schema: QC_hit: No
# Query_time: 0.000321 Lock_time: 0.000072 Rows_sent: 0 Rows_examined: 1
# Full_scan: Yes Full_join: No Tmp_table: Yes Tmp_table_on_disk: No
# Filesort: No Filesort_on_disk: No Merge_passes: 0 Priority_queue: No
SET timestamp=1508608305;
SELECT
SCHEMA_NAME
FROM information_schema.schemata
WHERE SCHEMA_NAME NOT IN ('mysql', 'performance_schema', 'information_schema');
+------+-------------+----------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+----------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | schemata | ALL | NULL | NULL | NULL | NULL | NULL | Using where |
+------+-------------+----------+------+---------------+------+---------+------+------+-------------+
</pre>
Notice, Query_time: <b>0.000321</b>.<br />
<br />
Should I optimize a query that runs in 0.000321 seconds by adding indexes? Probably not. But anyway, my log is flooded with this and similar queries.<br />
<br />
I don't find that parameter very useful and I would leave it at its default value to avoid possible problems with intensive query logging. You can check and change it at runtime, as shown below.
<br />
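To check the current setting and turn it off without a restart:<br />
<br />
<pre class="brush: bash">
MariaDB [(none)]> SHOW GLOBAL VARIABLES LIKE 'log_queries_not_using_indexes';
MariaDB [(none)]> SET GLOBAL log_queries_not_using_indexes = OFF;
</pre>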
<br />
<br />
</span>Marko Sutichttp://www.blogger.com/profile/08926232581329666732noreply@blogger.com0tag:blogger.com,1999:blog-2530682427657016426.post-66873039545844370642017-10-17T12:46:00.002+02:002020-11-10T07:49:12.475+01:00Enable SSL-encryption for MariaDB Galera ClusterImagine you have a MariaDB Galera Cluster with nodes running in different data centers, and the data centers are not connected via a secured VPN tunnel.<br />
As database security is very important, you must ensure that the traffic between nodes is fully secured.<br />
<br />
Galera Cluster supports encrypted connections between nodes using the SSL protocol, and in this post I want to show how to encrypt all cluster communication with SSL.<br />
<br />
<br />
<span id="fullpost">
Check current SSL configuration.
<pre class="brush: bash">
MariaDB [(none)]> SHOW VARIABLES LIKE 'have_ssl';
+---------------+----------+
| Variable_name | Value |
+---------------+----------+
| have_ssl | DISABLED | ###==> SSL Disabled
+---------------+----------+
1 row in set (0.01 sec)
MariaDB [(none)]> status
--------------
mysql Ver 15.1 Distrib 10.0.29-MariaDB, for debian-linux-gnu (x86_64) using readline 5.2
Connection id: 56
Current database:
Current user: marko@localhost
SSL: Not in use ###==> SSL is not used
Current pager: stdout
Using outfile: ''
Using delimiter: ;
Server: MariaDB
Server version: 10.0.17-MariaDB-1~trusty-wsrep-log mariadb.org binary distribution, wsrep_25.10.r4144
Protocol version: 10
Connection: Localhost via UNIX socket
Server characterset: utf8
Db characterset: utf8
Client characterset: utf8
Conn. characterset: utf8
UNIX socket: /var/run/mysqld/mysqld.sock
Uptime: 7 days 42 min 29 sec
Threads: 52 Questions: 10 Slow queries: 0 Opens: 0 Flush tables: 1 Open tables: 63 Queries per second avg: 0.000
--------------
</pre>
SSL is currently disabled.<br />
<br />
To fully secure all cluster communication we must SSL-encrypt the replication traffic within the Galera Cluster, the State Snapshot Transfer (SST), and the traffic between the database server and clients.<br />
<br />
We will create SSL Certificates and Keys using openssl.<br />
<br />
<pre class="brush: bash">
# Create new folder for certificates
mkdir -p /etc/mysql/ssl
cd /etc/mysql/ssl
# Create CA certificate
# Generate CA key
openssl genrsa 2048 > ca-key.pem
# Using the CA key, generate the CA certificate
openssl req -new -x509 -nodes -days 3600 \
> -key ca-key.pem -out ca-cert.pem
-----
Country Name (2 letter code) [AU]:HR
State or Province Name (full name) [Some-State]:Zagreb
Locality Name (eg, city) []:Zagreb
Organization Name (eg, company) [Internet Widgits Pty Ltd]:Dummycorp
Organizational Unit Name (eg, section) []:IT
Common Name (e.g. server FQDN or YOUR name) []:myu1.localdomain
Email Address []:marko@dummycorp.com
# Create server certificate, remove passphrase, and sign it
# Create the server key
openssl req -newkey rsa:2048 -days 3600 \
> -nodes -keyout server-key.pem -out server-req.pem
-----
Country Name (2 letter code) [AU]:HR
State or Province Name (full name) [Some-State]:Zagreb
Locality Name (eg, city) []:Zagreb
Organization Name (eg, company) [Internet Widgits Pty Ltd]:Dummycorp
Organizational Unit Name (eg, section) []:IT
##==> Use the ".localdomain" only on the first certificate.
Common Name (e.g. server FQDN or YOUR name) []:myu1
Email Address []:marko@dummycorp.com
Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:secretpassword
An optional company name []:
# Process the server RSA key
openssl rsa -in server-key.pem -out server-key.pem
# Sign the server certificate
openssl x509 -req -in server-req.pem -days 3600 \
> -CA ca-cert.pem -CAkey ca-key.pem -set_serial 01 -out server-cert.pem
# Create client certificate, remove passphrase, and sign it
# Create the client key
openssl req -newkey rsa:2048 -days 3600 \
> -nodes -keyout client-key.pem -out client-req.pem
-----
Country Name (2 letter code) [AU]:HR
State or Province Name (full name) [Some-State]:Zagreb
Locality Name (eg, city) []:Zagreb
Organization Name (eg, company) [Internet Widgits Pty Ltd]:Dummycorp
Organizational Unit Name (eg, section) []:IT
Common Name (e.g. server FQDN or YOUR name) []:myu1
Email Address []:marko@dummycorp.com
Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:secretpassword
An optional company name []:
# Process client RSA key
openssl rsa -in client-key.pem -out client-key.pem
# Sign the client certificate
openssl x509 -req -in client-req.pem -days 3600 \
> -CA ca-cert.pem -CAkey ca-key.pem -set_serial 01 -out client-cert.pem
# Verify certificates
openssl verify -CAfile ca-cert.pem server-cert.pem client-cert.pem
server-cert.pem: OK
client-cert.pem: OK
</pre>
<br />
If verification succeeds, copy the certificates to all nodes in the cluster.<br />
Set mysql as the owner of the files. <br />
<br />
<pre class="brush: bash">
# Copy
scp -r /etc/mysql/ssl node1:/etc/mysql
scp -r /etc/mysql/ssl node2:/etc/mysql
scp -r /etc/mysql/ssl node3:/etc/mysql
# Change owner
node1: chown -R mysql:mysql /etc/mysql/ssl
node2: chown -R mysql:mysql /etc/mysql/ssl
node3: chown -R mysql:mysql /etc/mysql/ssl
</pre>
<br />
<br />
<b>Secure database and client connections.</b><br />
<br />
Add the following lines to the my.cnf configuration file.<br />
<pre class="brush: text">
# MySQL Server
[mysqld]
ssl-ca=/etc/mysql/ssl/ca-cert.pem
ssl-cert=/etc/mysql/ssl/server-cert.pem
ssl-key=/etc/mysql/ssl/server-key.pem
# MySQL Client
[client]
ssl-ca=/etc/mysql/ssl/ca-cert.pem
ssl-cert=/etc/mysql/ssl/client-cert.pem
ssl-key=/etc/mysql/ssl/client-key.pem
</pre>
<br />
<br />
<b>Secure replication traffic.</b><br />
<br />
Define the paths to the key, certificate, and certificate authority files. Galera Cluster will use these files for encrypting and decrypting the replication traffic. <br />
<br />
<pre class="brush: text">
wsrep_provider_options="socket.ssl_key=/etc/mysql/ssl/server-key.pem;socket.ssl_cert=/etc/mysql/ssl/server-cert.pem;socket.ssl_ca=/etc/mysql/ssl/ca-cert.pem"
</pre>
<br />
<br />
<b>Enable SSL for mysqldump and Xtrabackup.</b><br />
<br />
Create a user which requires SSL to connect.<br />
<br />
<pre class="brush: text">
MariaDB [(none)]> CREATE USER 'sstssl'@'localhost' IDENTIFIED BY 'sstssl';
Query OK, 0 rows affected (0.03 sec)
MariaDB [(none)]> GRANT PROCESS, RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sstssl'@'localhost' REQUIRE ssl;
Query OK, 0 rows affected (0.02 sec)
MariaDB [(none)]> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.00 sec)
</pre>
<br />
I will use this user for the state snapshot transfer (SST).<br />
Change wsrep_sst_auth in the my.cnf configuration file.<br />
<br />
<pre class="brush: text">
wsrep_sst_auth="sstssl:sstssl"
</pre>
<br />
<br />
Now we must restart the whole cluster. <br />
If I restart only one node while the others are running, the node won't join the existing cluster.<br />
You can notice these errors in the MySQL error log:<br />
<br />
<pre class="brush: text">
171017 3:20:29 [ERROR] WSREP: handshake with remote endpoint ssl://192.168.56.22:4567 failed: asio.ssl:336031996: 'unknown protocol' ( 336031996: 'error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol')
171017 3:20:29 [ERROR] WSREP: handshake with remote endpoint ssl://192.168.56.23:4567 failed: asio.ssl:336031996: 'unknown protocol' ( 336031996: 'error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol')
</pre>
Shut down the cluster and bootstrap it, as outlined below.<br />
<br />
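The exact commands depend on your distribution and init system; on a sysvinit-based setup like this one, something like the following sequence does it (with systemd and MariaDB 10.1+, galera_new_cluster bootstraps the first node instead):<br />
<br />
<pre class="brush: bash">
# Stop MariaDB on all nodes
service mysql stop
# Bootstrap the cluster on the first node
service mysql start --wsrep-new-cluster
# Start the remaining nodes one at a time
service mysql start
</pre>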
<br />
Check.
<pre class="brush: bash">
MariaDB [(none)]> status
--------------
mysql Ver 15.1 Distrib 10.0.29-MariaDB, for debian-linux-gnu (x86_64) using readline 5.2
Connection id: 87
Current database:
Current user: marko@localhost
SSL: Cipher in use is DHE-RSA-AES256-SHA ###==> SSL is used
Current pager: stdout
Using outfile: ''
Using delimiter: ;
Server: MariaDB
Server version: 10.0.17-MariaDB-1~trusty-wsrep-log mariadb.org binary distribution, wsrep_25.10.r4144
Protocol version: 10
Connection: Localhost via UNIX socket
Server characterset: utf8
Db characterset: utf8
Client characterset: utf8
Conn. characterset: utf8
UNIX socket: /var/run/mysqld/mysqld.sock
Uptime: 1 min 4 sec
Threads: 52 Questions: 676 Slow queries: 16 Opens: 167 Flush tables: 1 Open tables: 31 Queries per second avg: 10.562
--------------
MariaDB [(none)]> SHOW VARIABLES LIKE 'have_ssl';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| have_ssl | YES |
+---------------+-------+
1 row in set (0.01 sec)
</pre>
<br />
<br />
<br />
<b>REFERENCES</b>
<br />
<a href="https://dev.mysql.com/doc/refman/5.7/en/creating-ssl-files-using-openssl.html">6.4.3.2 Creating SSL Certificates and Keys Using openssl</a><br />
<a href="https://oracle-base.com/articles/mysql/mysql-configure-ssl-connections">MySQL : Configure SSL Connections</a>
<br />
<br />
</span>Marko Sutichttp://www.blogger.com/profile/08926232581329666732noreply@blogger.com3tag:blogger.com,1999:blog-2530682427657016426.post-65974957075506616022017-09-28T12:12:00.000+02:002017-09-28T12:23:03.356+02:00Delete large amounts of data on Galera Cluster using pt-archiverGalera Cluster is an excellent virtually synchronous multi-master database cluster. It has many benefits, which you can check on <a href="http://galeracluster.com">GaleraCluster</a>.<br />
But besides the benefits it has some limitations, and one of them is handling large transactions.<br />
<br />
Large replication data sets could degrade the performance of the whole cluster, causing cluster freezes, increased memory consumption, crashing nodes, etc. To avoid these issues it is recommended to split large transactions into smaller chunks.<br />
<br />
In this post I want to show you how to safely delete large amounts of data on a Galera Cluster. You can perform this task using several tools or by writing custom procedures to split a large transaction into chunks. In this example I will use the <a href="https://www.percona.com/doc/percona-toolkit/LATEST/pt-archiver.html">pt-archiver</a> tool from Percona.<br />
<br />
<br />
<span id="fullpost">
Imagine you have received a task to perform data cleanup in the devices table for several schemas.<br />
It looks like a very simple task - delete rows from the devices table where device_cookie is 0.<br />
<pre class="brush: sql">
delete from devices where device_cookie = 0
</pre>
<br />
But, although the statement looks simple, it could <b>potentially freeze the whole cluster</b>, so before executing the delete statement count how many rows you need to delete.<br />
<br />
In my case I have to delete a few million rows, which is too much for one transaction, so I need to split the transaction into smaller chunks.<br />
<br />
<pre class="brush: sql">
mysql> select count(*) from devices;
+----------+
| count(*) |
+----------+
| 2788504 |
+----------+
mysql> select count(*) - (select count(*) from devices where device_cookie = 0)
from devices;
+----------+
| count(*) |
+----------+
| 208 |
+----------+
</pre>
<br />
I have to delete around 2.7 million rows.<br />
<br />
This is the command I will use:<br />
<pre class="brush: text">
pt-archiver --source h=localhost,u=marko,p="passwd",D=sch_testdb,t=devices \
--purge --where "device_cookie = 0" --sleep-coef 1.0 --txn-size 1000
</pre>
<br />
<b>--purge</b> - <i>delete rows.</i><br />
<b>--where "device_cookie = 0"</b> - <i>filter rows you want to delete.</i><br />
<b>--sleep-coef 1.0</b> - <i>throttle delete process to avoid pause signals from cluster.</i><br />
<b>--txn-size 1000</b> - <i>this is chunk size for every transaction.</i><br />
<br />
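Before running the real purge, it is worth checking which statements pt-archiver will issue; the --dry-run option prints the queries without modifying any data:<br />
<br />
<pre class="brush: text">
pt-archiver --source h=localhost,u=marko,p="passwd",D=sch_testdb,t=devices \
--purge --where "device_cookie = 0" --dry-run
</pre>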
<pre class="brush: text">
# time pt-archiver --source h=localhost,u=marko,p="passwd",D=sch_testdb,t=devices \
--purge --where "device_cookie = 0" --sleep-coef 1.0 --txn-size 1000
real 3m32.532s
user 0m17.268s
sys 0m2.460s
</pre>
<br />
Check after the delete has finished.<br />
<pre class="brush: sql">
mysql> select count(*) from devices;
+----------+
| count(*) |
+----------+
| 208 |
+----------+
1 row in set (0.00 sec)
</pre>
<br />
As I have to perform the delete for several schemas, I have created a simple shell script which iterates through the schema list and executes the pt-archiver command.<br />
<br />
<pre class="brush: bash">
# cat delete_rows.sh
#!/bin/bash
LOGFILE=/opt/skripte/schema/table_delete_rows.log
SCHEMA_LIST=/opt/skripte/schema/schema_list.conf
# Get schema list and populate conf file
mysql -B -u marko -ppasswd --disable-column-names --execute "select schema_name from information_schema.schemata where schema_name like 'sch_%' and schema_name <> 'sch_sys'" > $SCHEMA_LIST
while IFS= read -r schema; do
START=`date +%s`
echo "`date`=> Deleting rows from table in schema: $schema"
pt-archiver --source h=localhost,u=marko,p="passwd",D=$schema,t=devices --purge --where "device_cookie = 0" --sleep-coef 1.0 --txn-size 500
SPENT=$(((`date +%s` - $START) / 60))
echo "`date`=> Finished deleting in schema - spent: $SPENT mins"
echo "*************************************************************************"
done <$SCHEMA_LIST >> $LOGFILE
exit 0
</pre>
<br />
<br />
<br />
</span>Marko Sutichttp://www.blogger.com/profile/08926232581329666732noreply@blogger.com0tag:blogger.com,1999:blog-2530682427657016426.post-50066701630106155272017-09-16T19:19:00.003+02:002020-11-10T07:49:39.844+01:00Beware of ORA-19721 on 12c using Transportable Tablespace (Oracle changed behavior)Almost every big database has its hot data, which is used often, and cold data, which is rarely touched. Since version 9i I have used the transportable tablespace feature to exclude cold (archive) data from the database and keep it on cheap storage or tapes.<br />
<br />
If someone needed to query some of the archive tables it was very easy to plug the tablespace in for a few days, and once the archive data was no longer needed the tablespace could easily be dropped. So I was plugging in the same tablespaces more than once.<br />
<br />
But when I tried the same process on a 12c database I was unpleasantly surprised that Oracle had changed the behaviour and I could not reattach the tablespace. <br />
<br />
Let's demonstrate this in a simple demo case.<br />
<br />
<span id="fullpost">
Create a tablespace and set it to read only.
<pre class="brush: sql">
create tablespace ARCHIVE01 datafile '/oradata1/data/ora12c/archive01.dbf' size 50M;
Tablespace created.
create table archtab tablespace ARCHIVE01 as select * from dba_objects;
Table created.
alter tablespace ARCHIVE01 read only;
Tablespace altered.
create directory export_tts as '/oradata1/export';
Directory created.
</pre>
<br />
Export tablespace metadata.<br />
<pre class="brush: text">
$ expdp '" / as sysdba "' directory=EXPORT_TTS dumpfile=exp_archive01.dmp logfile=exp_archive01.log transport_tablespaces=ARCHIVE01 transport_full_check=Y
Export: Release 12.1.0.2.0 - Production on Sat Sep 16 18:07:27 2017
Copyright (c) 1982, 2014, Oracle and/or its affiliates. All rights reserved.
Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
WARNING: Oracle Data Pump operations are not typically needed when connected to the root or seed of a container database.
Starting "SYS"."SYS_EXPORT_TRANSPORTABLE_01": "/******** AS SYSDBA" directory=EXPORT_TTS dumpfile=exp_archive01.dmp logfile=exp_archive01.log transport_tablespaces=ARCHIVE01 transport_full_check=Y
Processing object type TRANSPORTABLE_EXPORT/PLUGTS_BLK
Processing object type TRANSPORTABLE_EXPORT/TABLE
Processing object type TRANSPORTABLE_EXPORT/TABLE_STATISTICS
Processing object type TRANSPORTABLE_EXPORT/STATISTICS/MARKER
Processing object type TRANSPORTABLE_EXPORT/POST_INSTANCE/PLUGTS_BLK
Master table "SYS"."SYS_EXPORT_TRANSPORTABLE_01" successfully loaded/unloaded
******************************************************************************
Dump file set for SYS.SYS_EXPORT_TRANSPORTABLE_01 is:
/oradata1/export/exp_archive01.dmp
******************************************************************************
Datafiles required for transportable tablespace ARCHIVE01:
/oradata1/data/ora12c/archive01.dbf
Job "SYS"."SYS_EXPORT_TRANSPORTABLE_01" successfully completed at Sat Sep 16 18:08:06 2017 elapsed 0 00:00:3
</pre>
<br />
Drop tablespace but keep datafile.<br />
<pre class="brush: sql">
SQL> drop tablespace ARCHIVE01 including contents keep datafiles;
Tablespace dropped.
</pre>
<br />
Let’s plug in tablespace.<br />
<pre class="brush: sql">
$ impdp '" /as sysdba "' directory=EXPORT_TTS dumpfile=exp_archive01.dmp logfile=imp_archive01.log transport_datafiles='/oradata1/data/ora12c/archive01.dbf'
Import: Release 12.1.0.2.0 - Production on Sat Sep 16 18:11:32 2017
Copyright (c) 1982, 2014, Oracle and/or its affiliates. All rights reserved.
Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
WARNING: Oracle Data Pump operations are not typically needed when connected to the root or seed of a container database.
Master table "SYS"."SYS_IMPORT_TRANSPORTABLE_01" successfully loaded/unloaded
Starting "SYS"."SYS_IMPORT_TRANSPORTABLE_01": "/******** AS SYSDBA" directory=EXPORT_TTS dumpfile=exp_archive01.dmp logfile=imp_archive01.log transport_datafiles=/oradata1/data/ora12c/archive01.dbf
Processing object type TRANSPORTABLE_EXPORT/PLUGTS_BLK
Processing object type TRANSPORTABLE_EXPORT/TABLE
Processing object type TRANSPORTABLE_EXPORT/TABLE_STATISTICS
Processing object type TRANSPORTABLE_EXPORT/STATISTICS/MARKER
Processing object type TRANSPORTABLE_EXPORT/POST_INSTANCE/PLUGTS_BLK
Job "SYS"."SYS_IMPORT_TRANSPORTABLE_01" successfully completed at Sat Sep 16 18:11:51 2017 elapsed 0 00:00:18
</pre>
Check alert log.<br />
<pre class="brush: text">
Plug in tablespace ARCHIVE01 with datafile
'/oradata1/data/ora12c/archive01.dbf'
TABLE SYS.WRI$_OPTSTAT_HISTHEAD_HISTORY: ADDED INTERVAL PARTITION SYS_P451 (42993) VALUES LESS THAN (TO_DATE(' 2017-09-17 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))
ALTER TABLESPACE "ARCHIVE01" READ WRITE
Completed: ALTER TABLESPACE "ARCHIVE01" READ WRITE
ALTER TABLESPACE "ARCHIVE01" READ ONLY
Sat Sep 16 18:11:51 2017
Converting block 0 to version 10 format
Completed: ALTER TABLESPACE "ARCHIVE01" READ ONLY
</pre>
<br />
Notice that Oracle is altering the tablespace (datafile headers) to READ WRITE - <i>Completed: ALTER TABLESPACE "ARCHIVE01" READ WRITE</i>.<br />
<br />
Quote from Oracle Support site:<br />
<blockquote>
Oracle Development declared it as "Expected Behavior"
Starting from 12.1, during the TTS import operation, the tablespaces (datafile headers) are put into read-write mode intermittently in order to fix up TSTZ table columns and clean up unused segments in the datafiles.
This functionality was implemented on many customer's request basis. And, hence, this cannot be reversed. Note that, it intermittently only changes the status to "read-write" and the final status will still be "read-only" only.
</blockquote>
<br />
Now let's see what happens if I drop the tablespace and try to reattach it again.<br />
<br />
Drop the tablespace, keeping the datafile.
<pre class="brush: sql">
SQL> drop tablespace ARCHIVE01 including contents keep datafiles;
Tablespace dropped.
</pre>
Import tablespace metadata.<br />
<pre class="brush: text">
$ impdp '" /as sysdba "' directory=EXPORT_TTS dumpfile=exp_archive01.dmp logfile=imp_archive01.log transport_datafiles='/oradata1/data/ora12c/archive01.dbf'
Import: Release 12.1.0.2.0 - Production on Sat Sep 16 18:13:51 2017
Copyright (c) 1982, 2014, Oracle and/or its affiliates. All rights reserved.
Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
WARNING: Oracle Data Pump operations are not typically needed when connected to the root or seed of a container database.
Master table "SYS"."SYS_IMPORT_TRANSPORTABLE_01" successfully loaded/unloaded
Starting "SYS"."SYS_IMPORT_TRANSPORTABLE_01": "/******** AS SYSDBA" directory=EXPORT_TTS dumpfile=exp_archive01.dmp logfile=imp_archive01.log transport_datafiles=/oradata1/data/ora12c/archive01.dbf
Processing object type TRANSPORTABLE_EXPORT/PLUGTS_BLK
ORA-39123: Data Pump transportable tablespace job aborted
ORA-19721: Cannot find datafile with absolute file number 14 in tablespace ARCHIVE01
Job "SYS"."SYS_IMPORT_TRANSPORTABLE_01" stopped due to fatal error at Sat Sep 16 18:13:55 2017 elapsed 0 00:00:02
</pre>
<br />
I received an error and failed to plug in the tablespace.<br />
<br />
<br />
The workaround for this "expected" behaviour is to <b>change the datafile permissions at OS level to read only</b>.<br />
There is also a workaround if you are using ASM, so check the Oracle Support site.<br />
<br />
Let’s repeat the steps from the demo, this time using the workaround.<br />
<br />
<br />
Create tablespace.<br />
<pre class="brush: sql">
SQL> create tablespace ARCHIVE02 datafile '/oradata1/data/ora12c/archive02.dbf' size 50M;
Tablespace created.
SQL> create table archtab tablespace ARCHIVE02 as select * from dba_objects;
Table created.
SQL> alter tablespace ARCHIVE02 read only;
Tablespace altered.
</pre>
<br />
Export tablespace metadata.<br />
<pre class="brush: text">
$ expdp '" / as sysdba "' directory=EXPORT_TTS dumpfile=exp_archive02.dmp logfile=exp_archive02.log transport_tablespaces=ARCHIVE02 transport_full_check=Y
Export: Release 12.1.0.2.0 - Production on Sat Sep 16 18:18:25 2017
Copyright (c) 1982, 2014, Oracle and/or its affiliates. All rights reserved.
Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
WARNING: Oracle Data Pump operations are not typically needed when connected to the root or seed of a container database.
Starting "SYS"."SYS_EXPORT_TRANSPORTABLE_01": "/******** AS SYSDBA" directory=EXPORT_TTS dumpfile=exp_archive02.dmp logfile=exp_archive02.log transport_tablespaces=ARCHIVE02 transport_full_check=Y
Processing object type TRANSPORTABLE_EXPORT/PLUGTS_BLK
Processing object type TRANSPORTABLE_EXPORT/TABLE
Processing object type TRANSPORTABLE_EXPORT/TABLE_STATISTICS
Processing object type TRANSPORTABLE_EXPORT/STATISTICS/MARKER
Processing object type TRANSPORTABLE_EXPORT/POST_INSTANCE/PLUGTS_BLK
Master table "SYS"."SYS_EXPORT_TRANSPORTABLE_01" successfully loaded/unloaded
******************************************************************************
Dump file set for SYS.SYS_EXPORT_TRANSPORTABLE_01 is:
/oradata1/export/exp_archive02.dmp
******************************************************************************
Datafiles required for transportable tablespace ARCHIVE02:
/oradata1/data/ora12c/archive02.dbf
Job "SYS"."SYS_EXPORT_TRANSPORTABLE_01" successfully completed at Sat Sep 16 18:18:44 2017 elapsed 0 00:00:18
</pre>
<br />
Drop tablespace and keep datafile.<br />
<pre class="brush: sql">
SQL> drop tablespace ARCHIVE02 including contents keep datafiles;
Tablespace dropped.
</pre>
<br />
<br />
Change the datafile permissions to read only.<br />
<pre class="brush: text">
$ chmod 0440 /oradata1/data/ora12c/archive02.dbf
$ ls -l /oradata1/data/ora12c/archive02.dbf
-r--r-----. 1 oracle oinstall 52436992 Sep 16 18:17 /oradata1/data/ora12c/archive02.dbf
</pre>
<br />
Import tablespace metadata.<br />
<pre class="brush: text">
$ impdp '" /as sysdba "' directory=EXPORT_TTS dumpfile=exp_archive02.dmp logfile=imp_archive02.log transport_datafiles='/oradata1/data/ora12c/archive02.dbf'
Import: Release 12.1.0.2.0 - Production on Sat Sep 16 18:20:23 2017
Copyright (c) 1982, 2014, Oracle and/or its affiliates. All rights reserved.
Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
WARNING: Oracle Data Pump operations are not typically needed when connected to the root or seed of a container database.
Master table "SYS"."SYS_IMPORT_TRANSPORTABLE_01" successfully loaded/unloaded
Starting "SYS"."SYS_IMPORT_TRANSPORTABLE_01": "/******** AS SYSDBA" directory=EXPORT_TTS dumpfile=exp_archive02.dmp logfile=imp_archive02.log transport_datafiles=/oradata1/data/ora12c/archive02.dbf
Processing object type TRANSPORTABLE_EXPORT/PLUGTS_BLK
Processing object type TRANSPORTABLE_EXPORT/TABLE
Processing object type TRANSPORTABLE_EXPORT/TABLE_STATISTICS
Processing object type TRANSPORTABLE_EXPORT/STATISTICS/MARKER
Processing object type TRANSPORTABLE_EXPORT/POST_INSTANCE/PLUGTS_BLK
Job "SYS"."SYS_IMPORT_TRANSPORTABLE_01" successfully completed at Sat Sep 16 18:20:28 2017 elapsed 0 00:00:03
</pre>
<br />
In the alert log you can notice an ORA-1114 IO error, because Oracle cannot modify the datafile.<br />
<pre class="brush: text">
Plug in tablespace ARCHIVE02 with datafile
'/oradata1/data/ora12c/archive02.dbf'
ALTER TABLESPACE "ARCHIVE02" READ WRITE
ORA-1114 signalled during: ALTER TABLESPACE "ARCHIVE02" READ WRITE...
</pre>
<br />
Drop tablespace and reattach it again.<br />
<pre class="brush: sql">
SQL> drop tablespace ARCHIVE02 including contents keep datafiles;
Tablespace dropped.
</pre>
<br />
Plug in tablespace. <br />
<pre class="brush: text">
$ impdp '" /as sysdba "' directory=EXPORT_TTS dumpfile=exp_archive02.dmp logfile=imp_archive02.log transport_datafiles='/oradata1/data/ora12c/archive02.dbf'
Import: Release 12.1.0.2.0 - Production on Sat Sep 16 18:22:01 2017
Copyright (c) 1982, 2014, Oracle and/or its affiliates. All rights reserved.
Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
WARNING: Oracle Data Pump operations are not typically needed when connected to the root or seed of a container database.
Master table "SYS"."SYS_IMPORT_TRANSPORTABLE_01" successfully loaded/unloaded
Starting "SYS"."SYS_IMPORT_TRANSPORTABLE_01": "/******** AS SYSDBA" directory=EXPORT_TTS dumpfile=exp_archive02.dmp logfile=imp_archive02.log transport_datafiles=/oradata1/data/ora12c/archive02.dbf
Processing object type TRANSPORTABLE_EXPORT/PLUGTS_BLK
Processing object type TRANSPORTABLE_EXPORT/TABLE
Processing object type TRANSPORTABLE_EXPORT/TABLE_STATISTICS
Processing object type TRANSPORTABLE_EXPORT/STATISTICS/MARKER
Processing object type TRANSPORTABLE_EXPORT/POST_INSTANCE/PLUGTS_BLK
Job "SYS"."SYS_IMPORT_TRANSPORTABLE_01" successfully completed at Sat Sep 16 18:22:05 2017 elapsed 0 00:00:03
</pre>
<br />
This time I received no error and was able to plug in the tablespace.<br />
I have to remind myself to change datafile permissions before plugging in tablespaces from version 12c onwards.<br />
<br />
<br />
<br />
<b>REFERENCES</b><br />
Doc ID 2094476.1<br />
<br />
<br />
</span>Marko Sutichttp://www.blogger.com/profile/08926232581329666732noreply@blogger.com0tag:blogger.com,1999:blog-2530682427657016426.post-82334658296520869012017-03-06T09:16:00.002+01:002020-11-10T07:49:52.710+01:00Using In-Memory Option with SQL Plan Baselines, SQL Profiles and SQL HintsThe Oracle Database In-Memory option was introduced in the 12.1.0.2 patchset. It is a great feature for improving the performance of analytic queries. In mixed-workload OLTP environments the In-Memory option can improve the performance of analytic queries without a significant negative effect on quick OLTP queries or DML operations.<br />
<br />
So you have decided that the In-Memory option could be great for you, and now you want to implement it on your critical production database.<br />
<br />
But in your code you have many SQL hints hard-coded, SQL profiles implemented or SQL plan baselines created to solve problems with unstable query performance. What will happen to the execution plans if you populate the In-Memory column store with the critical tables in the database?<br />
<br />
<span id="fullpost">
Example:<br />
Version : Oracle 12.1.0.2 <br />
<br />
For the test I will use a query whose plan is fixed using both a SQL profile and a SQL plan baseline. <br />
<br />
<pre class="brush: sql">
select object_type, count(*)
from admin.big_table
group by object_type;
OBJECT_TYPE COUNT(*)
----------------------- ----------
PACKAGE 14858
PACKAGE BODY 13724
PROCEDURE 2254
PROGRAM 110
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => 'TYPICAL'));
SQL_ID 8g28yt7c1nacr, child number 0
-------------------------------------
select object_type, count(*) from admin.big_table group by object_type
Plan hash value: 1753714399
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 4819 (100)| |
| 1 | HASH GROUP BY | | 39 | 351 | 4819 (1)| 00:00:01 |
| 2 | TABLE ACCESS FULL| BIG_TABLE | 1000K| 8789K| 4795 (1)| 00:00:01 |
--------------------------------------------------------------------------------
DECLARE
my_plans pls_integer;
BEGIN
my_plans := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(
sql_id => '8g28yt7c1nacr');
END;
/
@coe_xfr_sql_profile 8g28yt7c1nacr 1753714399
@coe_xfr_sql_profile_8g28yt7c1nacr_1753714399.sql
select object_type, count(*)
from admin.big_table
group by object_type;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => 'TYPICAL'));
SQL_ID 8g28yt7c1nacr, child number 0
-------------------------------------
select object_type, count(*) from admin.big_table group by object_type
Plan hash value: 1753714399
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 4819 (100)| |
| 1 | HASH GROUP BY | | 39 | 351 | 4819 (1)| 00:00:01 |
| 2 | TABLE ACCESS FULL| BIG_TABLE | 1000K| 8789K| 4795 (1)| 00:00:01 |
--------------------------------------------------------------------------------
Note
-----
- SQL profile coe_8g28yt7c1nacr_1753714399 used for this statement
- SQL plan baseline SQL_PLAN_1wn92bz7gqvxx73be0962 used for this statement
</pre>
<br />
The Note section in the execution plan output says that I’m using both the SQL profile and the SQL plan baseline for this query.<br />
<br />
I have previously enabled the In-Memory column store, and now I will populate it with the table data.<br />
<br />
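For reference, a minimal sketch of that one-time setup (the 500M size is an arbitrary demo value; inmemory_size is a static parameter, so an instance restart is required):<br />
<pre class="brush: sql">
-- Size the In-Memory column store and restart the instance (sketch)
alter system set inmemory_size = 500M scope=spfile;
shutdown immediate
startup
</pre>
<br />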
<pre class="brush: sql">
alter table admin.big_table inmemory priority critical;
col segment_name for a15
select segment_name,
inmemory_size/1024/1024 im_size_mb,
bytes/1024/1024 size_mb,
bytes_not_populated,
inmemory_compression
from v$im_segments;
SEGMENT_NAME IM_SIZE_MB SIZE_MB BYTES_NOT_POPULATED INMEMORY_COMPRESS
--------------- ---------- ---------- ------------------- -----------------
BIG_TABLE 27.1875 144 0 FOR QUERY LOW
1 row selected.
</pre>
<br />
Run query again.<br />
<br />
<pre class="brush: sql">
select object_type, count(*)
from admin.big_table
group by object_type;
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => 'TYPICAL'));
SQL_ID 8g28yt7c1nacr, child number 0
-------------------------------------
select object_type, count(*) from admin.big_table group by object_type
Plan hash value: 1753714399
--------------------------------------------------------------------------------------
|Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 257 (100)| |
| 1 | HASH GROUP BY | | 39 | 351 | 257 (13)| 00:00:01|
| 2 | TABLE ACCESS INMEMORY FULL| BIG_TABLE | 1000K| 8789K| 233 (4)| 00:00:01|
--------------------------------------------------------------------------------------
Note
-----
- SQL profile coe_8g28yt7c1nacr_1753714399 used for this statement
- SQL plan baseline SQL_PLAN_1wn92bz7gqvxx73be0962 used for this statement
</pre>
<br />
Notice "TABLE ACCESS INMEMORY FULL" operation is used instead of "TABLE ACCESS FULL" and both SQL profile and SQL plan baselines are used for this query.<br />
<br />
In this case Oracle used the in-memory column store to read the data without any intervention on the SQL profile or the SQL plan baseline. The plan hash value remained the same in both cases.<br />
<br />
<br />
But what if we have index operations involved in the execution plan?<br />
<br />
<pre class="brush: sql">
-- Temporarily disable the IM column store for query optimization
SQL> alter system set inmemory_query=DISABLE;
-- Force Oracle to use index
SQL> alter session set optimizer_index_caching=100;
SQL> alter session set optimizer_index_cost_adj=1;
select object_type, count(*)
from admin.big_table
where object_type > 'C'
group by object_type;
SQL_ID 8xvfvz3axf5ct, child number 0
-------------------------------------
select object_type, count(*) from admin.big_table where object_type >
'C' group by object_type
Plan hash value: 3149057435
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 28 (100)| |
| 1 | SORT GROUP BY NOSORT| | 39 | 351 | 28 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_OBJ_TYPE | 1000K| 8789K| 28 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_TYPE">'C')
-- Create SQL plan baseline
DECLARE
my_plans pls_integer;
BEGIN
my_plans := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(
sql_id => '8xvfvz3axf5ct');
END;
/
-- Create SQL profile
SQL>@coe_xfr_sql_profile 8xvfvz3axf5ct 3149057435
SQL>@coe_xfr_sql_profile_8xvfvz3axf5ct_3149057435.sql
</pre>
<br />
<br />
I have a slightly different query with an "INDEX RANGE SCAN" operation in the execution plan. A SQL plan baseline and a SQL profile are both created for this query.<br />
<br />
<br />
In the Note section you can see that the SQL profile and the SQL plan baseline are both used.<br />
<br />
<pre class="brush: sql">
select object_type, count(*)
from admin.big_table
where object_type > 'C'
group by object_type;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => 'TYPICAL'));
SQL_ID 8xvfvz3axf5ct, child number 0
-------------------------------------
select object_type, count(*) from admin.big_table where object_type >
'C' group by object_type
Plan hash value: 3149057435
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 28 (100)| |
| 1 | SORT GROUP BY NOSORT| | 39 | 351 | 28 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_OBJ_TYPE | 1000K| 8789K| 28 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_TYPE">'C')
Note
-----
- SQL profile coe_8xvfvz3axf5ct_3149057435 used for this statement
- SQL plan baseline SQL_PLAN_76jwvc1sug4k44391ca35 used for this statement
</pre>
<br />
<br />
Enable IM column store to optimise queries.<br />
<br />
<pre class="brush: sql">
SQL> alter system set inmemory_query=ENABLE;
System altered.
</pre>
<br />
<pre class="brush: sql">
select object_type, count(*)
from admin.big_table
where object_type > 'C'
group by object_type;
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => 'TYPICAL'));
SQL_ID 8xvfvz3axf5ct, child number 1
-------------------------------------
select object_type, count(*) from admin.big_table where object_type >
'C' group by object_type
Plan hash value: 3149057435
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 28 (100)| |
| 1 | SORT GROUP BY NOSORT| | 39 | 351 | 28 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_OBJ_TYPE | 1000K| 8789K| 28 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_TYPE">'C')
Note
-----
- SQL profile coe_8xvfvz3axf5ct_3149057435 used for this statement
- SQL plan baseline SQL_PLAN_76jwvc1sug4k44391ca35 used for this statement
</pre>
<br />
This time the in-memory option is not used to improve the performance of the query.<br />
<br />
Let’s drop the SQL profile and leave the SQL plan baseline enabled.<br />
<br />
<pre class="brush: sql">
exec dbms_sqltune.drop_sql_profile('coe_8xvfvz3axf5ct_3149057435');
select object_type, count(*)
from admin.big_table
where object_type > 'C'
group by object_type;
Plan hash value: 1753714399
--------------------------------------------------------------------------------------
| Id| Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 39 | 351 | 255 (12)| 00:00:01|
| 1 | HASH GROUP BY | | 39 | 351 | 255 (12)| 00:00:01|
|*2 | TABLE ACCESS INMEMORY FULL| BIG_TABLE | 1000K| 8789K| 231 (3)| 00:00:01|
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - inmemory("OBJECT_TYPE">'C')
filter("OBJECT_TYPE">'C')
Note
-----
- SQL plan baseline "SQL_PLAN_76jwvc1sug4k473be0962" used for this statement
</pre>
<br />
The Note section says that a SQL plan baseline is used for this statement, but a different one than before.<br />
I have the "TABLE ACCESS INMEMORY FULL" operation and the plan has changed automatically.<br />
<br />
In Oracle 12cR1 Adaptive SQL Plan Management is enabled by default. Oracle calculated a more efficient plan using the in-memory column store and automatically accepted the new SQL execution plan for this query. As the new SQL plan was added and accepted, Oracle was able to change the execution plan.<br />
<br />
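If you want to confirm that automatic plan evolution is active, the auto evolve task parameter can be checked; a small sketch (ACCEPT_PLANS defaults to TRUE in 12c):<br />
<pre class="brush: sql">
-- Check whether the SPM evolve advisor task accepts plans automatically
select parameter_name, parameter_value
from   dba_advisor_parameters
where  task_name = 'SYS_AUTO_SPM_EVOLVE_TASK'
and    parameter_name = 'ACCEPT_PLANS';
</pre>
<br />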
<pre class="brush: sql">
set lines 200
set pages 999
col plan_name for a30
col sql_text for a50 wrap
select plan_name, sql_text, enabled, accepted
from dba_sql_plan_baselines
where sql_text like '%object_type > %';
PLAN_NAME SQL_TEXT ENA ACC
------------------------------ --------------------------------------- --- ---
SQL_PLAN_76jwvc1sug4k4ebe5b30f select object_type, count(*) YES NO
from admin.big_table
where object_type > 'C'
group by object_type
SQL_PLAN_76jwvc1sug4k473be0962 select object_type, count(*) YES YES
from admin.big_table
where object_type > 'C'
group by object_type
SQL_PLAN_76jwvc1sug4k44391ca35 select object_type, count(*) YES YES
from admin.big_table
where object_type > 'C'
group by object_type
</pre>
<br />
What if I disable Adaptive SQL Plan Management to forbid automatic evolving of existing baselines?<br />
<br />
<pre class="brush: sql">
-- Disable automatic evolving
BEGIN
DBMS_SPM.set_evolve_task_parameter(
task_name => 'SYS_AUTO_SPM_EVOLVE_TASK',
parameter => 'ACCEPT_PLANS',
value => 'FALSE');
END;
/
-- Drop SQL plan baseline used for in-memory full scan
DECLARE
l_plans_dropped PLS_INTEGER;
BEGIN
l_plans_dropped := DBMS_SPM.drop_sql_plan_baseline (
sql_handle => NULL,
plan_name => 'SQL_PLAN_76jwvc1sug4k473be0962');
END;
/
</pre>
<br />
The in-memory full scan is not used, as the index range scan operation was specified in the existing baseline which is used for the query.<br />
<br />
<pre class="brush: sql">
select object_type, count(*)
from admin.big_table
where object_type > 'C'
group by object_type;
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => 'TYPICAL'));
SQL_ID 8xvfvz3axf5ct, child number 1
-------------------------------------
select object_type, count(*) from admin.big_table where object_type >
'C' group by object_type
Plan hash value: 3149057435
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 28 (100)| |
| 1 | SORT GROUP BY NOSORT| | 39 | 351 | 28 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_OBJ_TYPE | 1000K| 8789K| 28 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_TYPE">'C')
Note
-----
- SQL plan baseline SQL_PLAN_76jwvc1sug4k44391ca35 used for this statement
</pre>
<br />
A new plan was added, but this time it is not accepted automatically and is not taken into consideration by the optimizer. We have to manually validate and accept the new plan to use it for query executions; see the sketch after the listing below.<br />
<br />
<pre class="brush: sql">
set lines 200
set pages 999
col plan_name for a30
col sql_text for a50 wrap
select plan_name, sql_text, enabled, accepted
from dba_sql_plan_baselines
where sql_text like '%object_type > %';
PLAN_NAME SQL_TEXT ENA ACC
------------------------------ ---------------------------------------- --- ---
SQL_PLAN_76jwvc1sug4k4ebe5b30f select object_type, count(*) YES NO
from admin.big_table
where object_type > 'C'
group by object_type
SQL_PLAN_76jwvc1sug4k473be0962 select object_type, count(*) YES NO
from admin.big_table
where object_type > 'C'
group by object_type
SQL_PLAN_76jwvc1sug4k44391ca35 select object_type, count(*) YES YES
from admin.big_table
where object_type > 'C'
group by object_type
</pre>
<br />
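To validate and accept the new plan manually, the SPM evolve API could be used; a minimal sketch, assuming we want to verify and accept the unaccepted in-memory plan from the listing above:<br />
<pre class="brush: sql">
SET SERVEROUTPUT ON
DECLARE
  v_report CLOB;
BEGIN
  -- Verify the unaccepted plan and accept it if it performs better
  v_report := DBMS_SPM.evolve_sql_plan_baseline(
                sql_handle => NULL,
                plan_name  => 'SQL_PLAN_76jwvc1sug4k473be0962',
                verify     => 'YES',
                commit     => 'YES');
  DBMS_OUTPUT.put_line(v_report);
END;
/
</pre>
<br />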
<br />
What will happen if I have a query with a hint?<br />
<br />
<pre class="brush: sql">
select /*+index(t IDX_OBJ_TYPE)*/
object_type, count(*)
from admin.big_table t
where object_type > 'C'
group by object_type;
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => 'TYPICAL'));
SQL_ID 8k7fykgphx8ra, child number 0
-------------------------------------
select /*+index(t IDX_OBJ_TYPE)*/ object_type, count(*) from
admin.big_table t where object_type > 'C' group by object_type
Plan hash value: 3149057435
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 2770 (100)| |
| 1 | SORT GROUP BY NOSORT| | 39 | 351 | 2770 (1)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_OBJ_TYPE | 1000K| 8789K| 2770 (1)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_TYPE">'C')
</pre>
<br />
In-memory data access is ignored, as we have a hint forcing usage of the index.<br />
<br />
<pre class="brush: sql">
select /*+full(t)*/
object_type, count(*)
from admin.big_table t
where object_type > 'C'
group by object_type;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => 'TYPICAL'));
Plan hash value: 1753714399
--------------------------------------------------------------------------------------
|Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 39 | 351 | 255 (12)| 00:00:01|
| 1 | HASH GROUP BY | | 39 | 351 | 255 (12)| 00:00:01|
|*2 | TABLE ACCESS INMEMORY FULL| BIG_TABLE | 1000K| 8789K| 231 (3)| 00:00:01|
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - inmemory("OBJECT_TYPE">'C')
filter("OBJECT_TYPE">'C')
</pre>
<br />
In case we have a hint forcing a full scan, the query will read data from the in-memory column store, as "TABLE ACCESS INMEMORY FULL" and "TABLE ACCESS FULL" are the same full table scan operation to the optimizer.
<br />
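For completeness, the opposite is also possible: if you want a conventional full scan that bypasses the column store, 12c provides a NO_INMEMORY hint. A sketch (not something I needed here):<br />
<pre class="brush: sql">
-- Force a full scan but disallow the in-memory column store for this query
select /*+ full(t) no_inmemory(t) */
       object_type, count(*)
from   admin.big_table t
where  object_type > 'C'
group  by object_type;
</pre>
<br />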
<br />
<br />
<b>Conclusion</b> <br />
If your production application depends heavily on SQL profiles and SQL hints, it will be hard to exploit the full potential of the in-memory column store option in a short time.<br />
With SQL plan baselines it is slightly easier, because you can use Adaptive SQL Plan Management to alter plans.<br />
<br />
But you must dedicate some time to proper testing, because changing plans and dropping indexes blindly could cause many performance problems.
<br />
<br />
<br />
<br />
</span>Marko Sutichttp://www.blogger.com/profile/08926232581329666732noreply@blogger.com2tag:blogger.com,1999:blog-2530682427657016426.post-65147285371731895372016-11-03T22:19:00.001+01:002020-11-10T07:50:04.500+01:00Reduce Hard Parse time using SQL ProfileA few days ago we had a concurrency problem with the "<b>cursor: pin S wait on X</b>" wait event. This wait event is mostly associated with parsing in some form.<br />
<br />
After a quick diagnosis I found the problematic query. It was a fairly complex query which was executed very often, with an average execution time of 0.20 seconds. As this query was using bind variables, Oracle reused the existing plan and problems with "cursor: pin S wait on X" wait events weren’t appearing.<br />
<br />
But when a hard parse occurred we experienced problems with the specified mutex waits. Query execution with hard parsing jumped from 0.20 seconds to over 2.1 seconds.<br />
<br />
One session would hold the mutex pin in exclusive mode while other sessions were waiting to get the mutex pin in share mode - waiting on the "cursor: pin S wait on X" wait event.<br />
<span id="fullpost">
<br />
Rewriting the query would solve this issue, but we needed a solution quickly.<br />
<br />
<br />
I decided to perform a few tests using SQL plan baselines and SQL profiles and measure the effect on hard parsing. The tested query is intentionally excluded from the post.<br />
<br />
Version : Oracle 12.1.0.2 <br />
<br />
Query execution statistics:<br />
<pre class="brush: text">
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 1.15 2.09 0 10 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.01 0 177 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 1.16 2.11 0 187 0 1
Statistics
----------------------------------------------------------
1691 recursive calls
0 db block gets
1594 consistent gets
0 physical reads
0 redo size
7266 bytes sent via SQL*Net to client
8393 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
60 sorts (memory)
0 sorts (disk)
1 rows processed
</pre>
<br />
Total query execution is 2.11 seconds, of which parsing took 2.09 seconds - practically the whole query execution time.<br />
<br />
<br />
What will happen if we create a fixed SQL plan baseline for the query?<br />
<br />
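The baseline creation step is not shown here; a minimal sketch of how a fixed baseline could be loaded from the cursor cache (&amp;sql_id stands for the SQL_ID of the tested query):<br />
<pre class="brush: sql">
-- Load the cursor's plan into a baseline and mark it FIXED in one step
DECLARE
  l_plans PLS_INTEGER;
BEGIN
  l_plans := DBMS_SPM.load_plans_from_cursor_cache(
               sql_id => '&sql_id',
               fixed  => 'YES');
END;
/
</pre>
<br />
With the fixed baseline in place, the trace looks like this:<br />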
<pre class="brush: text">
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 1.15 2.09 0 7 0 0
Execute 1 0.00 0.00 0 0 1 0
Fetch 2 0.00 0.01 0 177 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 1.16 2.11 0 184 1 1
Note
-----
- SQL plan baseline "SQL_PLAN_6q3anxq5dfsj4e57c1833" used for this statement
Statistics
----------------------------------------------------------
1691 recursive calls
0 db block gets
1594 consistent gets
0 physical reads
0 redo size
7287 bytes sent via SQL*Net to client
8393 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
60 sorts (memory)
0 sorts (disk)
1 rows processed
</pre>
<br />
I have practically the same results, which means the SQL plan baseline had no effect on parse time.<br />
<br />
<br />
But what will happen if I create a SQL profile instead of a baseline?<br />
<br />
<pre class="brush: text">
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.65 1.21 6 21 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.01 0.01 0 177 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.66 1.23 6 198 0 1
Note
-----
- SQL profile "PROFILE_09vf7nstqk7n2" used for this statement
Statistics
----------------------------------------------------------
654 recursive calls
0 db block gets
1300 consistent gets
6 physical reads
0 redo size
7284 bytes sent via SQL*Net to client
8393 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
60 sorts (memory)
0 sorts (disk)
1 rows processed
</pre>
<br />
This is a big improvement.<br />
Notice the elapsed time for the parse - <b>from 2.09 secs to 1.21 secs</b>.<br />
Check the query statistics - almost <b>three times fewer recursive calls</b>.<br />
<br />
<br />
But why?<br />
This is my explanation and I might be wrong, so please leave a comment below if that is the case.<br />
<br />
When we’re using SQL plan baselines for plan management, the first step is always generating execution plans from the optimizer. The cost-based optimizer produces several plans and then compares them with the plans in the SQL plan baseline. Many different plans will be probed as part of the optimizer's calculations; the SQL plan baseline has no effect on the number of calculations. <br />
<br />
With SQL profiles we feed the optimizer with estimations and hints before the calculation starts. The future plan will be influenced by the SQL profile. Basically we point the optimizer "in the right direction" and it will not perform the same amount of calculation as before. As a result we have <b>fewer recursive calls and less time spent on hard parsing</b>.
<br />
<br />
<br />
After "fixing" plan with SQL profile, I’ve tried to reproduce mutex concurrency intentionally forcing hard parse but now Oracle managed to perform hard parse without affecting many sessions. Maybe I’ve solved problem temporarily and bought some time for developers to rewrite problematic query.
<br />
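For completeness, a sketch of how a hard parse can be forced for a single statement without flushing the whole shared pool (&amp;sql_id stands for the cursor being tested):<br />
<pre class="brush: sql">
-- Find the cursor's address and hash value
select address, hash_value from v$sqlarea where sql_id = '&sql_id';
-- Purge just that cursor from the shared pool; the next execution hard parses
exec sys.dbms_shared_pool.purge('&address, &hash_value', 'C');
</pre>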
<br />
<br />
<br />
</span>Marko Sutichttp://www.blogger.com/profile/08926232581329666732noreply@blogger.com1tag:blogger.com,1999:blog-2530682427657016426.post-35050428669872777182016-06-28T10:27:00.001+02:002020-11-10T07:50:14.076+01:00Using Adaptive Cursor Sharing with SQL Plan BaselinesWe have several databases where automatic capturing of SQL plan baselines is enabled for a few schemas.<br />
<br />
The execution plans of some queries depend heavily on bind variable values, and it is not always best to reuse the same execution plan for all executions. For those queries I want to avoid using literals and inefficient execution plans. Also, I want to use SQL plan baselines, as I have automatic capturing enabled.<br />
<br />
The question is: can I make Adaptive Cursor Sharing work with SQL plan baselines without changing the query?<br />
Can I activate bind awareness for every execution to avoid inefficient execution plans?<br />
<br />
I don't want to suffer even one inefficient execution while waiting for ACS to kick in automatically, because that one lousy execution could potentially be a big problem.<br />
<br />
<br />
For the demo case I’m using a 1,000,000-row table with skewed data:<br />
<br />
<span id="fullpost">
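The table setup is not shown in this post; a minimal sketch that would reproduce a similar owner skew (my assumption, not the original script):<br />
<pre class="brush: sql">
-- Build a 1M-row table where two owners dominate and two appear only once
create table big_table as
select case when rownum = 1 then 'MDSYS'
            when rownum = 2 then 'ORDSYS'
            when mod(rownum, 2) = 0 then 'PUBLIC'
            else 'SYS'
       end as owner,
       rownum as object_id,
       o.object_name
from   dba_objects o,
       (select level l from dual connect by level <= 2000)
where  rownum <= 1000000;
</pre>
<br />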
<pre class="brush: sql">
SQL> select * from v$version;
BANNER CON_ID
-------------------------------------------------------------------------------- ----------
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production 0
PL/SQL Release 12.1.0.2.0 - Production 0
CORE 12.1.0.2.0 Production 0
TNS for IBM/AIX RISC System/6000: Version 12.1.0.2.0 - Production 0
NLSRTL Version 12.1.0.2.0 - Production 0
select owner, count(*)
from big_table
group by owner;
OWNER COUNT(*)
---------- ----------
MDSYS 1
PUBLIC 499999
SYS 499999
ORDSYS 1
create index IDX_OWNER on BIG_TABLE(owner);
begin
dbms_stats.gather_table_stats(ownname=>'MSUTIC',tabname=>'BIG_TABLE',cascade=>TRUE, estimate_percent=>100, method_opt=>'for columns size 4 owner');
end;
/
</pre>
<br />
<br />
This is my test query.<br />
<br />
<pre class="brush: sql">
SQL> var own varchar2(10);
SQL> exec :own := 'SYS';
select owner, sum(object_id)
from big_table
where owner = :own
group by owner;
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => 'TYPICAL'));
SQL_ID 5cdba9s9mkag7, child number 0
-------------------------------------
select owner, sum(object_id) from big_table where owner = :own group by
owner
Plan hash value: 2943376087
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 3552 (100)| |
| 1 | SORT GROUP BY NOSORT| | 499K| 9277K| 3552 (1)| 00:00:01 |
|* 2 | TABLE ACCESS FULL | BIG_TABLE | 499K| 9277K| 3552 (1)| 00:00:01 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("OWNER"=:OWN)
</pre>
<br />
From the first execution the cursor is bind sensitive, because I have gathered statistics with a histogram.<br />
<br />
<pre class="brush: sql">
select sql_id
, is_bind_aware
, is_bind_sensitive
, is_shareable
, plan_hash_value
from v$sql
where sql_id = '5cdba9s9mkag7';
SQL_ID I I I PLAN_HASH_VALUE
------------- - - - ---------------
5cdba9s9mkag7 N Y Y 2943376087
</pre>
<br />
<br />
To enable bind awareness I want to insert the BIND_AWARE hint without changing the query. <br />
<br />
I will use SQL Patch for this:<br />
<br />
<pre class="brush: sql">
SQL> begin
sys.dbms_sqldiag_internal.i_create_patch(
sql_text => 'select owner, sum(object_id)
from big_table
where owner = :own
group by owner',
hint_text => 'BIND_AWARE',
name => 'bind_aware_patch');
end;
/ 2 3 4 5 6 7 8 9 10
PL/SQL procedure successfully completed.
</pre>
<br />
Now let’s check execution and bind awareness for the query.<br />
<br />
<pre class="brush: sql">
SQL> var own varchar2(10);
SQL> exec :own := 'SYS';
select owner, sum(object_id)
from big_table
where owner = :own
group by owner;
SQL_ID 5cdba9s9mkag7, child number 0
-------------------------------------
select owner, sum(object_id) from big_table where owner = :own group by
owner
Plan hash value: 2943376087
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 3552 (100)| |
| 1 | SORT GROUP BY NOSORT| | 499K| 9277K| 3552 (1)| 00:00:01 |
|* 2 | TABLE ACCESS FULL | BIG_TABLE | 499K| 9277K| 3552 (1)| 00:00:01 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("OWNER"=:OWN)
Note
-----
- SQL patch "bind_aware_patch" used for this statement
select sql_id
, is_bind_aware
, is_bind_sensitive
, is_shareable
, plan_hash_value
from v$sql
where sql_id = '5cdba9s9mkag7';
SQL_ID I I I PLAN_HASH_VALUE
------------- - - - ---------------
5cdba9s9mkag7 Y Y Y 2943376087
</pre>
<br />
<br />
We have a note that the SQL patch is used and bind awareness is enabled. For every query execution, during hard parse, Oracle will peek at the bind variable and calculate an efficient execution plan accordingly. At least, that is what I would expect.<br />
<br />
<br />
Let’s try with another variable value - will Oracle alter the execution plan?<br />
<pre class="brush: sql">
SQL> var own varchar2(10);
SQL> exec :own := 'MDSYS';
select owner, sum(object_id)
from big_table
where owner = :own
group by owner;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => 'TYPICAL'));
SQL_ID 5cdba9s9mkag7, child number 1
-------------------------------------
select owner, sum(object_id) from big_table where owner = :own group by
owner
Plan hash value: 1772680857
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 4 (100)| |
| 1 | SORT GROUP BY NOSORT | | 1 | 19 | 4 (0)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| BIG_TABLE | 1 | 19 | 4 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | IDX_OWNER | 1 | | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("OWNER"=:OWN)
Note
-----
- SQL patch "bind_aware_patch" used for this statement
select sql_id
, is_bind_aware
, is_bind_sensitive
, is_shareable
, plan_hash_value
from v$sql
where sql_id = '5cdba9s9mkag7';
SQL_ID I I I PLAN_HASH_VALUE
------------- - - - ---------------
5cdba9s9mkag7 Y Y Y 2943376087
5cdba9s9mkag7 Y Y Y 1772680857
</pre>
<br />
Notice how Oracle changed the execution plan; now we have two plans for the specified SQL text.<br />
<br />
<br />
Capture the SQL plans from the cursor cache to create baselines.<br />
<br />
<pre class="brush: sql">
DECLARE
my_plans pls_integer;
BEGIN
my_plans := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(
sql_id => '5cdba9s9mkag7');
END;
/
</pre>
<br />
We have two ACCEPTED plans saved for this query which Oracle will consider during execution, and a SQL patch forcing bind awareness.<br />
<br />
<pre class="brush: sql">
set lines 200
col sql_handle for a25
col plan_name for a35
select sql_handle, plan_name, enabled, accepted, fixed
from dba_sql_plan_baselines
where sql_handle='SQL_f02626d2f3cad6cc';
SQL_HANDLE PLAN_NAME ENA ACC FIX
------------------------- ----------------------------------- --- --- ---
SQL_f02626d2f3cad6cc SQL_PLAN_g09j6ubtwppqc69a8f699 YES YES NO
SQL_f02626d2f3cad6cc SQL_PLAN_g09j6ubtwppqcaf705ad7 YES YES NO
</pre>
<br />
<br />
Now we will perform a test to check whether Oracle will alter the execution plan based on the variable value.<br />
<br />
<pre class="brush: sql">
SQL> var own varchar2(10);
SQL> exec :own := 'SYS';
select owner, sum(object_id)
from big_table
where owner = :own
group by owner;
OWNER SUM(OBJECT_ID)
-------------------------------- --------------
SYS 7.5387E+10
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => 'TYPICAL'));
SQL_ID 5cdba9s9mkag7, child number 0
-------------------------------------
select owner, sum(object_id) from big_table where owner = :own group by
owner
Plan hash value: 2943376087
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 3552 (100)| |
| 1 | SORT GROUP BY NOSORT| | 499K| 9277K| 3552 (1)| 00:00:01 |
|* 2 | TABLE ACCESS FULL | BIG_TABLE | 499K| 9277K| 3552 (1)| 00:00:01 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("OWNER"=:OWN)
Note
-----
- SQL patch "bind_aware_patch" used for this statement
- SQL plan baseline SQL_PLAN_g09j6ubtwppqcaf705ad7 used for this statement
</pre>
<br />
Oracle used SQL patch and SQL plan baseline.<br />
<br />
What if I change the variable value?<br />
<br />
<pre class="brush: sql">
SQL> var own varchar2(10);
SQL> exec :own := 'MDSYS';
select owner, sum(object_id)
from big_table
where owner = :own
group by owner;
OWNER SUM(OBJECT_ID)
-------------------------------- --------------
MDSYS 182924
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => 'TYPICAL'));
SQL_ID 5cdba9s9mkag7, child number 1
-------------------------------------
select owner, sum(object_id) from big_table where owner = :own group by
owner
Plan hash value: 1772680857
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 4 (100)| |
| 1 | SORT GROUP BY NOSORT | | 1 | 19 | 4 (0)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| BIG_TABLE | 1 | 19 | 4 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | IDX_OWNER | 1 | | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("OWNER"=:OWN)
Note
-----
- SQL patch "bind_aware_patch" used for this statement
- SQL plan baseline SQL_PLAN_g09j6ubtwppqc69a8f699 used for this statement
</pre>
<br />
Oracle immediately changed the execution plan and used a different SQL plan baseline.<br />
<br />
<br />
In the end I have the original query with bind variables, I have SQL plan baselines captured, and I’m using the powerful ACS feature to get efficient plans for different variable values.<br />
<br />
</span>Marko Sutichttp://www.blogger.com/profile/08926232581329666732noreply@blogger.com2tag:blogger.com,1999:blog-2530682427657016426.post-64130321218410097292016-02-24T14:37:00.002+01:002020-11-10T07:50:29.784+01:00Slow full table scan due to row chainingA few days ago I received a complaint that a simple count on a 2-million-row table was running forever.<br />
<br />
This was the statement:<br />
<pre class="brush: sql">
select count(1)
from CLIENT k
where k.expires is null;
</pre>
<br />
I've used fake names for the table and columns.
<br />
Database version: 11.2.0.4.0<br />
<br />
Indeed, the query was running longer than I would expect. Oracle was using a FULL SCAN of the table with "db file sequential read" wait events. This was a little odd to me, as I would expect "direct path read" or "db file scattered read" waits.<br />
<br />
<br />
<span id="fullpost">
It was a partitioned table with 4 partitions and 294 columns.<br />
<br />
<pre class="brush: sql">
select count(*) from dba_tab_columns where table_name = 'CLIENT';
COUNT(*)
----------
294
select owner, segment_name, partition_name, bytes, blocks
from dba_segments
where segment_name in ('CLIENT');
OWNER SEGMENT_NAME PARTITION_NAME BYTES BLOCKS
---------- --------------- -------------------- ---------- ----------
SCOTT CLIENT CLIENT_OTHER 8388608 1024
SCOTT CLIENT CLIENT_CITY 1643118592 200576
SCOTT CLIENT CLIENT_CNTR 591396864 72192
SCOTT CLIENT CLIENT_STRNG 52428800 6400
select table_name, partition_name, NUM_ROWS, AVG_ROW_LEN
from dba_tab_partitions
where table_name='CLIENT';
TABLE_NAME PARTITION_NAME NUM_ROWS AVG_ROW_LEN
------------------------------ ----------------------- ----------- ---------------
CLIENT CLIENT_OTHER 0 0
CLIENT CLIENT_CITY 1469420 572
CLIENT CLIENT_CNTR 592056 495
CLIENT CLIENT_STRNG 48977 565
select table_name, data_type, count(*)
from dba_tab_cols
where table_name='CLIENT'
group by table_name, data_type
order by 3 desc;
TABLE_NAME DATA_TYPE COUNT(*)
---------- ---------------------------------------- ----------
CLIENT NUMBER 191
CLIENT VARCHAR2 70
CLIENT DATE 32
CLIENT TIMESTAMP(6) 3
CLIENT RAW 2
CLIENT CL_UTR 1
CLIENT O_TIP_KAR 1
CLIENT O_ZA_NA 1
CLIENT O_PO_OSO 1
</pre>
<br />
Some of the columns were collections.<br />
<br />
<pre class="brush: sql">
select type_name, typecode
from dba_types
where type_name in (select data_type
from dba_tab_cols
where table_name='CLIENT'
and data_type not in ('NUMBER','VARCHAR2',
'DATE','TIMESTAMP(6)','RAW'));
TYPE_NAME TYPECODE
------------------------------ ------------------------------
CL_UTR COLLECTION
O_TIP_KAR COLLECTION
O_ZA_NA COLLECTION
O_PO_OSO COLLECTION
</pre>
<br />
These were varrays used to store multivalued attributes.<br />
<br />
<br />
In the trace I saw lots of disk reads and an elapsed time of over 2400 seconds.<br />
<br />
<pre class="brush: text">
select count(1)
from CLIENT k
where k.expires is null
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 203.96 2450.19 5455717 8240323 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 203.97 2450.20 5455717 8240323 0 1
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 369 (MSUTIC)
Number of plan statistics captured: 1
Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
1 1 1 SORT AGGREGATE (cr=8240323 pr=5455717 pw=0 time=1349733885 us)
1905617 1905617 1905617 PARTITION LIST ALL PARTITION: 1 4 (cr=8240323 pr=5455717 pw=0 time=2449532855 us cost=164110 size=3801914 card=1900957)
1905617 1905617 1905617 TABLE ACCESS FULL CLIENT PARTITION: 1 4 (cr=8240323 pr=5455717 pw=0 time=2448530798 us cost=164110 size=3801914 card=1900957)
Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT MODE: ALL_ROWS
1 SORT (AGGREGATE)
1905617 PARTITION LIST (ALL) PARTITION: START=1 STOP=4
1905617 TABLE ACCESS MODE: ANALYZED (FULL) OF 'CLIENT' (TABLE)
PARTITION: START=1 STOP=4
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to CLIENT 2 0.00 0.00
Disk file operations I/O 29 0.00 0.00
direct path read 2048 0.19 9.78
db file sequential read 5178860 0.23 2241.08
resmgr:internal state change 2 0.11 0.21
SQL*Net message from CLIENT 1 0.00 0.00
</pre>
<br />
Object statistics were telling me that all reads were from table partitions.<br />
<br />
<pre class="brush: sql">
Session Objects Statistics
Object/Event % Time Seconds Calls - Time per Call -
Avg Min Max
Obj#(299564)
db file sequential read 78.1% 1,757.0600s 3,677,752 0.0005s 0.0001s 0.2333s
direct path read 0.4% 8.8314s 1,706 0.0052s 0.0004s 0.1953s
resmgr:internal state change 0.0% 0.2162s 2 0.1081s 0.1000s 0.1162s
Disk file operations I/O 0.0% 0.0014s 23 0.0001s 0.0000s 0.0002s
Obj#(299565)
db file sequential read 20.5% 462.5006s 1,416,370 0.0003s 0.0001s 0.1794s
direct path read 0.0% 0.8966s 304 0.0029s 0.0001s 0.0479s
Disk file operations I/O 0.0% 0.0003s 6 0.0000s 0.0000s 0.0000s
Obj#(299566)
db file sequential read 1.0% 21.5203s 84,738 0.0003s 0.0001s 0.0552s
direct path read 0.0% 0.0587s 38 0.0015s 0.0000s 0.0206s
</pre>
<br />
<br />
Hm… why am I having so many db file sequential reads, with direct path reads happening as well?<br />
This is a table with lots of columns, so I might have problems with chained or migrated rows.<br />
Oracle is probably using individual block reads to fetch the pieces of each row.<br />
<br />
As the table has more than 255 columns I would expect intra-block chaining, but that shouldn't cause sequential reads; only a row that doesn't fit into a single block would give regular row chaining.<br />
I’m probably having problems with row migration.<br />
<br />
A chained row is a row that is too large to fit into a block, and if this is the root cause of the problem there isn't much I can do to improve performance. If a row is too big to fit into a block, it would still be too big after rebuilding the table.<br />
<br />
Migration of a row occurs when a row is updated and the amount of free space in its block is not adequate to store all of the row’s data; the row is migrated to another physical block.<br />
This usually happens when PCTFREE is set too low.<br />
<br />
What is important for migrated rows is that you can improve performance by reorganizing the table/partition or simply deleting and re-inserting the migrated rows.<br />
<br />
Tanel Poder wrote a blog post on the subject, "<a href="http://blog.tanelpoder.com/2009/11/04/detect-chained-and-migrated-rows-in-oracle/">Detect chained and migrated rows in Oracle – Part 1</a>", and I decided to use his great Snapper tool to get some diagnostic info.<br />
<br />
<pre class="brush: text">
SQL> @sn 60 6596
@snapper all 60 1 "6596"
Sampling SID 6596 with interval 60 seconds, taking 1 snapshots...
-- Session Snapper v4.06 BETA - by Tanel Poder ( http://blog.tanelpoder.com ) - Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! :)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SID, USERNAME , TYPE, STATISTIC , DELTA, HDELTA/SEC, %TIME, GRAPH , NUM_WAITS, WAITS/SEC, AVERAGES
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
6596, MSUTIC , STAT, session logical reads , 283813, 4.74k, , , , , ~ per execution
6596, MSUTIC , STAT, user I/O wait time , 5719, 95.46, , , , , ~ per execution
6596, MSUTIC , STAT, non-idle wait time , 5719, 95.46, , , , , ~ per execution
6596, MSUTIC , STAT, non-idle wait count , 193388, 3.23k, , , , , ~ per execution
6596, MSUTIC , STAT, session pga memory , -8400, -140.21, , , , , ~ per execution
6596, MSUTIC , STAT, enqueue requests , 2, .03, , , , , ~ per execution
6596, MSUTIC , STAT, enqueue releases , 2, .03, , , , , ~ per execution
6596, MSUTIC , STAT, physical read total IO requests , 193740, 3.23k, , , , , ~ per execution
6596, MSUTIC , STAT, physical read total multi block requests , 353, 5.89, , , , , ~ per execution
6596, MSUTIC , STAT, physical read total bytes , 1630494720, 27.21M, , , , , ~ per execution
6596, MSUTIC , STAT, cell physical IO interconnect bytes , 1630494720, 27.21M, , , , , ~ per execution
6596, MSUTIC , STAT, consistent gets , 283812, 4.74k, , , , , ~ per execution
6596, MSUTIC , STAT, consistent gets direct , 283810, 4.74k, , , , , ~ per execution
6596, MSUTIC , STAT, physical reads , 199034, 3.32k, , , , , ~ per execution
6596, MSUTIC , STAT, physical reads direct , 199034, 3.32k, , , , , ~ per execution
6596, MSUTIC , STAT, physical read IO requests , 193739, 3.23k, , , , , ~ per execution
6596, MSUTIC , STAT, physical read bytes , 1630486528, 27.21M, , , , , ~ per execution
6596, MSUTIC , STAT, file io wait time , 57195780, 954.66k, , , , , ~ per execution
6596, MSUTIC , STAT, Number of read IOs issued , 353, 5.89, , , , , ~ per execution
6596, MSUTIC , STAT, no work - consistent read gets , 283808, 4.74k, , , , , ~ per execution
6596, MSUTIC , STAT, table scan rows gotten , 2881106, 48.09k, , , , , ~ per execution
6596, MSUTIC , STAT, table scan blocks gotten , 83578, 1.4k, , , , , ~ per execution
6596, MSUTIC , STAT, table fetch continued row , 200188, 3.34k, , , , , ~ per execution
6596, MSUTIC , STAT, buffer is not pinned count , 200226, 3.34k, , , , , ~ per execution
6596, MSUTIC , TIME, DB CPU , 5620720, 93.82ms, 9.4%, [@ ], , ,
6596, MSUTIC , TIME, sql execute elapsed time , 60270147, 1.01s, 100.6%, [##########], , ,
6596, MSUTIC , TIME, DB time , 60270147, 1.01s, 100.6%, [##########], , , ~ unaccounted time
6596, MSUTIC , WAIT, Disk file operations I/O , 123, 2.05us, .0%, [ ], 2, .03, 61.5us average wait
6596, MSUTIC , WAIT, db file sequential read , 57234629, 955.31ms, 95.5%, [WWWWWWWWWW], 192888, 3.22k, 296.72us average wait
-- End of Stats snap 1, end=2016-02-23 13:23:19, seconds=59.9
----------------------------------------------------------------------------------------------------
Active% | INST | SQL_ID | SQL_CHILD | EVENT | WAIT_CLASS
----------------------------------------------------------------------------------------------------
97% | 1 | 2q92xdvxjj712 | 0 | db file sequential read | User I/O
3% | 1 | 2q92xdvxjj712 | 0 | ON CPU | ON CPU
-- End of ASH snap 1, end=2016-02-23 13:23:19, seconds=60, samples_taken=99
PL/SQL procedure successfully completed.
</pre>
<br />
Notice "<b>table fetch continued row</b>" statistic. Tanel wrote that this counter is usually increasing when rows are accessed with index access paths. <br />
In my case I have full scan that is increasing the value. This count is number of chained pieces Oracle had to go through in order to find the individual pieces of the rows.<br />
I won’t go any further in detail - just check Tanel’s blog post.<br />
<br />
<br />
Let’s identify the chained rows by running the ANALYZE command with the LIST CHAINED ROWS option. This command collects information about each migrated or chained row.<br />
<br />
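The output goes into the CHAINED_ROWS table, which must exist first; it is created by the utlchain.sql script shipped with the database (sketch):<br />
<pre class="brush: sql">
-- Create the default CHAINED_ROWS output table in the current schema
SQL> @?/rdbms/admin/utlchain.sql
</pre>
<br />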
<pre class="brush: sql">
SQL> analyze table SCOTT.CLIENT list chained rows;
Table analyzed.
SQL> select count(*) from chained_rows;
COUNT(*)
----------
2007045
SQL> select partition_name, count(*) from chained_rows group by partition_name;
PARTITION_NAME COUNT(*)
------------------------------ ----------
CLIENT_CITY 1411813
CLIENT_CNTR 552873
CLIENT_STRNG 42359
</pre>
<br />
A table with 2097647 rows has <b>2007045 chained/migrated rows</b>. This is what caused so many reads for a simple full scan of a small table.<br />
<br />
<br />
I decided to rebuild the table partitions, without changing the PCTFREE parameter, to fit the migrated rows back into a single block. A sketch of the commands is shown below.<br />
<br />
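The exact rebuild commands are not in my notes, but the general pattern (partition names taken from the CHAINED_ROWS output above; storage clauses omitted) would look like this:<br />
<pre class="brush: sql">
-- moving a partition rewrites its rows, which un-migrates migrated rows
alter table SCOTT.CLIENT move partition CLIENT_CITY;
alter table SCOTT.CLIENT move partition CLIENT_CNTR;
alter table SCOTT.CLIENT move partition CLIENT_STRNG;
-- a move marks local index partitions UNUSABLE, so rebuild them
alter table SCOTT.CLIENT modify partition CLIENT_CITY rebuild unusable local indexes;
alter table SCOTT.CLIENT modify partition CLIENT_CNTR rebuild unusable local indexes;
alter table SCOTT.CLIENT modify partition CLIENT_STRNG rebuild unusable local indexes;
-- clear previous ANALYZE results before re-checking
truncate table chained_rows;
</pre>
<br />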
<br />
After the rebuild the number of chained rows decreased.<br />
<br />
<pre class="brush: sql">
SQL> analyze table SCOTT.CLIENT list chained rows;
Table analyzed.
SQL> select count(*) from chained_rows;
COUNT(*)
----------
37883
</pre>
<br />
Now the query finished in 14 secs, without the single-block sequential reads.<br />
<pre class="brush: text">
select count(1)
from CLIENT k
where k.expires is null
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 2.34 13.96 185802 185809 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 2.34 13.96 185802 185809 0 1
Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 369 (MSUTIC)
Number of plan statistics captured: 1
Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
1 1 1 SORT AGGREGATE (cr=185809 pr=185802 pw=0 time=13965941 us)
1905617 1905617 1905617 PARTITION LIST ALL PARTITION: 1 4 (cr=185809 pr=185802 pw=0 time=13560379 us cost=109526 size=3811234 card=1905617)
1905617 1905617 1905617 TABLE ACCESS FULL CLIENT PARTITION: 1 4 (cr=185809 pr=185802 pw=0 time=12848619 us cost=109526 size=3811234 card=1905617)
Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT MODE: ALL_ROWS
1 SORT (AGGREGATE)
1905617 PARTITION LIST (ALL) PARTITION: START=1 STOP=4
1905617 TABLE ACCESS MODE: ANALYZED (FULL) OF 'CLIENT' (TABLE)
PARTITION: START=1 STOP=4
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to CLIENT 2 0.00 0.00
direct path read 3569 0.11 8.99
SQL*Net message from CLIENT 2 0.00 0.01
</pre>
<br />
<br />
Snapper also showed that I no longer have a problem with row chaining.<br />
<pre class="brush: text">
SQL> @sn 15 6601
@snapper all 15 1 "6601"
Sampling SID 6601 with interval 15 seconds, taking 1 snapshots...
-- Session Snapper v4.06 BETA - by Tanel Poder ( http://blog.tanelpoder.com ) - Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! :)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SID, USERNAME , TYPE, STATISTIC , DELTA, HDELTA/SEC, %TIME, GRAPH , NUM_WAITS, WAITS/SEC, AVERAGES
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
6601, MSUTIC , STAT, Requests to/from CLIENT , 1, .07, , , , , ~ per execution
6601, MSUTIC , STAT, user calls , 1, .07, , , , , ~ per execution
6601, MSUTIC , STAT, pinned cursors current , -1, -.07, , , , , ~ per execution
6601, MSUTIC , STAT, session logical reads , 149590, 9.9k, , , , , ~ per execution
6601, MSUTIC , STAT, CPU used when call started , 227, 15.02, , , , , ~ per execution
6601, MSUTIC , STAT, CPU used by this session , 227, 15.02, , , , , ~ per execution
6601, MSUTIC , STAT, DB time , 1047, 69.29, , , , , ~ per execution
6601, MSUTIC , STAT, user I/O wait time , 424, 28.06, , , , , ~ per execution
6601, MSUTIC , STAT, non-idle wait time , 424, 28.06, , , , , ~ per execution
6601, MSUTIC , STAT, non-idle wait count , 3216, 212.84, , , , , ~ per execution
6601, MSUTIC , STAT, session uga memory , 135248, 8.95k, , , , , ~ per execution
6601, MSUTIC , STAT, physical read total IO requests , 9354, 619.07, , , , , ~ per execution
6601, MSUTIC , STAT, physical read total multi block requests , 9333, 617.68, , , , , ~ per execution
6601, MSUTIC , STAT, physical read total bytes , 1225228288, 81.09M, , , , , ~ per execution
6601, MSUTIC , STAT, cell physical IO interconnect bytes , 1225228288, 81.09M, , , , , ~ per execution
6601, MSUTIC , STAT, consistent gets , 149578, 9.9k, , , , , ~ per execution
6601, MSUTIC , STAT, consistent gets from cache , 5, .33, , , , , ~ per execution
6601, MSUTIC , STAT, consistent gets from cache (fastpath) , 5, .33, , , , , ~ per execution
6601, MSUTIC , STAT, consistent gets direct , 149572, 9.9k, , , , , ~ per execution
6601, MSUTIC , STAT, logical read bytes from cache , 40960, 2.71k, , , , , ~ per execution
6601, MSUTIC , STAT, physical reads , 149548, 9.9k, , , , , ~ per execution
6601, MSUTIC , STAT, physical reads direct , 149548, 9.9k, , , , , ~ per execution
6601, MSUTIC , STAT, physical read IO requests , 9353, 619.01, , , , , ~ per execution
6601, MSUTIC , STAT, physical read bytes , 1225097216, 81.08M, , , , , ~ per execution
6601, MSUTIC , STAT, calls to kcmgcs , 5, .33, , , , , ~ per execution
6601, MSUTIC , STAT, file io wait time , 304, 20.12, , , , , ~ per execution
6601, MSUTIC , STAT, total number of slots , -2, -.13, , , , , ~ per execution
6601, MSUTIC , STAT, Effective IO time , 4239980, 280.61k, , , , , ~ per execution
6601, MSUTIC , STAT, Number of read IOs issued , 9354, 619.07, , , , , ~ per execution
6601, MSUTIC , STAT, no work - consistent read gets , 149564, 9.9k, , , , , ~ per execution
6601, MSUTIC , STAT, Cached Commit SCN referenced , 149132, 9.87k, , , , , ~ per execution
6601, MSUTIC , STAT, table scans (cache partitions) , 3, .2, , , , , ~ per execution
6601, MSUTIC , STAT, table scans (direct read) , 3, .2, , , , , ~ per execution
6601, MSUTIC , STAT, table scan rows gotten , 3518684, 232.88k, , , , , ~ per execution
6601, MSUTIC , STAT, table scan blocks gotten , 149559, 9.9k, , , , , ~ per execution
6601, MSUTIC , STAT, bytes sent via SQL*Net to CLIENT , 211, 13.96, , , , , 105.5 bytes per roundtrip
6601, MSUTIC , STAT, bytes received via SQL*Net from CLIENT , 8, .53, , , , , ~ per execution
6601, MSUTIC , STAT, SQL*Net roundtrips to/from CLIENT , 2, .13, , , , , ~ per execution
6601, MSUTIC , TIME, DB CPU , 2000964, 132.43ms, 13.2%, [@@ ], , ,
6601, MSUTIC , TIME, sql execute elapsed time , 8500210, 562.57ms, 56.3%, [###### ], , ,
6601, MSUTIC , TIME, DB time , 8500269, 562.57ms, 56.3%, [###### ], , , 14.62s unaccounted time
6601, MSUTIC , WAIT, direct path read , 4059380, 268.66ms, 26.9%, [WWW ], 3064, 202.78, 1.32ms average wait
6601, MSUTIC , WAIT, SQL*Net message to CLIENT , 4, .26us, .0%, [ ], 1, .07, 4us average wait
6601, MSUTIC , WAIT, SQL*Net message from CLIENT , 8006127, 529.87ms, 53.0%, [WWWWWW ], 1, .07, 8.01s average wait
-- End of Stats snap 1, end=2016-02-24 08:23:59, seconds=15.1
----------------------------------------------------------------------------------------------------
Active% | INST | SQL_ID | SQL_CHILD | EVENT | WAIT_CLASS
----------------------------------------------------------------------------------------------------
29% | 1 | gg54c4j6b9jb0 | 0 | direct path read | User I/O
21% | 1 | gg54c4j6b9jb0 | 0 | ON CPU | ON CPU
-- End of ASH snap 1, end=2016-02-24 08:23:59, seconds=15, samples_taken=96
</pre>
<br />
<br />
Reorganizing the table solved my problem - full scans on the table now run much faster.<br />
<br />
There is an interesting support note "Doc ID 238519.1" which states that trailing NULLs do not take space in the row piece: initially the row fits in one row piece.<br />
If a column beyond 255 is then populated, all the NULL columns between the last populated column and this new column start taking up space.<br />
The row has to be split into two row pieces, and the new row piece is migrated to a new block - the <b>row becomes chained</b>.<br />
<br />
Our table has trailing NULL columns, so this probably caused the migration.<br />
<br />
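A quick way to reproduce the effect described in the note (hypothetical table T_WIDE, not the actual table from this case; assumes the CHAINED_ROWS table exists):<br />
<pre class="brush: sql">
-- build a 300-column table dynamically
declare
  l_sql varchar2(32767) := 'create table t_wide (c1 number';
begin
  for i in 2 .. 300 loop
    l_sql := l_sql || ', c' || i || ' number';
  end loop;
  execute immediate l_sql || ')';
end;
/
-- rows with only C1 populated fit in one row piece
insert into t_wide (c1) select rownum from dual connect by level &lt;= 1000;
commit;
-- populating a column beyond 255 forces a second row piece per row
update t_wide set c300 = 1;
commit;
truncate table chained_rows;
analyze table t_wide list chained rows;
select count(*) from chained_rows;
</pre>
<br />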
<br />
Unfortunately I don’t have time to perform a more detailed investigation.
<br />
<br />
<br />
<br />
<br />
<b>REFERENCES</b><br />
<a href="http://blog.tanelpoder.com/2009/11/04/detect-chained-and-migrated-rows-in-oracle/">http://blog.tanelpoder.com/2009/11/04/detect-chained-and-migrated-rows-in-oracle/</a><br />
Updating a Row with More Than 255 Columns Causes Row Chaining (Doc ID 238519.1)<br />
<br />
</span>
Marko Sutichttp://www.blogger.com/profile/08926232581329666732noreply@blogger.com5tag:blogger.com,1999:blog-2530682427657016426.post-55217765283270640402016-02-20T10:22:00.003+01:002020-11-10T07:50:39.002+01:00Detecting Soft Corruption in 12c - V$NONLOGGED_BLOCK, ORA-01578/ORA-26040Last week we created a standby database in our dev environment and performed some ETL actions on the primary side. Data loading and index rebuilds were done with the NOLOGGING option. After a few days we noticed lots of ORA-01578/ORA-26040 errors.<br />
The corruption happened because we forgot to enable force logging.<br />
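The fix for the root cause itself is a one-liner on the primary (shown here for completeness):<br />
<pre class="brush: sql">
SQL> alter database force logging;
Database altered.
</pre>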
<br />
As this was a new dev database there was no backup, but maybe not everything was lost. If the only corrupted segments were indexes, we could easily rebuild them.<br />
<br />
Then I learnt something new.<br />
After performing VALIDATE CHECK LOGICAL we noticed lots of corrupted blocks, but I was puzzled why the “v$database_block_corruption” view was empty. Then my colleague told me that Oracle changed the behaviour of reporting soft corrupted blocks in the 12c version (we were using 12.1.0.2). A different view is now populated - <b>V$NONLOGGED_BLOCK</b>.<br />
<br />
So I created a little demo of how to detect (and repair) soft corrupted blocks in a 12c database.<br />
<br />
<br />
<br />
<span id="fullpost">
Create a tablespace and a small table.<br />
<pre class="brush: sql">
SQL> create tablespace DEMO1 datafile '/oradata1/data/ora12c/demo01.dbf' size 50M;
Tablespace created.
SQL> create table objects tablespace DEMO1 as select * from dba_objects;
Table created.
SQL> alter table objects add constraint pk_obj primary key (object_id);
Table altered.
SQL> create index idx_obj_name on objects(object_name) tablespace demo1;
Index created.
</pre>
<br />
Back up the tablespace.<br />
<pre class="brush: sql">
RMAN> backup tablespace DEMO1;
Starting backup at 23-AUG-15
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=50 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00002 name=/oradata1/data/ora12c/demo01.dbf
channel ORA_DISK_1: starting piece 1 at 23-AUG-15
channel ORA_DISK_1: finished piece 1 at 23-AUG-15
piece handle=/oradata1/fra/ORA12C/backupset/2015_08_23/o1_mf_nnndf_TAG20150823T060639_bxlkpj3j_.bkp tag=TAG20150823T060639 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 23-AUG-15
Starting Control File and SPFILE Autobackup at 23-AUG-15
piece handle=/oradata1/fra/ORA12C/autobackup/2015_08_23/o1_mf_s_888473201_bxlkpktg_.bkp comment=NONE
Finished Control File and SPFILE Autobackup at 23-AUG-15
</pre>
<br />
Rebuild the index with the NOLOGGING option so we can simulate soft corruption later.<br />
<pre class="brush: sql">
RMAN> alter index idx_obj_name rebuild nologging;
Statement processed
</pre>
<br />
Confirm that we have datafiles that require a backup because they have been affected by a NOLOGGING operation.<br />
<pre class="brush: sql">
RMAN> report unrecoverable;
Report of files that need backup due to unrecoverable operations
File Type of Backup Required Name
---- ----------------------- -----------------------------------
2 full or incremental /oradata1/data/ora12c/demo01.dbf
5 full or incremental /oradata1/data/ora12c/example01.dbf
</pre>
<br />
Simulate the corruption by restoring the datafile from the backup taken before the NOLOGGING rebuild - recovery cannot reproduce the unlogged changes.<br />
<pre class="brush: sql">
RMAN> alter database datafile 2 offline;
Statement processed
RMAN> restore datafile 2;
Starting restore at 23-AUG-15
using channel ORA_DISK_1
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00002 to /oradata1/data/ora12c/demo01.dbf
channel ORA_DISK_1: reading from backup piece /oradata1/fra/ORA12C/backupset/2015_08_23/o1_mf_nnndf_TAG20150823T060639_bxlkpj3j_.bkp
channel ORA_DISK_1: piece handle=/oradata1/fra/ORA12C/backupset/2015_08_23/o1_mf_nnndf_TAG20150823T060639_bxlkpj3j_.bkp tag=TAG20150823T060639
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:03
Finished restore at 23-AUG-15
RMAN> recover datafile 2;
Starting recover at 23-AUG-15
using channel ORA_DISK_1
starting media recovery
media recovery complete, elapsed time: 00:00:01
Finished recover at 23-AUG-15
RMAN> alter database datafile 2 online;
Statement processed
</pre>
<br />
Query the table through the corrupted index and notice the error.<br />
<pre class="brush: sql">
SQL> select count(*) from objects where object_name like 'A%';
select count(*) from objects where object_name like 'A%'
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 2, block # 2617)
ORA-01110: data file 2: '/oradata1/data/ora12c/demo01.dbf'
ORA-26040: Data block was loaded using the NOLOGGING option
</pre>
<br />
Let’s validate the datafile to check for block corruption.<br />
<pre class="brush: sql">
RMAN> backup validate check logical datafile 2;
Starting backup at 23-AUG-15
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=40 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00002 name=/oradata1/data/ora12c/demo01.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
2 OK 460 129 6401 1776280
File Name: /oradata1/data/ora12c/demo01.dbf
Block Type Blocks Failing Blocks Processed
---------- -------------- ----------------
Data 0 1537
Index 0 462
Other 0 4272
Finished backup at 23-AUG-15
</pre>
<br />
Notice that we have 460 blocks marked corrupt, yet the v$database_block_corruption view is empty.<br />
<pre class="brush: sql">
SQL> select count(*) from v$database_block_corruption;
COUNT(*)
----------
0
</pre>
<br />
Let’s query the v$nonlogged_block view.<br />
<pre class="brush: sql">
SQL> set lines 200
SQL> set pages 999
SQL> select file#, block#, blocks,object#,reason from v$nonlogged_block;
FILE# BLOCK# BLOCKS OBJECT# REASON
---------- ---------- ---------- ---------------------------------------- -------
2 2308 12 UNKNOWN
2 2321 15 UNKNOWN
2 2337 15 UNKNOWN
2 2353 15 UNKNOWN
2 2369 15 UNKNOWN
2 2385 15 UNKNOWN
2 2401 15 UNKNOWN
2 2417 15 UNKNOWN
2 2434 126 UNKNOWN
2 2562 126 UNKNOWN
2 2690 91 UNKNOWN
11 rows selected.
</pre>
<br />
<br />
Will RMAN detect that we have corrupted blocks?<br />
<pre class="brush: sql">
RMAN> backup datafile 2;
Starting backup at 23-AUG-15
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=54 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00002 name=/oradata1/data/ora12c/demo01.dbf
channel ORA_DISK_1: starting piece 1 at 23-AUG-15
channel ORA_DISK_1: finished piece 1 at 23-AUG-15
piece handle=/oradata1/fra/ORA12C/backupset/2015_08_23/o1_mf_nnndf_TAG20150823T061602_bxll8275_.bkp tag=TAG20150823T061602 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 23-AUG-15
</pre>
The RMAN backup won’t fail due to NOLOGGING corrupt blocks, so our backup will contain the soft corrupted blocks.<br />
<br />
Let’s identify the corrupt segments using the v$nonlogged_block view.<br />
<pre class="brush: sql">
set lines 2000
set pages 9999
col owner for a20
col partition_name for a10
col segment_name for a20
SELECT e.owner, e.segment_type, e.segment_name, e.partition_name, c.file#
, greatest(e.block_id, c.block#) corr_start_block#
, least(e.block_id+e.blocks-1, c.block#+c.blocks-1) corr_end_block#
, least(e.block_id+e.blocks-1, c.block#+c.blocks-1)
- greatest(e.block_id, c.block#) + 1 blocks_corrupted
FROM dba_extents e, V$NONLOGGED_BLOCK c
WHERE e.file_id = c.file#
AND e.block_id <= c.block# + c.blocks - 1
AND e.block_id + e.blocks - 1 >= c.block#
UNION
SELECT s.owner, s.segment_type, s.segment_name, s.partition_name, c.file#
, header_block corr_start_block#
, header_block corr_end_block#
, 1 blocks_corrupted
FROM dba_segments s, V$NONLOGGED_BLOCK c
WHERE s.header_file = c.file#
AND s.header_block between c.block# and c.block# + c.blocks - 1
UNION
SELECT null owner, null segment_type, null segment_name, null partition_name, c.file#
, greatest(f.block_id, c.block#) corr_start_block#
, least(f.block_id+f.blocks-1, c.block#+c.blocks-1) corr_end_block#
, least(f.block_id+f.blocks-1, c.block#+c.blocks-1)
- greatest(f.block_id, c.block#) + 1 blocks_corrupted
FROM dba_free_space f, V$NONLOGGED_BLOCK c
WHERE f.file_id = c.file#
AND f.block_id <= c.block# + c.blocks - 1
AND f.block_id + f.blocks - 1 >= c.block#
order by file#, corr_start_block#
/
OWNER SEGMENT_TYPE SEGMENT_NAME PARTITION_ FILE# CORR_START_BLOCK# CORR_END_BLOCK# BLOCKS_CORRUPTED
-------------------- ------------------ -------------------- ---------- ---------- ----------------- --------------- ----------------
SYS INDEX IDX_OBJ_NAME 2 2308 2311 4
SYS INDEX IDX_OBJ_NAME 2 2312 2319 8
SYS INDEX IDX_OBJ_NAME 2 2321 2327 7
SYS INDEX IDX_OBJ_NAME 2 2328 2335 8
SYS INDEX IDX_OBJ_NAME 2 2337 2343 7
SYS INDEX IDX_OBJ_NAME 2 2344 2351 8
SYS INDEX IDX_OBJ_NAME 2 2353 2359 7
SYS INDEX IDX_OBJ_NAME 2 2360 2367 8
SYS INDEX IDX_OBJ_NAME 2 2369 2375 7
SYS INDEX IDX_OBJ_NAME 2 2376 2383 8
SYS INDEX IDX_OBJ_NAME 2 2385 2391 7
SYS INDEX IDX_OBJ_NAME 2 2392 2399 8
SYS INDEX IDX_OBJ_NAME 2 2401 2407 7
SYS INDEX IDX_OBJ_NAME 2 2408 2415 8
SYS INDEX IDX_OBJ_NAME 2 2417 2423 7
SYS INDEX IDX_OBJ_NAME 2 2424 2431 8
SYS INDEX IDX_OBJ_NAME 2 2434 2559 126
SYS INDEX IDX_OBJ_NAME 2 2562 2687 126
SYS INDEX IDX_OBJ_NAME 2 2690 2780 91
19 rows selected.
</pre>
<br />
This is the best possible outcome when you notice corruption errors. All the errors relate to index corruption, so we can fix the problem by rebuilding the index.<br />
<br />
<pre class="brush: sql">
SQL> alter index idx_obj_name rebuild;
alter index idx_obj_name rebuild
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 2, block # 2308)
ORA-01110: data file 2: '/oradata1/data/ora12c/demo01.dbf'
ORA-26040: Data block was loaded using the NOLOGGING option
</pre>
<br />
Simply issuing "alter index rebuild" command won't work.<br />
We should mark index unusable to drop segment before rebuilding it or just rebuild index with online option.<br />
<br />
It is better choice to mark index unusable because you don't need additional space then, but I will simply rebuild index with online option and see what will happen.<br />
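For reference, the unusable-first variant would look like this (a sketch - I am not running it here):<br />
<pre class="brush: sql">
-- dropping the segment first means the rebuild reads the table, not the corrupt index
SQL> alter index idx_obj_name unusable;
SQL> alter index idx_obj_name rebuild;
</pre>
<br />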
<pre class="brush: sql">
SQL> alter index idx_obj_name rebuild online;
Index altered.
SQL> select count(*) from objects where object_name like 'A%';
COUNT(*)
----------
2079
</pre>
<br />
No errors... but let's validate the datafile for corruption.<br />
<pre class="brush: sql">
RMAN> backup validate check logical datafile 2;
Starting backup at 23-AUG-15
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=40 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00002 name=/oradata1/data/ora12c/demo01.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
2 OK 460 94 6402 1779294
File Name: /oradata1/data/ora12c/demo01.dbf
Block Type Blocks Failing Blocks Processed
---------- -------------- ----------------
Data 0 1537
Index 0 587
Other 0 4182
Finished backup at 23-AUG-15
</pre>
Notice "Marked Corrupt" column. Hm... 460 like before.<br />
<br />
Don't worry, this is not new corruption. These are FREE blocks which will be reused and Oracle will automatically re-format those blocks.<br />
<pre class="brush: sql">
set lines 2000
set pages 9999
col owner for a20
col partition_name for a10
col segment_name for a20
SELECT e.owner, e.segment_type, e.segment_name, e.partition_name, c.file#
, greatest(e.block_id, c.block#) corr_start_block#
, least(e.block_id+e.blocks-1, c.block#+c.blocks-1) corr_end_block#
, least(e.block_id+e.blocks-1, c.block#+c.blocks-1)
- greatest(e.block_id, c.block#) + 1 blocks_corrupted
FROM dba_extents e, V$NONLOGGED_BLOCK c
WHERE e.file_id = c.file#
AND e.block_id <= c.block# + c.blocks - 1
AND e.block_id + e.blocks - 1 >= c.block#
UNION
SELECT s.owner, s.segment_type, s.segment_name, s.partition_name, c.file#
, header_block corr_start_block#
, header_block corr_end_block#
, 1 blocks_corrupted
FROM dba_segments s, V$NONLOGGED_BLOCK c
WHERE s.header_file = c.file#
AND s.header_block between c.block# and c.block# + c.blocks - 1
UNION
SELECT null owner, null segment_type, null segment_name, null partition_name, c.file#
, greatest(f.block_id, c.block#) corr_start_block#
, least(f.block_id+f.blocks-1, c.block#+c.blocks-1) corr_end_block#
, least(f.block_id+f.blocks-1, c.block#+c.blocks-1)
- greatest(f.block_id, c.block#) + 1 blocks_corrupted
FROM dba_free_space f, V$NONLOGGED_BLOCK c
WHERE f.file_id = c.file#
AND f.block_id <= c.block# + c.blocks - 1
AND f.block_id + f.blocks - 1 >= c.block#
order by file#, corr_start_block#
/
OWNER SEGMENT_TYPE SEGMENT_NAME PARTITION_ FILE# CORR_START_BLOCK# CORR_END_BLOCK# BLOCKS_CORRUPTED
-------------------- ------------------ -------------------- ---------- ---------- ----------------- --------------- ----------------
2 2308 2319 12
2 2321 2335 15
2 2337 2351 15
2 2353 2367 15
2 2369 2383 15
2 2385 2399 15
2 2401 2415 15
2 2417 2431 15
2 2434 2559 126
2 2562 2687 126
2 2690 2780 91
11 rows selected.
</pre>
<br />
We can force the re-formatting by creating a dummy table and inserting data into it until the free space is consumed.<br />
Check Doc ID 336133.1.<br />
<pre class="brush: sql">
create table s (
n number,
c varchar2(4000)
) nologging tablespace DEMO1;
SQL> BEGIN
FOR i IN 1..1000000 LOOP
INSERT /*+ APPEND */ INTO sys.s select i, lpad('REFORMAT',3092, 'R') from dual;
commit ;
END LOOP;
END;
/ 2 3 4 5 6 7
BEGIN
*
ERROR at line 1:
ORA-01653: unable to extend table SYS.S by 128 in tablespace DEMO1
ORA-06512: at line 3
SQL> drop table sys.s purge;
Table dropped.
</pre>
<br />
Notice that we don't have any corrupted blocks anymore.<br />
<pre class="brush: sql">
RMAN> backup validate check logical datafile 2;
Starting backup at 23-AUG-15
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=67 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00002 name=/oradata1/data/ora12c/demo01.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
2 OK 0 3929 14593 1818933
File Name: /oradata1/data/ora12c/demo01.dbf
Block Type Blocks Failing Blocks Processed
---------- -------------- ----------------
Data 0 9851
Index 0 461
Other 0 351
Finished backup at 23-AUG-15
</pre>
<br />
<br />
<br />
Recovering a corrupted index is easy, but recovering data blocks can be difficult or sometimes impossible.<br />
Perform validation and backups regularly, because corruption will hit you when you least expect it ;)<br />
<br />
<br />
<br />
</span>Marko Sutichttp://www.blogger.com/profile/08926232581329666732noreply@blogger.com2tag:blogger.com,1999:blog-2530682427657016426.post-25174451513592965132015-12-17T10:27:00.003+01:002020-11-10T07:50:50.733+01:00Unindexed Foreign Keys on empty/unused table and locksIt is widely known that unindexed foreign keys can be a performance issue. Unindexed foreign keys on child tables can cause table locks or general performance problems.<br />
There are many articles on this subject, so I won't go into detail.<br />
<br />
My plan is to show a simple demo where an empty child table with an unindexed foreign key column can cause big problems.<br />
<br />
<br />
Imagine that you have a highly active table (supplier) with lots of DML operations from many sessions.<br />
In the meantime someone created a new child table (product) in a relationship with the parent table (supplier). This table is empty and unused - so why would you bother indexing foreign key columns on an empty table?<br />
<br />
I will show you a case where this empty table can cause lock contention and serious performance issues.<br />
<br />
<span id="fullpost">
<pre class="brush: sql">
Oracle version - 11.2.0.4.0
CREATE TABLE supplier
( id number(10) not null,
supplier_id number(10) not null,
supplier_name varchar2(50) not null,
contact_name varchar2(50),
CONSTRAINT id_pk PRIMARY KEY (id),
CONSTRAINT supplier_uk UNIQUE(supplier_id)
);
INSERT INTO supplier VALUES (1,100, 'Supplier 1', 'Contact 1');
INSERT INTO supplier VALUES (2,200, 'Supplier 2', 'Contact 2');
COMMIT;
CREATE TABLE product
( product_id number(10) not null,
product_name varchar2(50) not null,
supplier_id number(10) not null,
CONSTRAINT fk_supplier
FOREIGN KEY (supplier_id)
REFERENCES supplier(supplier_id)
);
SQL> select id, supplier_id, supplier_name, contact_name from supplier;
ID SUPPLIER_ID SUPPLIER_NAME CONTACT_NAME
---------- ----------- -------------------------------------------------- ------------
1 100 Supplier 1 Contact 1
2 200 Supplier 2 Contact 2
-- Product table is empty and unused
SQL> select product_id, product_name, supplier_id from product;
no rows selected
</pre>
<br />
A user in SESSION 1 inserts a row and leaves the transaction open for some time.<br />
<pre class="brush: sql">
--SESSION 1:
INSERT INTO supplier VALUES (3,300, 'Supplier 3', 'Contact 3'); --(Without COMMIT)
1 row created.
</pre>
<br />
At the same time many sessions are trying to update records using the column involved in the foreign-key relationship.
All these sessions hang, and you have a big problem.
<pre class="brush: sql">
--SESSION 2:
UPDATE supplier SET supplier_id=200 WHERE supplier_id = 200; --(HANG)
</pre>
<br />
Let's try another INSERT in the next session:
<pre class="brush: sql">
--SESSION 3:
INSERT INTO supplier VALUES (4,400, 'Supplier 4', 'Contact 4'); --(HANG)
</pre>
<br />
Now we have inserts hanging, which can lead to major problems on a very active table.<br />
<br />
Check locks:<br />
<br />
<pre class="brush: sql">
SELECT l.sid, s.blocking_session blocker, s.event, l.type, l.lmode,
l.request, o.object_name, o.object_type
FROM v$lock l, dba_objects o, v$session s
WHERE UPPER(s.username) = UPPER('MSUTIC')
AND l.id1 = o.object_id (+)
AND l.sid = s.sid
ORDER BY sid, type;
SID BLOCKER EVENT TY LMODE REQUEST OBJECT_NAME OBJECT_TYPE
---------- ---------- -------------------------------------- -- ---------- ---------- -------------------------- ------------
63 1641 enq: TM - contention AE 4 0 ORA$BASE EDITION
63 1641 enq: TM - contention TM 3 0 SUPPLIER TABLE
63 1641 enq: TM - contention TM 0 4 PRODUCT TABLE
1390 SQL*Net message to client AE 4 0 ORA$BASE EDITION
1641 SQL*Net message from client AE 4 0 ORA$BASE EDITION
1641 SQL*Net message from client TM 3 0 SUPPLIER TABLE
1641 SQL*Net message from client TM 3 0 PRODUCT TABLE
1641 SQL*Net message from client TX 6 0 TPT SYNONYM
2159 SQL*Net message from client AE 4 0 ORA$BASE EDITION
2729 63 enq: TM - contention AE 4 0 ORA$BASE EDITION
2729 63 enq: TM - contention TM 0 3 PRODUCT TABLE
2729 63 enq: TM - contention TM 3 0 SUPPLIER TABLE
</pre>
<br />
<br />
The unused and empty product table is the culprit for the performance issues.<br />
<br />
<br />
Create an index on the foreign key column and check the behaviour.<br />
<br />
<pre class="brush: sql">
CREATE INDEX fk_supplier ON product (supplier_id);
</pre>
<br />
<pre class="brush: sql">
--SESSION 1:
INSERT INTO supplier VALUES (3,300, 'Supplier 3', 'Contact 3');
1 row created.
--SESSION 2:
UPDATE supplier SET supplier_id=200 WHERE supplier_id = 200;
1 row updated.
</pre>
<br />
Now everything worked without locking problems.<br />
<br />
<br />
<br />
Notice that the behaviour is different in the 12c version.<br />
<br />
<pre class="brush: sql">
Oracle version - 12.1.0.2.0
CREATE TABLE supplier
( supplier_id number(10) not null,
supplier_name varchar2(50) not null,
contact_name varchar2(50),
CONSTRAINT supplier_pk PRIMARY KEY (supplier_id)
);
INSERT INTO supplier VALUES (1, 'Supplier 1', 'Contact 1');
INSERT INTO supplier VALUES (2, 'Supplier 2', 'Contact 2');
COMMIT;
CREATE TABLE product
( product_id number(10) not null,
product_name varchar2(50) not null,
supplier_id number(10) not null,
CONSTRAINT fk_supplier
FOREIGN KEY (supplier_id)
REFERENCES supplier(supplier_id)
);
--SESSION 1:
INSERT INTO supplier VALUES (3, 'Supplier 3', 'Contact 3'); --(Without COMMIT)
1 row created.
--SESSION 2:
UPDATE supplier SET supplier_id=2 WHERE supplier_id = 2; -- (No HANG)
1 row updated.
</pre>
<br />
Check locks:<br />
<br />
<pre class="brush: sql">
SELECT l.sid, s.blocking_session blocker, s.event, l.type, l.lmode,
l.request, o.object_name, o.object_type
FROM v$lock l, dba_objects o, v$session s
WHERE UPPER(s.username) = UPPER('MSUTIC')
AND l.id1 = o.object_id (+)
AND l.sid = s.sid
ORDER BY sid, type;
SID BLOCKER EVENT TY LMODE REQUEST OBJECT_NAME
------ ---------- ------------------------------ -- ---------- ---------- ------------
4500 SQL*Net message from client AE 4 0 ORA$BASE
4500 SQL*Net message from client TM 3 0 SUPPLIER
4500 SQL*Net message from client TX 6 0
6139 SQL*Net message to client AE 4 0 ORA$BASE
6144 SQL*Net message from client AE 4 0 ORA$BASE
6144 SQL*Net message from client TM 3 0 SUPPLIER
6144 SQL*Net message from client TM 2 0 PRODUCT
6144 SQL*Net message from client TX 6 0
</pre>
<br />
<br />
<br />
I don't think you should always index every foreign key. Sometimes it is simply not needed and becomes overhead:
unnecessary indexes on foreign keys waste storage space and slow down DML operations on the table.<br />
<br />
Think about the application and how the parent/child tables will be used before creating indexes, and check Tom Kyte's articles on this subject. A simple query to spot candidate unindexed foreign keys is sketched below.<br />
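<br />
A simplified check (a sketch - it only verifies that each FK column is covered by an index column at the same position, so multi-column edge cases need extra care):<br />
<pre class="brush: sql">
-- list referential constraints with no matching index on the FK columns
SELECT c.table_name, c.constraint_name
  FROM user_constraints c
 WHERE c.constraint_type = 'R'
   AND NOT EXISTS (
         SELECT 1
           FROM user_cons_columns cc
           JOIN user_ind_columns ic
             ON ic.table_name      = cc.table_name
            AND ic.column_name     = cc.column_name
            AND ic.column_position = cc.position
          WHERE cc.constraint_name = c.constraint_name );
</pre>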
<br />
<br />
<br />
<br />
<br />
<b>Update 2016-07-08:</b><br />
<br />
<br />
Oracle version - 11.2.0.4.0<br />
<br />
What if we index the column in descending order?<br />
<br />
<pre class="brush: sql">
CREATE INDEX fk_supplier ON product (SUPPLIER_ID DESC);
Index created.
</pre>
<br />
<pre class="brush: sql">
--SESSION 1:
INSERT INTO supplier VALUES (3,300, 'Supplier 3', 'Contact 3'); --(Without COMMIT)
--SESSION 2:
UPDATE supplier SET supplier_id=200 WHERE supplier_id = 200; --(HANG)
--Try another INSERT in next session:
--SESSION 3:
INSERT INTO supplier VALUES (4,400, 'Supplier 4', 'Contact 4'); --(HANG)
</pre>
<br />
Check locks:<br />
<br />
<pre class="brush: sql">
SELECT l.sid, s.blocking_session blocker, s.event, l.type, l.lmode,
l.request, o.object_name, o.object_type
FROM v$lock l, dba_objects o, v$session s
WHERE UPPER(s.username) = UPPER('MSUTIC')
AND l.id1 = o.object_id (+)
AND l.sid = s.sid
ORDER BY sid, type;
SID BLOCKER EVENT TY LMODE REQUEST OBJECT_NAME OBJECT_TYPE
------ ---------- ------------------------------ -- ---------- ---------- ------------- -----------
192 1137 enq: TM - contention AE 4 0 ORA$BASE EDITION
192 1137 enq: TM - contention TM 3 0 SUPPLIER TABLE
192 1137 enq: TM - contention TM 0 3 PRODUCT TABLE
382 SQL*Net message from client AE 4 0 ORA$BASE EDITION
949 SQL*Net message from client AE 4 0 ORA$BASE EDITION
949 SQL*Net message from client TM 3 0 SUPPLIER TABLE
949 SQL*Net message from client TM 3 0 PRODUCT TABLE
949 SQL*Net message from client TX 6 0
1137 949 enq: TM - contention AE 4 0 ORA$BASE EDITION
1137 949 enq: TM - contention TM 3 0 SUPPLIER TABLE
1137 949 enq: TM - contention TM 0 4 PRODUCT TABLE
1516 SQL*Net message to client AE 4 0 ORA$BASE EDITION
2459 SQL*Net message from client AE 4 0 ORA$BASE EDITION
</pre>
<br />
<br />
Keep in mind - creating the index on the column in descending order will not solve the concurrency problem. A DESC index is implemented as a function-based index, and Oracle does not consider function-based indexes when deciding whether a foreign key is indexed for locking purposes.<br />
<br />
</span>
Marko Sutichttp://www.blogger.com/profile/08926232581329666732noreply@blogger.com1tag:blogger.com,1999:blog-2530682427657016426.post-85898601308023151812015-10-07T22:52:00.000+02:002015-10-07T23:04:48.002+02:00Confusion and problems with lost+found directory in MySQL/Galera cluster configurationThe <i>lost+found</i> directory is a filesystem directory created at the root level of a mounted drive on ext file systems. It is used by file system check tools (fsck) for file recovery.<br />
<br />
In the MySQL world it can cause confusion or synchronisation problems in a Galera cluster configuration.<br />
<br />
<span id="fullpost">
Let’s check some examples.<br />
<br />
I have a MySQL database with <i>datadir=/data</i> in the configuration file. I deleted the <i>lost+found</i> directory and restarted the MySQL service.<br />
<br />
When I list my databases, this is the result:<br />
<pre class="brush: text">
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| employees |
| mysql |
| performance_schema |
| pitrdb |
| sbtest |
| sys |
| test |
+--------------------+
8 rows in set (0.34 sec)
</pre>
<br />
I will stop MySQL service and recreate <i>lost+found</i> directory.<br />
<pre class="brush: text">
$ sudo service mysql stop
$ cd /data
$ sudo mklost+found
mklost+found 1.42.9 (4-Feb-2014)
</pre>
<br />
Restart service and show databases.<br />
<pre class="brush: text">
$ sudo service mysql start
mysql> show databases;
+---------------------+
| Database |
+---------------------+
| information_schema |
| employees |
| #mysql50#lost+found |
| mysql |
| performance_schema |
| pitrdb |
| sbtest |
| sys |
| test |
+---------------------+
9 rows in set (0.01 sec)
</pre>
<br />
Notice database : <b>#mysql50#lost+found</b><br />
<br />
If you have dedicated an entire filesystem as the MySQL datadir, MySQL will interpret all files under that directory as db-related files.<br />
SHOW DATABASES lists a database <i>lost+found</i>, which is not a real database. <br />
<br />
If you check error log you can notice this message:<br />
<pre class="brush: text">
[ERROR] Invalid (old?) table or database name 'lost+found'
</pre>
<br />
For a single server configuration, issues with the <i>lost+found</i> directory only cause confusion - I’m not aware of any negative effects on the database.<br />
To avoid the confusion you should move the databases into a sub-directory below the root level of the filesystem. Also remove all directories that are not MySQL db-related from the datadir location.<br />
<br />
<br />
Stop the MySQL service on the database server.<br />
<pre class="brush: text">
$ sudo service mysql stop
</pre>
<br />
Make the sub-directory and move the existing data into it.<br />
<pre class="brush: text">
$ sudo su -
root@galera1:~# cd /data
root@galera1:/data# shopt -s extglob    # the !(mydata) pattern needs bash extglob
root@galera1:/data# mkdir mydata && mv !(mydata) mydata
root@galera1:/data# chown -R mysql:mysql /data
</pre>
<br />
Update the configuration file with the new datadir location.<br />
<pre class="brush: text">
# vi /etc/mysql/my.cnf
...
datadir=/data/mydata
...
</pre>
<br />
Remove the non-database directories from the new datadir; fsck's lost+found now lives at the filesystem root, outside the datadir.<br />
<pre class="brush: text">
# rm -rf mydata/lost+found
# mklost+found
mklost+found 1.42.9 (4-Feb-2014)
# pwd
/data
# ls -l
total 56
drwx------ 2 root root 49152 Oct 4 16:48 lost+found
drwxr-xr-x 9 mysql mysql 4096 Oct 4 16:48 mydata
</pre>
<br />
Restart the service.<br />
<pre class="brush: text">
$ sudo service mysql start
</pre>
<br />
<br />
From version 5.6 you can tell the server to ignore non-database directories using the <b>ignore-db-dir</b> option (restart the service after changing the configuration).<br />
<pre class="brush: text">
$ sudo vi /etc/mysql/my.cnf
...
ignore-db-dir=lost+found
...
</pre>
<br />
<br />
<br />
Let’s test how the <i>lost+found</i> directory affects a Galera cluster configuration.<br />
For this test I’m using Percona XtraDB Cluster 5.6 with 3 nodes.<br />
<br />
<pre class="brush: text">
# dpkg -l | grep percona-xtradb-cluster-server
ii percona-xtradb-cluster-server-5.6 5.6.25-25.12-1.trusty amd64 Percona XtraDB Cluster database server binaries
mysql> select version();
+--------------------+
| version() |
+--------------------+
| 5.6.25-73.1-56-log |
+--------------------+
1 row in set (0.00 sec)
mysql> show global status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 3 |
+--------------------+-------+
1 row in set (0.01 sec)
</pre>
<br />
In this configuration the datadir is set to /data, which contains a <i>lost+found</i> directory. <br />
As this is version 5.6, I included the <i>ignore-db-dir</i> option in the configuration file.<br />
<br />
In the SHOW DATABASES list and the error log I don't see any issues.<br />
<pre class="brush: text">
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| employees |
| mysql |
| performance_schema |
| pitrdb |
| sbtest |
| sys |
| test |
+--------------------+
8 rows in set (0.00 sec)
</pre>
<br />
For the SST method I’m using the default and recommended xtrabackup-v2.<br />
So, what happens if I initiate SST for one of the nodes in the cluster?<br />
<br />
<pre class="brush: text">
$ sudo service mysql stop
* Stopping MySQL (Percona XtraDB Cluster) mysqld [OK]
$ sudo rm /data/grastate.dat
$ sudo service mysql start
[sudo] password for marko:
* Starting MySQL (Percona XtraDB Cluster) database server mysqld
* State transfer in progress, setting sleep higher mysqld
* The server quit without updating PID file (/data/galera2.pid).
</pre>
<br />
It appears that SST failed with errors:<br />
<br />
<pre class="brush: text">
WSREP_SST: [ERROR] Cleanup after exit with status:1 (20151004 12:01:00.936)
2015-10-04 12:01:02 16136 [Note] WSREP: (cf98f684, 'tcp://0.0.0.0:4567') turning message relay requesting off
2015-10-04 12:01:12 16136 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.56.102' --datadir '/data/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '16136' --binlog 'percona-bin' : 1 (Operation not permitted)
2015-10-04 12:01:12 16136 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
2015-10-04 12:01:12 16136 [ERROR] WSREP: SST script aborted with error 1 (Operation not permitted)
2015-10-04 12:01:12 16136 [ERROR] WSREP: SST failed: 1 (Operation not permitted)
2015-10-04 12:01:12 16136 [ERROR] Aborting
2015-10-04 12:01:12 16136 [Warning] WSREP: 0.0 (galera3): State transfer to 1.0 (galera2) failed: -22 (Invalid argument)
2015-10-04 12:01:12 16136 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():731: Will never receive state. Need to abort.
</pre>
<br />
<br />
The cause of the SST failure is the <i>lost+found</i> directory, but the error log does not mention <i>lost+found</i> at all. <br />
<br />
SST fails because xtrabackup ignores the <i>ignore-db-dir</i> option and tries to synchronise the <i>lost+found</i> directory, which is owned by the root user. <br />
<br />
<br />
What happens if I (as a test) change the ownership of the lost+found directory on the donor nodes?<br />
<br />
<pre class="brush: text">
drwx------ 2 root root 49152 Oct 4 11:50 lost+found
marko@galera3:/data# sudo chown -R mysql:mysql /data/lost+found
marko@galera1:/data$ sudo chown -R mysql:mysql /data/lost+found
marko@galera2:/data$ sudo service mysql start
* Stale sst_in_progress file in datadir mysqld
* Starting MySQL (Percona XtraDB Cluster) database server mysqld
* State transfer in progress, setting sleep higher mysqld [OK]
NODE2
...
drwxrwx--x 2 mysql mysql 4096 Oct 4 12:07 lost+found
...
</pre>
<br />
SST succeeded and the node successfully joined/synced to the cluster.<br />
<br />
<br />
To avoid these inconveniences, just move the databases off the filesystem root directory.<br />
Some of you will simply delete the lost+found directory but be aware - fsck may recreate it, and your cluster synchronisation will fail when you least expect it ;)<br />
<br />
<br />
</span>Marko Sutichttp://www.blogger.com/profile/08926232581329666732noreply@blogger.com0tag:blogger.com,1999:blog-2530682427657016426.post-26635349090334625392015-05-10T21:38:00.001+02:002020-11-10T07:51:06.157+01:00How to Pass Arguments to OS Shell Script from Oracle DatabaseImagine you have several Oracle databases on the same host running under the same OS user. <br />
<br />
In a scripts directory you have a shell script that kills OS processes.<br />
The idea is to call the OS script from a database procedure and kill the problematic process using the shell script.<br />
<br />
The script runs a simple query to get the process id and kills that process.<br />
<br />
But how do you ensure that this script executes in the correct environment for the correct database?<br />
<br />
One way is to create one script per database and set the environment inside each script; another is to create just one script which dynamically sets the correct environment for the instance that calls it.<br />
<br />
<span id="fullpost">
For the demo I created a simple script that spools query output to a file.<br />
<br />
<pre class="brush: bash">
#!/bin/bash
# Avoid oraenv asking
ORAENV_ASK="NO"; export ORAENV_ASK
ORACLE_SID=$1; export ORACLE_SID
. oraenv ${ORACLE_SID}
$ORACLE_HOME/bin/sqlplus -s "/ as sysdba" <<EOF > /tmp/my_environment.txt
set heading off feedback off verify off
col instance_name for a10
col host_name for a10
col status for a10
select instance_name, host_name, status
from v\$instance;
exit
EOF
$ chmod u+x simple_script.sh
</pre>
<br />
<br />
What happens when we execute the script?<br />
<br />
<pre class="brush: text">
$ ./simple_script.sh testdb
The Oracle base for ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/dbhome_1 is /u01/app/oracle
$
$ cat /tmp/my_environment.txt
testdb asterix OPEN
</pre>
<br />
<pre class="brush: text">
$ ./simple_script.sh ora11gr2
The Oracle base for ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/dbhome_1 is /u01/app/oracle
$
$ cat /tmp/my_environment.txt
ora11gr2 asterix OPEN
</pre>
<br />
Notice how I specified the ORACLE_SID as a command line argument. The script sets the environment from the ORATAB file according to the specified SID and spools the output to the my_environment.txt file.<br />
<br />
Now I will demonstrate how to pass the argument from the database layer.<br />
<br />
<br />
To execute an external job I have to create a credential in both databases.<br />
<br />
<pre class="brush: sql">
-- Session 1
system@ORA11GR2> begin
2 dbms_scheduler.create_credential(
3 credential_name => 'ORACLE_CRED',
4 username => 'oracle',
5 password => 'password');
6 end;
7 /
PL/SQL procedure successfully completed.
-- Session 2
system@TESTDB> begin
2 dbms_scheduler.create_credential(
3 credential_name => 'ORACLE_CRED',
4 username => 'oracle',
5 password => 'password');
6 end;
7 /
PL/SQL procedure successfully completed.
</pre>
<br />
<br />
Use SYS_CONTEXT function to get instance name and execute script for specified instance.<br />
<br />
<pre class="brush: sql">
-- Session 1
system@ORA11GR2> DECLARE
2 l_oracle_sid varchar2(20);
3 BEGIN
4 select sys_context('userenv','instance_name') into l_oracle_sid
5 from dual;
6 DBMS_SCHEDULER.CREATE_JOB (
7 job_name => 'J_SIMPLE_SCRIPT',
8 job_type => 'EXECUTABLE',
9 job_action => '/home/oracle/skripte/simple_script.sh',
10 number_of_arguments => 1,
11 start_date => NULL,
12 repeat_interval => NULL,
13 end_date => NULL,
14 enabled => FALSE,
15 auto_drop => TRUE,
16 comments => 'Set environment and execute query on v$instance view');
17 dbms_scheduler.set_attribute('J_SIMPLE_SCRIPT','credential_name','ORACLE_CRED');
18 DBMS_SCHEDULER.set_job_argument_value('J_SIMPLE_SCRIPT',1,l_oracle_sid);
19 DBMS_SCHEDULER.enable('J_SIMPLE_SCRIPT');
20 DBMS_SCHEDULER.run_job (job_name=> 'J_SIMPLE_SCRIPT', use_current_session => FALSE);
21 END;
22 /
PL/SQL procedure successfully completed.
system@ORA11GR2> host cat /tmp/my_environment.txt
ora11gr2 asterix OPEN
</pre>
<br />
<br />
I’ve called script from "ora11gr2" database and OS script was executed for specified database. DBMS_SCHEDULER job was used for passing argument to external OS script and for script execution.<br />
<br />
And the same from the other database.<br />
<br />
<pre class="brush: sql">
-- Session 2
system@TESTDB> DECLARE
2 l_oracle_sid varchar2(20);
3 BEGIN
4 select sys_context('userenv','instance_name') into l_oracle_sid
5 from dual;
6 DBMS_SCHEDULER.CREATE_JOB (
7 job_name => 'J_SIMPLE_SCRIPT',
8 job_type => 'EXECUTABLE',
9 job_action => '/home/oracle/skripte/simple_script.sh',
10 number_of_arguments => 1,
11 start_date => NULL,
12 repeat_interval => NULL,
13 end_date => NULL,
14 enabled => FALSE,
15 auto_drop => TRUE,
16 comments => 'Set environment and execute query on v$instance view');
17 dbms_scheduler.set_attribute('J_SIMPLE_SCRIPT','credential_name','ORACLE_CRED');
18 DBMS_SCHEDULER.set_job_argument_value('J_SIMPLE_SCRIPT',1,l_oracle_sid);
19 DBMS_SCHEDULER.enable('J_SIMPLE_SCRIPT');
20 DBMS_SCHEDULER.run_job (job_name=> 'J_SIMPLE_SCRIPT', use_current_session => FALSE);
21 END;
22 /
PL/SQL procedure successfully completed.
SQL> host cat /tmp/my_environment.txt
testdb asterix OPEN
</pre>
<br />
Notice how "/tmp/my_environment.txt" file changed according to specified database.<br />
<br />
<br />
Using this method you can easily reuse OS scripts across multiple databases.<br />
<br />
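To close the loop on the original motivation (killing a problematic OS process), the same pattern extends naturally to more arguments. A hypothetical sketch - the script name kill_process.sh, the job name J_KILL_SCRIPT and the PID value are all made up:<br />
<pre class="brush: sql">
-- hypothetical: pass ORACLE_SID plus a process id to one generic script
DECLARE
  l_oracle_sid varchar2(20);
  l_os_pid     varchar2(20) := '12345';  -- made-up PID, normally fetched by a query
BEGIN
  select sys_context('userenv','instance_name') into l_oracle_sid from dual;
  DBMS_SCHEDULER.CREATE_JOB (
    job_name            => 'J_KILL_SCRIPT',
    job_type            => 'EXECUTABLE',
    job_action          => '/home/oracle/skripte/kill_process.sh',
    number_of_arguments => 2,
    enabled             => FALSE,
    auto_drop           => TRUE);
  dbms_scheduler.set_attribute('J_KILL_SCRIPT','credential_name','ORACLE_CRED');
  DBMS_SCHEDULER.set_job_argument_value('J_KILL_SCRIPT',1,l_oracle_sid);
  DBMS_SCHEDULER.set_job_argument_value('J_KILL_SCRIPT',2,l_os_pid);
  DBMS_SCHEDULER.enable('J_KILL_SCRIPT');
  DBMS_SCHEDULER.run_job (job_name => 'J_KILL_SCRIPT', use_current_session => FALSE);
END;
/
</pre>
Inside the script, $1 would then be the ORACLE_SID and $2 the process id to kill.<br />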
<br />
</span>Marko Sutichttp://www.blogger.com/profile/08926232581329666732noreply@blogger.com2tag:blogger.com,1999:blog-2530682427657016426.post-27411269632832427192015-05-09T09:03:00.003+02:002020-11-10T07:51:19.078+01:00ASM not starting with ORA-00845 - how to fix ASM parameter fileA few days ago I saw a great post from <a href="http://qdosmsq.dunbar-it.co.uk/blog/2015/04/how-to-fix-a-broken-asm-spfile-held-within-asm">Norman Dunbar</a> on how to fix a broken ASM spfile.<br />
<br />
Since version 11gR2 the ASM spfile can be stored in an ASM diskgroup, and by default the Oracle Installer will put it there. So if you want to create a pfile from the spfile, your ASM instance should be up and running.<br />
<br />
If you have an incorrect parameter in the ASM spfile which is preventing ASM from starting, then you have a slight problem. You cannot simply create a pfile from the spfile, correct the parameter in the pfile and recreate the spfile, as you would for a database.<br />
<br />
But don't worry, there are several well-explained options available on the net. I would recommend practicing all the scenarios in your test environment if you want to avoid big stress in production later.<br />
<br />
<br />
Whenever I had problems with a broken ASM parameter file (mostly in test/dev environments), I would end up searching my notes or blog posts on how to solve the problem.<br />
<br />
I knew that the parameters are also written in the ASM disk headers and I could extract them from there, or check the parameters in the ASM alert log, but in the back of my mind I was always thinking that there must be a simpler way.<br />
<br />
<span id="fullpost">
Thanks to Norman, now I know how to quickly change an incorrect parameter and keep the other parameters intact.<br />
<br />
<br />
I used this trick a few days ago and it worked perfectly. This blog post is just a reminder which I know will be useful to me in the future.<br />
<br />
<br />
<br />
In my environment I have Oracle Restart with Oracle Database <b>12.1.0.2.0</b>.<br />
<br />
After starting my test server I noticed that something was wrong because ASM was unable to start.<br />
<pre class="brush: text">
$ ./srvctl status asm
ASM is not running.
</pre>
<br />
When I tried to start ASM manually I received an error:<br />
<pre class="brush: text">
$ ./srvctl start asm
PRCR-1079 : Failed to start resource ora.asm
CRS-5017: The resource action "ora.asm start" encountered the following error:
ORA-00845: MEMORY_TARGET not supported on this system
. For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/obelix/crs/trace/ohasd_oraagent_grid.trc".
CRS-2674: Start of 'ora.asm' on 'obelix' failed
</pre>
<br />
<br />
Let's check the alert log.<br />
<pre class="brush: text">
alert+ASM.log
Fri May 01 19:40:16 2015
MEMORY_TARGET defaulting to 1128267776.
* instance_number obtained from CSS = 1, checking for the existence of node 0...
* node 0 does not exist. instance_number = 1
Starting ORACLE instance (normal) (OS id: 4136)
Fri May 01 19:40:16 2015
CLI notifier numLatches:3 maxDescs:222
Fri May 01 19:40:16 2015
WARNING: You are trying to use the MEMORY_TARGET feature. This feature requires the /dev/shm file system to be mounted for at least 1140850688 bytes. /dev/shm is either not mounted or is mounted with available space less than this size. Please fix this so that MEMORY_TARGET can work as expected. Current available is 1051975680 and used is 208896 bytes. Ensure that the mount point is /dev/shm for this directory.
</pre>
<br />
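The message means /dev/shm is smaller than the default MEMORY_TARGET. A quick way to confirm, plus an alternative fix of enlarging the tmpfs (the 2G size here is just an example):<br />
<pre class="brush: text">
$ df -h /dev/shm
$ sudo mount -o remount,size=2G /dev/shm
</pre>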
<br />
So there is a problem with the MEMORY_TARGET parameter - but how can I disable AMM when my ASM instance is down?<br />
<br />
First I had to find the location of the ASM parameter file. I don’t have a GPnP profile as this is a single instance setup, so I extracted the ASM parameter file location from the "ora.asm" resource information.<br />
<pre class="brush: text">
$ crsctl stat res ora.asm -p | egrep "ASM_DISKSTRING|SPFILE"
ASM_DISKSTRING=
SPFILE=+DATA/ASM/ASMPARAMETERFILE/registry.253.822856169
</pre>
<br />
<br />
Create a new pfile that points to the spfile and overrides the offending MEMORY_TARGET parameter.<br />
<pre class="brush: text">
$ vi /tmp/initASM.ora
spfile="+DATA/asm/asmparameterfile/registry.253.862145335"
MEMORY_TARGET=0
</pre>
<br />
<br />
Start the ASM instance using the new parameter file.<br />
<br />
<pre class="brush: sql">
$ sqlplus / as sysasm
SQL*Plus: Release 12.1.0.2.0 Production on Fri May 1 20:04:39 2015
Copyright (c) 1982, 2014, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup pfile=/tmp/initASM.ora
ASM instance started
Total System Global Area 197132288 bytes
Fixed Size 2922520 bytes
Variable Size 169043944 bytes
ASM Cache 25165824 bytes
ASM diskgroups mounted
</pre>
<br />
And voilà!<br />
The override was applied and I was able to start the ASM instance.<br />
<br />
<br />
Now permanently fix the parameter in the ASM spfile.<br />
<pre class="brush: sql">
SQL> alter system set memory_target=0 scope=spfile;
System altered.
</pre>
<br />
Restart ASM.<br />
<pre class="brush: sql">
SQL> shutdown immediate;
ASM diskgroups dismounted
ASM instance shutdown
</pre>
<br />
<pre class="brush: sql">
[grid@obelix bin]$ ./srvctl start asm
[grid@obelix bin]$ ./srvctl status asm
ASM is running on obelix
</pre>
<br />
<br />
The ASM instance successfully started with the corrected parameter file.<br />
<br />
<br />
<br />
</span>Marko Sutichttp://www.blogger.com/profile/08926232581329666732noreply@blogger.com2tag:blogger.com,1999:blog-2530682427657016426.post-48394819993212094962015-02-28T11:22:00.001+01:002020-11-10T07:51:29.316+01:00Restore to Restore Point on Standard Edition (no Flashback technology)Restore points and Flashback Database are nice features introduced in the 10g database that provide efficient point-in-time recovery to reverse unwanted data changes. <br />
<br />
But what if you have a Standard Edition database:<br />
<br />
<pre class="brush: sql">
SQL> shutdown immediate;
SQL> startup mount;
SQL> alter database flashback on;
alter database flashback on
*
ERROR at line 1:
ORA-00439: feature not enabled: Flashback Database
</pre>
<br />
In Standard Edition you don’t have the Flashback Database feature, but you can still create restore points and perform incomplete recovery <b>to a restore point</b>.<br />
<span id="fullpost">
<br />
<br />
Create a test table and insert a status row.<br />
<br />
<pre class="brush: text">
SQL> create table admin.test_restore (datum date, komentar varchar2(100));
Table created.
SQL> insert into admin.test_restore values (sysdate, 'Before Restore Point');
1 row created.
SQL> commit;
Commit complete.
</pre>
<br />
<br />
Create a restore point here.<br />
<br />
<pre class="brush: text">
SQL> create restore point RP_UPGRADE;
Restore point created.
SQL> select scn, to_char(time,'dd.mm.yyyy hh24:mi:ss') time, name
2 from v$restore_point;
SCN TIME NAME
---------- ------------------- ---------------------
580752 27.02.2015 10:31:19 RP_UPGRADE
</pre>
<br />
Notice how the restore point name is associated with an SCN of the database. <br />
<br />
<br />
Now you can perform potentially dangerous operations like database upgrades, table modifications, truncating data and the like.<br />
<br />
I will enter some status data for later checks.<br />
<br />
<pre class="brush: text">
SQL> insert into admin.test_restore values (sysdate, 'After Restore Point');
1 row created.
SQL> insert into admin.test_restore values (sysdate, 'Upgrade actions performed');
1 row created.
SQL> commit;
Commit complete.
</pre>
<br />
<br />
Check table.<br />
<br />
<pre class="brush: text">
SQL> alter session set nls_date_format='dd.mm.yyyy hh24:mi:ss';
Session altered.
SQL> select datum, komentar from admin.test_restore order by datum;
DATUM KOMENTAR
------------------- ------------------------------
27.02.2015 10:30:39 Before Restore Point
27.02.2015 10:31:45 After Restore Point
27.02.2015 10:31:55 Upgrade actions performed
</pre>
<br />
<br />
Suppose we hit some problems and want to "rewind" the database to the restore point. In EE we would perform FLASHBACK DATABASE to the restore point, but in SE we will use a different approach.<br />
<br />
<br />
Shut down the database and start it up in mount mode.<br />
<br />
<pre class="brush: text">
RMAN> shutdown immediate;
using target database control file instead of recovery catalog
database closed
database dismounted
Oracle instance shut down
RMAN> startup mount;
connected to target database (not started)
Oracle instance started
database mounted
Total System Global Area 471830528 bytes
Fixed Size 2254344 bytes
Variable Size 247466488 bytes
Database Buffers 213909504 bytes
Redo Buffers 8200192 bytes
</pre>
<br />
<br />
Restore and recover the database until the restore point RP_UPGRADE.<br />
<br />
<pre class="brush: text">
RMAN> restore database until restore point RP_UPGRADE;
Starting restore at 27.02.2015 10:36:26
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=247 device type=DISK
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00001 to +DATA1/ora11gr2/datafile/system.291.872722799
channel ORA_DISK_1: restoring datafile 00002 to +DATA1/ora11gr2/datafile/sysaux.292.872722847
channel ORA_DISK_1: restoring datafile 00003 to +DATA1/ora11gr2/datafile/undotbs1.278.872722879
channel ORA_DISK_1: restoring datafile 00004 to +DATA1/ora11gr2/datafile/users.296.872722925
channel ORA_DISK_1: reading from backup piece +FRA1/ora11gr2/backupset/2015_02_27/nnndf0_tag20150227t102559_0.1164.872763961
channel ORA_DISK_1: piece handle=+FRA1/ora11gr2/backupset/2015_02_27/nnndf0_tag20150227t102559_0.1164.872763961 tag=TAG20150227T102559
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:01:35
Finished restore at 27.02.2015 10:38:02
RMAN> recover database until restore point RP_UPGRADE;
Starting recover at 27.02.2015 10:38:45
using channel ORA_DISK_1
starting media recovery
media recovery complete, elapsed time: 00:00:01
Finished recover at 27.02.2015 10:38:49
</pre>
<br />
<br />
Open the database with the RESETLOGS option.<br />
<br />
<pre class="brush: sql">
RMAN> sql 'alter database open resetlogs';
sql statement: alter database open resetlogs
</pre>
<br />
<br />
Final check.<br />
<br />
<pre class="brush: text">
SQL> alter session set nls_date_format='dd.mm.yyyy hh24:mi:ss';
Session altered.
SQL> select datum, komentar
2 from admin.test_restore
3 order by datum;
DATUM KOMENTAR
------------------- --------------------------------------------------
27.02.2015 10:30:39 Before Restore Point
</pre>
<br />
<br />
We "rewound" database to state that existed before RP_UPGRADE restore point is created.<br />
This was incomplete recovery and RP_UPGRADE restore point was used just to mark location in time.<br />
<br />
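One cleanup note: the restore point stays in the control file until it ages out or is dropped, so once it is no longer needed you can remove it.<br />
<pre class="brush: sql">
SQL> drop restore point RP_UPGRADE;
Restore point dropped.
</pre>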
<br />
<br />
</span>Marko Sutichttp://www.blogger.com/profile/08926232581329666732noreply@blogger.com4tag:blogger.com,1999:blog-2530682427657016426.post-67326475090391910322015-02-05T10:33:00.001+01:002020-11-10T07:51:39.039+01:00MariaDB - Measure Replication Lag and Check / Fix Replication Inconsistencies using Percona toolsPercona Toolkit is a collection of command-line tools for performing many MySQL tasks like creating backups, finding duplicate indexes, managing replication, etc.<br />
<br />
In this post I will talk about how to measure replication lag and check/fix replication inconsistencies with these tools: <br />
<i>pt-heartbeat</i><br />
<i>pt-table-checksum</i><br />
<i>pt-table-sync</i><br />
<br />
<br />
I am using the environment from the previous blog post.<br />
Master-master replication with a MariaDB 10.0.16 database on Debian 7.<br />
<br />
<br />
Install <b>Percona Toolkit</b> on both nodes:<br />
<br />
<pre class="brush: text">
$ sudo wget percona.com/get/percona-toolkit.deb
$ sudo apt-get install libterm-readkey-perl
$ sudo dpkg -i percona-toolkit.deb
</pre>
<br />
<br />
I will create a <i>percona</i> database where I will store the tables needed for various checks. I will also create a <i>percona</i> user which will be used with the Percona tools.<br />
<br />
<br />
<span id="fullpost">
<br />
<b>MASTER1</b><br />
<br />
<pre class="brush: text">
MariaDB [(none)]> create database percona;
MariaDB [(none)]> grant all privileges on *.* to 'percona'@'master1.localdomain' identified by 'percona';
MariaDB [(none)]> grant all privileges on *.* to 'percona'@'localhost' identified by 'percona';
MariaDB [(none)]> flush privileges;
</pre>
<br />
<br />
<b>MASTER2</b><br />
<br />
<pre class="brush: text">
MariaDB [(none)]> grant all privileges on *.* to 'percona'@'master2.localdomain' identified by 'percona';
MariaDB [(none)]> grant all privileges on *.* to 'percona'@'localhost' identified by 'percona';
MariaDB [(none)]> flush privileges;
</pre>
<br />
<br />
<br />
<br />
<font color="blue"><b>MONITOR REPLICATION LAG</b></font><br />
<br />
<br />
So, I have replication running and I want to be sure that everything is working fine.<br />
The typical method to monitor replication lag would be to run <i>SHOW SLAVE STATUS</i> and look at <i>Seconds_Behind_Master</i>. But <i>Seconds_Behind_Master</i> is not always accurate.<br />
<br />
Percona Toolkit has a tool to monitor replication delay called <i>pt-heartbeat</i>.<br />
<br />
We must create the heartbeat table on the master, either manually or using the <i>--create-table</i> option, and the heartbeat table must contain one heartbeat row. <i>pt-heartbeat</i> will update this table at the interval we specify. The slave will actively check the table and calculate the time delay.<br />
<br />
<br />
Create the heartbeat table and start a daemonized process to update the <i>percona.heartbeat</i> table.<br />
<br />
<br />
<b>MASTER1</b><br />
<br />
<pre class="brush: text">
$ pt-heartbeat -upercona -ppercona -D percona --update master1 --daemonize --create-table
</pre>
<br />
<br />
<b>MASTER2</b><br />
Start <i>pt-heartbeat</i>.<br />
<br />
<pre class="brush: text">
$ pt-heartbeat -upercona -ppercona --update --database percona
</pre>
<br />
<br />
<b>MASTER1</b>
<br />
Monitor replication slave lag.<br />
<br />
<pre class="brush: text">
$ pt-heartbeat -upercona -ppercona -D percona --monitor -h master2
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
0.01s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
</pre>
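Under the hood, the reported delay is simply the difference between the current time and the last heartbeat timestamp replicated from the master. A rough manual check on the slave could look like this (a sketch only - pt-heartbeat parses the high-precision <i>ts</i> column and handles time zones more carefully than this):<br />
<br />
<pre class="brush: text">
MariaDB [(none)]> select timestampdiff(second, ts, now()) as lag_seconds
    -> from percona.heartbeat;
</pre>
<br />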
<br />
<br />
<br />
<br />
<br />
<font color="blue"><b>CHECK REPLICATION INCONSISTENCIES</b></font><br />
<br />
<br />
If we want to check replication integrity we can use the <i>pt-table-checksum</i> tool.<br />
<br />
Run the tool on the master server. It will automatically detect slave servers and connect to them to do some safety checks. After that it runs checksums on the tables of the master database and records the results in the checksum table. These results are then compared with the results on the slave to determine whether the data differs.<br />
You can inspect that table anytime - in this example the <i>percona.checksums</i> table - as shown after the first run below.<br />
<br />
If there are no differing rows between the master and slave databases, the <i>DIFFS </i>column will show 0.<br />
<br />
<br />
<pre class="brush: text">
$ pt-table-checksum -upercona -ppercona --create-replicate-table --replicate percona.checksums --databases testdb -h master2
TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
02-02T20:58:15 0 0 5 1 0 1.134 testdb.users
</pre>
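To inspect the results yourself, you can query the checksums table on the replica. A sketch based on the default table layout - it lists chunks whose row counts or checksums differ from the master:<br />
<br />
<pre class="brush: text">
MariaDB [(none)]> select db, tbl, chunk, this_cnt, master_cnt
    -> from percona.checksums
    -> where master_cnt <> this_cnt or master_crc <> this_crc
    -> or isnull(master_crc) <> isnull(this_crc);
</pre>
<br />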
<br />
<br />
<b>MASTER2</b><br />
<br />
<pre class="brush: text">
MariaDB [testdb]> create table address (id int auto_increment primary key, city varchar(30));
Query OK, 0 rows affected (0.06 sec)
MariaDB [testdb]> insert into address (city) values ('New York');
Query OK, 1 row affected (0.07 sec)
MariaDB [testdb]> insert into address (city) values ('LA');
Query OK, 1 row affected (0.06 sec)
MariaDB [testdb]> insert into address (city) values ('Zagreb');
Query OK, 1 row affected (0.13 sec)
</pre>
<br />
<br />
<b>MASTER1</b><br />
<br />
<pre class="brush: text">
$ pt-table-checksum -upercona -ppercona --replicate percona.checksums --databases testdb -h master2
TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
02-02T20:59:16 0 0 3 1 0 1.032 testdb.address
02-02T20:59:17 0 0 5 1 0 1.120 testdb.users
</pre>
<br />
<br />
<pre class="brush: text">
$ pt-table-checksum -upercona -ppercona --replicate=percona.checksums --replicate-check-only --databases=testdb master1
</pre>
<br />
<br />
Nothing was returned in the output, which means that the testdb database is in sync with the slave.<br />
<br />
<br />
Insert some test data:<br />
<br />
<pre class="brush: text">
MariaDB [testdb]> create table animals (id int not null auto_increment,
-> name char(30) not null,
-> primary key(id));
Query OK, 0 rows affected (0.04 sec)
MariaDB [testdb]> insert into animals (name) values ('dog'),('cat'),('whale');
Query OK, 3 rows affected (0.00 sec)
Records: 3 Duplicates: 0 Warnings: 0
MariaDB [testdb]> create table countries (id int not null auto_increment,
-> name varchar(30),
-> primary key(id));
Query OK, 0 rows affected (0.09 sec)
MariaDB [testdb]> insert into countries(name) values ('Croatia'),('England'),('USA'),('Island');
Query OK, 4 rows affected (0.00 sec)
Records: 4 Duplicates: 0 Warnings: 0
MariaDB [testdb]> select * from animals;
+----+-------+
| id | name |
+----+-------+
| 1 | dog |
| 2 | cat |
| 3 | whale |
+----+-------+
3 rows in set (0.00 sec)
MariaDB [testdb]> select * from countries;
+----+---------+
| id | name |
+----+---------+
| 1 | Croatia |
| 2 | England |
| 3 | USA |
| 4 | Island |
+----+---------+
4 rows in set (0.00 sec)
</pre>
<br />
<br />
Check if database is in sync:<br />
<br />
<pre class="brush: text">
$ pt-table-checksum -upercona -ppercona --create-replicate-table --replicate percona.checksums --databases testdb -h master1
TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
02-02T21:03:49 0 0 3 1 0 0.177 testdb.address
02-02T21:03:49 0 0 3 1 0 0.045 testdb.animals
02-02T21:03:49 0 0 4 1 0 0.049 testdb.countries
02-02T21:03:49 0 0 5 1 0 0.037 testdb.users
</pre>
<br />
<br />
<br />
<br />
<br />
<font color="blue"><b>RESYNC REPLICA FROM THE MASTER</b></font><br />
<br />
<br />
Let's make the database on <i>MASTER2 </i>out of sync and create some differences between the databases.<br />
<br />
<br />
<b>MASTER2</b><br />
<br />
Instead of stopping the replication process, I will temporarily disable binary logging on the <i>MASTER2 </i>server.<br />
<br />
<pre class="brush: text">
MariaDB [testdb]> SET SQL_LOG_BIN=0;
Query OK, 0 rows affected (0.00 sec)
</pre>
<br />
<br />
Make some data modifications.<br />
<br />
<pre class="brush: text">
MariaDB [testdb]> insert into animals (name) values ('Ostrich'),('Penguin');
Query OK, 2 rows affected (0.04 sec)
Records: 2 Duplicates: 0 Warnings: 0
MariaDB [testdb]> delete from countries where id=2;
Query OK, 1 row affected (0.01 sec)
MariaDB [testdb]> create table colors (name varchar(30));
Query OK, 0 rows affected (0.10 sec)
MariaDB [testdb]> insert into colors(name) values ('Red'),('Blue');
Query OK, 2 rows affected (0.02 sec)
Records: 2 Duplicates: 0 Warnings: 0
</pre>
<br />
<br />
Enable binary logging again.<br />
<br />
<pre class="brush: text">
MariaDB [testdb]> SET SQL_LOG_BIN=1;
Query OK, 0 rows affected (0.00 sec)
</pre>
<br />
<br />
<br />
<b>MASTER1</b><br />
<br />
<pre class="brush: text">
MariaDB [testdb]> select * from animals;
+----+-------+
| id | name |
+----+-------+
| 1 | dog |
| 2 | cat |
| 3 | whale |
+----+-------+
3 rows in set (0.00 sec)
MariaDB [testdb]> select * from countries;
+----+---------+
| id | name |
+----+---------+
| 1 | Croatia |
| 2 | England |
| 3 | USA |
| 4 | Island |
+----+---------+
4 rows in set (0.00 sec)
MariaDB [testdb]> show tables;
+------------------+
| Tables_in_testdb |
+------------------+
| address |
| animals |
| countries |
| users |
+------------------+
4 rows in set (0.00 sec)
</pre>
<br />
<br />
<b>MASTER2</b><br />
<br />
<pre class="brush: text">
MariaDB [testdb]> select * from animals;
+----+---------+
| id | name |
+----+---------+
| 1 | dog |
| 2 | cat |
| 3 | whale |
| 4 | Ostrich |
| 5 | Penguin |
+----+---------+
5 rows in set (0.00 sec)
MariaDB [testdb]> select * from countries;
+----+---------+
| id | name |
+----+---------+
| 1 | Croatia |
| 3 | USA |
| 4 | Island |
+----+---------+
3 rows in set (0.00 sec)
MariaDB [testdb]> show tables;
+------------------+
| Tables_in_testdb |
+------------------+
| address |
| animals |
| colors |
| countries |
| users |
+------------------+
5 rows in set (0.00 sec)
</pre>
<br />
<br />
Notice that there are some inconsistencies between the databases, and there isn't any built-in tool that will notify us about that. Replication is working fine, even though the replica has different data than the master.<br />
<br />
With <i>pt-table-checksum</i> we will check for data differences between the databases.<br />
<br />
<br />
<b>MASTER1</b><br />
<br />
<pre class="brush: text">
$ pt-table-checksum -upercona -ppercona --create-replicate-table --replicate percona.checksums --databases testdb -h master1
TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
02-02T21:11:23 0 0 3 1 0 0.106 testdb.address
02-02T21:11:23 0 1 3 1 0 0.053 testdb.animals
02-02T21:11:24 0 1 4 1 0 0.046 testdb.countries
02-02T21:11:24 0 0 5 1 0 0.042 testdb.users
$ pt-table-checksum -upercona -ppercona --replicate=percona.checksums --replicate-check-only --databases=testdb master1
Differences on master2
TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY
testdb.animals 1 2 1
testdb.countries 1 -1 1
</pre>
<br />
Notice how the tool reported the differences in the <i>DIFFS </i>column.
<br />
<br />
<br />
Synchronizing data between servers in a master-master configuration is not a trivial task. You have to think about which process is changing data where, and be very careful to avoid data corruption.<br />
<br />
In a master-master configuration data changes are replicated between nodes, and statements executed on the "slave" node are replicated back to the master.<br />
<br />
Maybe the best approach would be to stop replication, restore the replica from a backup or reclone the whole server, and then start replication again. You can also dump only the affected data with <i>mysqldump </i>and reload it.<br />
<br />
<br />
As this is my testing environment, I will try to resolve the differences using the <i>pt-table-sync</i> tool from the <i>Percona Toolkit</i>.<br />
<br />
<br />
First I will use the tool with the <i>--print</i> option, which will only display the queries that would resolve the differences. I will inspect those queries before executing them on the slave server. <br />
These queries could also be executed manually.<br />
<br />
<pre class="brush: text">
$ pt-table-sync -upercona -ppercona --sync-to-master --databases testdb --transaction --lock=1 --verbose master2 --print
# Syncing h=master2,p=...,u=percona
# DELETE REPLACE INSERT UPDATE ALGORITHM START END EXIT DATABASE.TABLE
# 0 0 0 0 Chunk 22:13:17 22:13:17 0 testdb.address
DELETE FROM `testdb`.`animals` WHERE `id`='4' LIMIT 1 /*percona-toolkit src_db:testdb src_tbl:animals src_dsn:P=3306,h=master1,p=...,u=percona dst_db:testdb dst_tbl:animals dst_dsn:h=master2,p=...,u=percona lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:7723 user:msutic host:master1*/;
DELETE FROM `testdb`.`animals` WHERE `id`='5' LIMIT 1 /*percona-toolkit src_db:testdb src_tbl:animals src_dsn:P=3306,h=master1,p=...,u=percona dst_db:testdb dst_tbl:animals dst_dsn:h=master2,p=...,u=percona lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:7723 user:msutic host:master1*/;
# 2 0 0 0 Chunk 22:13:17 22:13:17 2 testdb.animals
REPLACE INTO `testdb`.`countries`(`id`, `name`) VALUES ('2', 'England') /*percona-toolkit src_db:testdb src_tbl:countries src_dsn:P=3306,h=master1,p=...,u=percona dst_db:testdb dst_tbl:countries dst_dsn:h=master2,p=...,u=percona lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:7723 user:msutic host:master1*/;
# 0 1 0 0 Chunk 22:13:17 22:13:17 2 testdb.countries
# 0 0 0 0 Chunk 22:13:17 22:13:17 0 testdb.users
</pre>
<br />
<br />
Set the <i>--execute</i> option to execute those queries.<br />
With the <i>--sync-to-master</i> option we will treat the <i>MASTER2 </i>server as a slave.<br />
<br />
<br />
<pre class="brush: text">
$ pt-table-sync -upercona -ppercona --sync-to-master --databases testdb --transaction --lock=1 --verbose master2 --execute
# Syncing h=master2,p=...,u=percona
# DELETE REPLACE INSERT UPDATE ALGORITHM START END EXIT DATABASE.TABLE
# 0 0 0 0 Chunk 22:19:51 22:19:51 0 testdb.address
# 2 0 0 0 Chunk 22:19:51 22:19:51 2 testdb.animals
# 0 1 0 0 Chunk 22:19:51 22:19:51 2 testdb.countries
# 0 0 0 0 Chunk 22:19:51 22:19:51 0 testdb.users
</pre>
<br />
<br />
The output shows that the differences were successfully resolved with two <i>DELETE </i>operations and one <i>REPLACE </i>operation on the specified tables.<br />
<br />
Let's run another check to verify whether the differences still exist.<br />
<br />
<pre class="brush: text">
$ pt-table-checksum -upercona -ppercona --create-replicate-table --replicate percona.checksums --databases testdb -h master1
TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
02-02T22:21:30 0 0 3 1 0 0.549 testdb.address
02-02T22:21:30 0 0 3 1 0 0.048 testdb.animals
02-02T22:21:30 0 0 4 1 0 0.043 testdb.countries
02-02T22:21:30 0 0 5 1 0 0.049 testdb.users
</pre>
<br />
The <i>DIFFS </i>column shows only 0, which means that the tables are in sync.<br />
<br />
<br />
<br />
<br />
What happens if I run the checksums on the <i>MASTER2 </i>server?<br />
<br />
<br />
<b>MASTER2</b><br />
<br />
<pre class="brush: text">
$ pt-table-checksum -upercona -ppercona --create-replicate-table --replicate percona.checksums --databases testdb -h master2
TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
02-02T22:24:16 0 0 3 1 0 0.072 testdb.address
02-02T22:24:16 0 0 3 1 0 0.048 testdb.animals
02-02T22:24:16 Skipping table testdb.colors because it has problems on these replicas:
Table testdb.colors does not exist on replica master1
This can break replication. If you understand the risks, specify --no-check-slave-tables to disable this check.
02-02T22:24:16 Error checksumming table testdb.colors: DBD::mysql::db selectrow_hashref failed: Table 'testdb.colors' doesn't exist [for Statement "EXPLAIN SELECT * FROM `testdb`.`colors` WHERE 1=1"] at /usr/bin/pt-table-checksum line 6595.
02-02T22:24:16 1 0 0 0 0 0.003 testdb.colors
02-02T22:24:16 0 0 4 1 0 0.044 testdb.countries
02-02T22:24:16 0 0 5 1 0 0.043 testdb.users
</pre>
<br />
<br />
The output shows an error because the table <i>testdb.colors</i> exists on <i>MASTER2 </i>but not on <i>MASTER1</i>.<br />
<br />
I know that <i>MASTER1 </i>has the "correct" data, so I will just drop the <i>testdb.colors </i>table on the <i>MASTER2 </i>node.<br />
<br />
<pre class="brush: text">
MariaDB [testdb]> drop table if exists testdb.colors;
Query OK, 0 rows affected (0.05 sec)
</pre>
<br />
<br />
Run check again:<br />
<br />
<pre class="brush: text">
$ pt-table-checksum -upercona -ppercona --create-replicate-table --replicate percona.checksums --databases testdb -h master2
TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
02-02T22:26:43 0 0 3 1 0 0.322 testdb.address
02-02T22:26:43 0 0 3 1 0 0.056 testdb.animals
02-02T22:26:43 0 0 4 1 0 0.050 testdb.countries
02-02T22:26:43 0 0 5 1 0 0.045 testdb.users
</pre>
<br />
<br />
Now the databases are in sync.<br />
<br />
<br />
<br />
If we use the <i>--quiet</i> option the tool will print a row per table only if there are differences. This is a nice way to run the tool from a cron job and send mail only when there is a non-zero exit status - see the sketch after the output below.<br />
<br />
<pre class="brush: text">
$ pt-table-checksum -upercona -ppercona --create-replicate-table --replicate percona.checksums --databases testdb -h master1 --quiet
(no rows)
</pre>
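For example, a crontab entry could rely on that exit status and send mail only when something is wrong. A hypothetical sketch - adjust the schedule, paths and mail command to your environment:<br />
<br />
<pre class="brush: text">
# hypothetical crontab entry on master1
0 3 * * * /usr/bin/pt-table-checksum -upercona -ppercona --replicate percona.checksums --databases testdb -h master1 --quiet || echo "replication inconsistencies found" | mail -s "pt-table-checksum" dba@example.com
</pre>
<br />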
<br />
<br />
<br />
<br />
<b>REFERENCES</b><br />
<a href="http://www.percona.com/doc/percona-toolkit/2.2/pt-table-sync.html">http://www.percona.com/doc/percona-toolkit/2.2/pt-table-sync.html</a><br />
<a href="http://www.percona.com/doc/percona-toolkit/2.2/pt-table-checksum.html">http://www.percona.com/doc/percona-toolkit/2.2/pt-table-checksum.html</a><br />
<a href="http://www.percona.com/software/percona-toolkit">http://www.percona.com/software/percona-toolkit</a><br />
<br />
<br />
</span>Marko Sutichttp://www.blogger.com/profile/08926232581329666732noreply@blogger.com32tag:blogger.com,1999:blog-2530682427657016426.post-52490769771305265882015-02-01T13:36:00.001+01:002020-11-10T07:51:46.338+01:00MariaDB(MySQL) Master-Master ReplicationThe simplest and probably most common replication method is <i>master-slave</i> replication. Basically, data is replicated from the master database to the slave. In case of a master database failure you must get the slave database up to date before failover and then promote the slave to be the new master.<br />
<br />
Another method is to set up replication in both directions, called <i>master-master</i> replication. But you must be aware that this setup brings some potential issues, as data changes are happening on both nodes. It can be a problem if you have tables with auto_increment fields. If both servers are inserting or updating the same table, replication will break on one server due to a “duplicate entry” error. To resolve this issue you have the "<i>auto_increment_increment</i>" and "<i>auto_increment_offset</i>" settings, illustrated below.<br />
<br />
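For illustration, with the settings used below (increment 5, offsets 1 and 2) each node draws IDs from its own non-overlapping sequence, so simultaneous inserts into the same table can never generate the same value:<br />
<br />
<pre class="brush: text">
# auto_increment_increment = 5
# MASTER1 (auto_increment_offset = 1) generates ids 1, 6, 11, 16, ...
# MASTER2 (auto_increment_offset = 2) generates ids 2, 7, 12, 17, ...
</pre>
<br />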
In my case it's best to use the master-master setup as active-passive replication. If we know that only one node is performing data modifications we can avoid many possible problems. In case of a failover the "<i>slave</i>" can easily be promoted to a new master. Data modifications are automatically replicated to the failed node when it comes back up.<br />
<br />
Of course, this simple setup is not suitable for all situations and it has its drawbacks, but luckily you have several other options at your disposal, like <a href="https://mariadb.com/kb/en/mariadb/what-is-mariadb-galera-cluster/">MariaDB Galera Cluster</a>.<br />
<br />
<br />
<br />
<span id="fullpost">
Servers setup:<br />
OS: Debian 7.8<br />
DB: MariaDB 10.0.16<br />
</span><br />
<div class="separator" style="clear: both; text-align: center;">
<span id="fullpost"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJTRYGetGX8C98UrdAikaNM8QpBbBUB_2bEhOrFMK1vhcF-pCsv-yyHWxyQ6dZ0AyP5AdJKQK2pFV6AE6-AhHmMUqRYcgjGHECXtddbeYtq7v1fh1CchGPq9qtQTe_TUspZU885K2n97J2/s1600/ReplicationP.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJTRYGetGX8C98UrdAikaNM8QpBbBUB_2bEhOrFMK1vhcF-pCsv-yyHWxyQ6dZ0AyP5AdJKQK2pFV6AE6-AhHmMUqRYcgjGHECXtddbeYtq7v1fh1CchGPq9qtQTe_TUspZU885K2n97J2/s1600/ReplicationP.jpg" /></a></span></div>
<span id="fullpost">
<br />
<br />
<br />
Install MariaDB 10 (both nodes).<br />
<br />
<pre class="brush: text">$ sudo apt-get install python-software-properties
$ sudo apt-key adv --recv-keys --keyserver keyserver.ubuntu.com 0xcbcb082a1bb943db
$ sudo add-apt-repository 'deb http://mirror3.layerjet.com/mariadb/repo/10.0/debian wheezy main'
$ sudo apt-get update
$ sudo apt-get install mariadb-server
</pre>
<br />
<br />
Stop MariaDB on both nodes:<br />
<pre class="brush: text">$ sudo service mysql stop
</pre>
<br />
<br />
<b>MASTER1</b><br />
<br />
Edit <i>/etc/mysql/my.cnf</i> parameter file.<br />
<br />
<pre class="brush: text"># bind-address = 127.0.0.1
server-id = 61
report_host = master1
log_bin = /var/log/mysql/mariadb-bin
log_bin_index = /var/log/mysql/mariadb-bin.index
relay_log = /var/log/mysql/relay-bin
relay_log_index = /var/log/mysql/relay-bin.index
# replicate-do-db = testdb
auto_increment_increment = 5
auto_increment_offset = 1
</pre>
<br />
<br />
<span style="color: blue;"># bind-address = 127.0.0.1</span><br />
By default MySQL will accept connections only from the local host. We comment out this line to enable connections from other hosts. This is important for replication to work.<br />
<br />
<span style="color: blue;">server-id = 61</span><br />
<span style="color: blue;">report_host = master1</span><br />
Choose an ID that will uniquely identify your host. I will use the last two digits of my IP address. Optionally you can set the report_host parameter so the servers report their hostnames to each other.<br />
<br />
<span style="color: blue;">log_bin = /var/log/mysql/mariadb-bin</span><br />
<span style="color: blue;">log_bin_index = /var/log/mysql/mariadb-bin.index</span><br />
Enable binary logging.<br />
<br />
<span style="color: blue;">relay_log = /var/log/mysql/relay-bin</span><br />
<span style="color: blue;">relay_log_index = /var/log/mysql/relay-bin.index</span><br />
Enable creating relay log files. Events that are read from the master's binary log are written to the slave's relay log.<br />
<br />
<span style="color: green;">replicate-do-db = testdb</span><br />
With this parameter we are telling MariaDB which databases to replicate. This parameter is <span style="color: red;">optional</span>.<br />
<br />
<br />
<br />
<br />
Now we can start MariaDB server.<br />
<br />
<pre class="brush: text">$ sudo service mysql start
</pre>
<br />
<br />
Log in as root and create the user that will be used for replicating data between our servers. Grant the appropriate privileges to the user.<br />
<br />
<pre class="brush: text">$ sudo mysql -uroot -p
MariaDB [(none)]> create user 'replusr'@'%' identified by 'replusr';
MariaDB [(none)]> grant replication slave on *.* to 'replusr'@'%';
</pre>
<br />
<br />
As the last step, check the status information about the binary log files, as we will use this information to start replication on the other node.<br />
<br />
<pre class="brush: text">MariaDB [(none)]> show master status;
+--------------------+----------+--------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+--------------------+----------+--------------+------------------+
| mariadb-bin.000009 | 634 | | |
+--------------------+----------+--------------+------------------+
</pre>
<br />
<br />
<br />
<b>MASTER2</b><br />
<br />
Edit <i>/etc/mysql/my.cnf</i> parameter file.<br />
<br />
<pre class="brush: text"># bind-address = 127.0.0.1
server-id = 62
report_host = master2
log_bin = /var/log/mysql/mariadb-bin
log_bin_index = /var/log/mysql/mariadb-bin.index
relay_log = /var/log/mysql/relay-bin
relay_log_index = /var/log/mysql/relay-bin.index
# replicate-do-db = testdb
auto_increment_increment = 5
auto_increment_offset = 2
</pre>
<br />
<br />
Start MariaDB server.<br />
<br />
<pre class="brush: text">$ sudo service mysql start
</pre>
<br />
<br />
Create user which will be used for replication and grant privileges to the user.<br />
<br />
<pre class="brush: text">$ sudo mysql -uroot -p
MariaDB [(none)]> create user 'replusr'@'%' identified by 'replusr';
MariaDB [(none)]> grant replication slave on *.* to 'replusr'@'%';
</pre>
<br />
<br />
To start replication enter the following commands.<br />
<br />
<pre class="brush: text">MariaDB [(none)]> STOP SLAVE;
MariaDB [(none)]> CHANGE MASTER TO MASTER_HOST='master1', MASTER_USER='replusr',
-> MASTER_PASSWORD='replusr', MASTER_LOG_FILE='mariadb-bin.000009', MASTER_LOG_POS=634;
MariaDB [(none)]> START SLAVE;
</pre>
<br />
For <i>MASTER_LOG_FILE </i>and <i>MASTER_LOG_POS </i>I used the information from "<i>show master status</i>" on the first node.<br />
<br />
Check the status information of the slave threads.<br />
<br />
<pre class="brush: text">MariaDB [(none)]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: master1
Master_User: replusr
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mariadb-bin.000009
Read_Master_Log_Pos: 634
Relay_Log_File: relay-bin.000002
Relay_Log_Pos: 537
Relay_Master_Log_File: mariadb-bin.000009
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB: testdb
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 634
Relay_Log_Space: 828
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: No
Gtid_IO_Pos:
</pre>
<br />
<br />
Notice that <i>Read_Master_Log_Pos</i> and <i>Exec_Master_Log_Pos</i> are in sync, which is a good indicator that our databases are in sync.<br />
<br />
<br />
Check the status information about the binary log files of the <i>MASTER2 </i>node. We will need this information to start replication on the <i>MASTER1 </i>node.<br />
<br />
<pre class="brush: text">MariaDB [(none)]> show master status;
+--------------------+----------+--------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+--------------------+----------+--------------+------------------+
| mariadb-bin.000009 | 759 | | |
+--------------------+----------+--------------+------------------+
</pre>
<br />
<br />
<br />
<b>MASTER1</b><br />
<br />
Start replicating data from the <i>MASTER2 </i>node to the <i>MASTER1 </i>node.<br />
<br />
<pre class="brush: text">MariaDB [(none)]> STOP SLAVE;
MariaDB [(none)]> CHANGE MASTER TO MASTER_HOST='master2', MASTER_USER='replusr',
-> MASTER_PASSWORD='replusr', MASTER_LOG_FILE='mariadb-bin.000009', MASTER_LOG_POS=759;
MariaDB [(none)]> START SLAVE;
</pre>
<br />
<br />
<pre class="brush: text">MariaDB [(none)]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: master2
Master_User: replusr
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mariadb-bin.000009
Read_Master_Log_Pos: 759
Relay_Log_File: relay-bin.000002
Relay_Log_Pos: 537
Relay_Master_Log_File: mariadb-bin.000009
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB: testdb
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 759
Relay_Log_Space: 828
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 62
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: No
Gtid_IO_Pos:
</pre>
<br />
<br />
Everything seems to be OK.<br />
<br />
<br />
<br />
Let's create a test table and insert some rows to test our replication.<br />
<br />
<br />
<b>MASTER1</b><br />
<br />
<pre class="brush: text">MariaDB [(none)]> create database testdb;
MariaDB [(none)]> use testdb;
Database changed
MariaDB [testdb]> CREATE TABLE users (id INT AUTO_INCREMENT,
-> name VARCHAR(30),
-> datum TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
-> PRIMARY KEY(id));
Query OK, 0 rows affected (0.50 sec)
MariaDB [testdb]> INSERT INTO users(name) VALUES ('Marko');
Query OK, 1 row affected (0.06 sec)
MariaDB [testdb]> select * from users;
+----+-------+---------------------+
| id | name | datum |
+----+-------+---------------------+
| 1 | Marko | 2015-02-01 00:41:41 |
+----+-------+---------------------+
1 row in set (0.00 sec)
</pre>
<br />
<br />
<b>MASTER2</b><br />
<br />
<pre class="brush: text">MariaDB [testdb]> use testdb
Database changed
MariaDB [testdb]> select * from users;
+----+-------+---------------------+
| id | name | datum |
+----+-------+---------------------+
| 1 | Marko | 2015-02-01 00:41:41 |
+----+-------+---------------------+
1 row in set (0.00 sec)
MariaDB [testdb]> INSERT INTO users(name) VALUES('John');
Query OK, 1 row affected (0.39 sec)
MariaDB [testdb]> select * from users;
+----+-------+---------------------+
| id | name | datum |
+----+-------+---------------------+
| 1 | Marko | 2015-02-01 00:41:41 |
| 2 | John | 2015-01-31 16:17:55 |
+----+-------+---------------------+
2 rows in set (0.00 sec)
</pre>
<br />
<br />
<b>MASTER1</b><br />
<br />
<pre class="brush: text">MariaDB [testdb]> select * from users;
+----+-------+---------------------+
| id | name | datum |
+----+-------+---------------------+
| 1 | Marko | 2015-02-01 00:41:41 |
| 2 | John | 2015-01-31 16:17:55 |
+----+-------+---------------------+
2 rows in set (0.00 sec)
</pre>
<br />
As we can see, our table and rows were replicated successfully.<br />
<br />
<br />
<br />
Let's simulate a crash of the <i>MASTER1 </i>node and power off the server.<br />
<br />
<pre class="brush: text">$ sudo shutdown -h now
</pre>
<br />
While the server is down, insert some rows on the <i>MASTER2 </i>node.<br />
<br />
<b>MASTER2</b><br />
<br />
<pre class="brush: text">MariaDB [testdb]> INSERT INTO users(name) VALUES ('Eric');
Query OK, 1 row affected (0.41 sec)
MariaDB [testdb]> INSERT INTO users(name) VALUES ('Clive');
Query OK, 1 row affected (0.08 sec)
MariaDB [testdb]> INSERT INTO users(name) VALUES ('Maria');
Query OK, 1 row affected (0.09 sec)
MariaDB [testdb]> select * from users;
+----+-------+---------------------+
| id | name | datum |
+----+-------+---------------------+
| 1 | Marko | 2015-02-01 00:41:41 |
| 2 | John | 2015-01-31 16:17:55 |
| 3 | Eric | 2015-01-31 16:19:49 |
| 4 | Clive | 2015-01-31 16:19:55 |
| 5 | Maria | 2015-01-31 16:20:01 |
+----+-------+---------------------+
5 rows in set (0.00 sec)
</pre>
<br />
<br />
<pre class="brush: text">MariaDB [testdb]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
Slave_IO_State: Reconnecting after a failed master event read
Master_Host: master1
Master_User: replusr
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mariadb-bin.000010
Read_Master_Log_Pos: 1828
Relay_Log_File: relay-bin.000012
Relay_Log_Pos: 1083
Relay_Master_Log_File: mariadb-bin.000010
Slave_IO_Running: Connecting
Slave_SQL_Running: Yes
Replicate_Do_DB: testdb
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 1828
Relay_Log_Space: 1663
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 2003
Last_IO_Error: error reconnecting to master 'replusr@master1:3306' - retry-time:
60 retries: 86400 message: Can't connect to MySQL server
on 'master1' (111 "Connection refused")
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: No
Gtid_IO_Pos:
</pre>
<br />
Check the <i>Last_IO_Error </i>message while <i>MASTER1 </i>is down.<br />
<br />
<br />
Now turn on the <i>MASTER1 </i>node again.<br />
The MariaDB server and replication will start automatically, and <i>MASTER1</i> should catch up with <i>MASTER2</i>.<br />
<br />
<br />
<b>MASTER1</b><br />
<br />
Check "<i>users</i>" table - it's synchronised again.<br />
<br />
<pre class="brush: text">$ mysql -u root -p -D testdb
MariaDB [testdb]> select * from users;
+----+-------+---------------------+
| id | name | datum |
+----+-------+---------------------+
| 1 | Marko | 2015-02-01 00:41:41 |
| 2 | John | 2015-01-31 16:17:55 |
| 3 | Eric | 2015-01-31 16:19:49 |
| 4 | Clive | 2015-01-31 16:19:55 |
| 5 | Maria | 2015-01-31 16:20:01 |
+----+-------+---------------------+
5 rows in set (0.00 sec)
</pre>
<br />
<br />
<br />
Please let me know if you see possible problems in this configuration. I will gladly update the post.
Thanks for reading!
</span>Marko Sutichttp://www.blogger.com/profile/08926232581329666732noreply@blogger.com0tag:blogger.com,1999:blog-2530682427657016426.post-71600432750472128122014-12-22T14:12:00.001+01:002020-11-10T07:51:57.862+01:00ORA-19599 block corruption when filesystemio_options=SETALL on ext4 file system using LinuxA few days ago I experienced a strange issue in my development environment running on OEL 5.8 with an EXT4 filesystem. Note - the EXT4 filesystem has been supported since OEL 5.6.<br />
<br />
This was a virtual machine running an oldish 10.2.0.5.0 Oracle database.<br />
<br />
I noticed that the backup of my database was failing because of archive log corruption. As this is a development database, I simply deleted the corrupted archive logs and initiated a full backup again. But the backup failed because the new archive logs were corrupted.<br />
<br />
Weird issue...<br />
<br />
I forced a log file switch a few times and validated the new archive logs - everything was OK. The redo logs were multiplexed and everything was fine with them. I validated the database for physical and logical corruption - everything was OK.<br />
<br />
Then I initiated the backup again and it failed.
This is an excerpt from the RMAN log (I've changed the log slightly):<br />
<span id="fullpost">
<br />
<br />
<pre class="brush: text">
RMAN> connect target *
2> run
3> {
7>
8> ALLOCATE CHANNEL d1 DEVICE TYPE DISK;
9> BACKUP INCREMENTAL LEVEL 0 FORMAT '/u01/backup_db/QAS/fullbkp_dir/FULL_%d_%u' DATABASE TAG "weekly_full";
10> RELEASE CHANNEL d1;
11> sql 'alter system archive log current';
12> ALLOCATE CHANNEL d1 DEVICE TYPE DISK;
13> BACKUP (ARCHIVELOG ALL FORMAT '/u01/backup_db/QAS/fullbkp_dir/ARCH_%d_%T_%u_s%s_p%p' DELETE INPUT TAG "archivelogs");
14> RELEASE CHANNEL d1;
15>
16> DELETE OBSOLETE;
17>
18> BACKUP CURRENT CONTROLFILE FORMAT '/u01/backup_db/QAS/fullbkp_dir/controlf_%d_%u_%s_%T';
19> }
20>
connected to target database: QAS (DBID=2203246509)
using target database control file instead of recovery catalog
allocated channel: d1
channel d1: sid=43 devtype=DISK
Starting backup at 17.12.2014 08:17:02
channel d1: starting compressed incremental level 0 datafile backupset
channel d1: specifying datafile(s) in backupset
input datafile fno=00035 name=/u01/oradata/qas700.data1
input datafile fno=00036 name=/u01/oradata/qas700.data2
input datafile fno=00037 name=/u01/oradata/qas700.data3
input datafile fno=00002 name=/u01/oradata/undo.data1
...
...
...
channel d1: starting piece 1 at 17.12.2014 08:17:03
channel d1: finished piece 1 at 17.12.2014 09:45:48
piece handle=/u01/backup_db/QAS/fullbkp_dir/FULL_QAS_26pqchvu tag=WEEKLY_FULL comment=NONE
channel d1: backup set complete, elapsed time: 01:28:46
Finished backup at 17.12.2014 09:45:48
Starting Control File and SPFILE Autobackup at 17.12.2014 09:45:48
piece handle=/u01/app/oracle10/product/10.2.0/db_1/dbs/c-2203246509-20141217-13 comment=NONE
Finished Control File and SPFILE Autobackup at 17.12.2014 09:45:53
released channel: d1
sql statement: alter system archive log current
allocated channel: d1
channel d1: sid=43 devtype=DISK
Starting backup at 17.12.2014 09:45:54
current log archived
channel d1: starting compressed archive log backupset
channel d1: specifying archive log(s) in backup set
input archive log thread=1 sequence=11350 recid=39 stamp=866540753
input archive log thread=1 sequence=11351 recid=40 stamp=866540754
channel d1: starting piece 1 at 17.12.2014 09:45:55
released channel: d1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on d1 channel at 12/17/2014 09:45:56
ORA-19599: block number 6144 is corrupt in archived log /u01/oradata/QAS/QASarch/1_11350_826737654.dbf
Recovery Manager complete.
</pre>
<br />
<br />
Notice that the full backup finished successfully, and when RMAN tried to back up the new archive logs it failed due to corruption. <br />
<br />
I mentioned this issue on Twitter and got responses from Ronald Rood (@Ik_zelf) and Philippe Fierens (@pfierens) who helped me find the resolution.<br />
Thanks guys!<br />
<br />
<br />
Check this note:<br />
<blockquote>
ORA-1578 ORA-353 ORA-19599 Corrupt blocks with zeros when filesystemio_options=SETALL on ext4 file system using Linux (Doc ID 1487957.1)
</blockquote>
<br />
I had <b>filesystemio_options</b> configured as <b>SETALL</b>, and resetting this parameter to its default value solved my corruption problem.<br />
<br />
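For reference, this is roughly how the parameter can be checked and reset (a sketch - the parameter is static, so a restart is required, and the default value is platform-dependent):<br />
<br />
<pre class="brush: text">
SQL> show parameter filesystemio_options;
SQL> alter system reset filesystemio_options scope=spfile sid='*';
SQL> shutdown immediate
SQL> startup
</pre>
<br />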
<br />
As this was a development machine I wasn't thinking much about the filesystem, but next time it will be ASM or XFS - probably not EXT4 :-)<br />
<br />
<br />
</span>Marko Sutichttp://www.blogger.com/profile/08926232581329666732noreply@blogger.com2tag:blogger.com,1999:blog-2530682427657016426.post-78355260416780454712014-10-29T15:46:00.003+01:002020-11-10T07:52:09.223+01:00Mount ASM diskgroups with new ASM instanceImagine you have an 11gR2 Oracle Restart configuration with database files located in ASM. <br />
<br />
After a server crash you realize that the local disks are corrupted, and with the local disks you lost all Oracle installations. Even though this is an important system, you don't have a database backup (always take backups!).<br />
<br />
But you managed to save all ASM disks as they were located on separate storage.<br />
<br />
<br />
This will be a small beginner guide on how to help yourself in such a situation.<br />
<br />
<span id="fullpost">
<br />
As the old server crashed, you must create a new server configuration identical to the old one. A nice thing about ASM is that it keeps its metadata in the disk headers. If the disks are intact and the headers are not damaged, you should be able to mount the diskgroups with a new ASM instance. But this new instance must be compatible with your diskgroups.<br />
<br />
<br />
The Grid Infrastructure and database software were version 11.2.0.1, and this is the version I will install on the new server.<br />
<br />
To keep this post short enough, steps like creating users, installing ASMLib and other packages, configuring kernel parameters, etc. are excluded.<br />
<br />
<br />
List the Oracle ASM disks mounted to the new server.<br />
With the "scandisks" command I will find the devices which have been labeled as ASM disks.<br />
<br />
<pre class="brush: text">
# oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Scanning system for ASM disks...
# oracleasm listdisks
DISK1
DISK2
DISK3
DISK4
DISK5
FRA1
</pre>
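Optionally, at this point you can sanity-check a disk header with the kfed utility that ships with the Grid Infrastructure home (a sketch - field names as printed by kfed's header dump):<br />
<br />
<pre class="brush: text">
$ kfed read /dev/oracleasm/disks/DISK1
# look for kfdhdb.dskname and kfdhdb.grpname in the output -
# they should show the expected ASM disk and diskgroup names
</pre>
<br />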
<br />
Install "Oracle Grid Infrastructure software only" option to avoid automatic Oracle Restart and ASM configuration. This configuration will be performed later manually.<br />
<br />
After the installation finished, run the noted perl script as root to configure Grid Infrastructure for a stand-alone server.<br />
For my configuration the script looks like this:<br />
<pre class="brush: text">
To configure Grid Infrastructure for a Stand-Alone Server run the following command as the root user:
/u01/app/11.2.0.1/grid/perl/bin/perl -I/u01/app/11.2.0.1/grid/perl/lib -I/u01/app/11.2.0.1/grid/crs/install /u01/app/11.2.0.1/grid/crs/install/roothas.pl
</pre>
<br />
<br />
Start cssd if it’s not running.<br />
<br />
<pre class="brush: text">
# ./crsctl stat res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cssd
1 OFFLINE OFFLINE
ora.diskmon
1 OFFLINE OFFLINE
# ./crs_start ora.cssd
Attempting to start `ora.cssd` on member `asterix`
Attempting to stop `ora.diskmon` on member `asterix`
Stop of `ora.diskmon` on member `asterix` succeeded.
Attempting to start `ora.diskmon` on member `asterix`
Start of `ora.diskmon` on member `asterix` succeeded.
Start of `ora.cssd` on member `asterix` succeeded.
</pre>
<br />
<br />
Create a parameter file for the ASM instance in the $ORACLE_HOME/dbs directory of the Grid Infrastructure home.<br />
<br />
<pre class="brush: text">
init+ASM.ora
*.asm_diskstring='/dev/oracleasm/disks'
*.asm_power_limit=1
*.diagnostic_dest='/u01/app/grid'
*.instance_type='asm'
*.large_pool_size=12M
*.remote_login_passwordfile='EXCLUSIVE'
</pre>
<br />
<br />
Register and start the ASM instance.<br />
<br />
<pre class="brush: text">
$ export ORACLE_SID=+ASM
$ export ORACLE_HOME=/u01/app/11.2.0.1/grid
$ srvctl add asm -p $ORACLE_HOME/dbs/init+ASM.ora
$ srvctl start asm
$ srvctl status asm
ASM is running on asterix
</pre>
<br />
<br />
Now notice what I see when I start the ASM Configuration Assistant.<br />
<br />
<pre class="brush: text">
$ ./asmca
</pre>
<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjuciiwrIrg8TIcVIeFYsNncvoyqh82LOUSN-sxN2-qvT9y_6hWExJDkzhpkL2Cgqbz6JwRih3NzEQhfLno7ojkZHfYDAilJlYit7cyVPiPtnlwUWWv4QDoGfXupeQf3Lbw9GHc0MhbhNYA/s1600/ScreenShot759.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjuciiwrIrg8TIcVIeFYsNncvoyqh82LOUSN-sxN2-qvT9y_6hWExJDkzhpkL2Cgqbz6JwRih3NzEQhfLno7ojkZHfYDAilJlYit7cyVPiPtnlwUWWv4QDoGfXupeQf3Lbw9GHc0MhbhNYA/s400/ScreenShot759.jpg" /></a></div>
<br />
These are the diskgroups with my database and recovery files.<br />
Click "Mount all" to mount them all.<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg4A-s37bQ0XTpCH05ei7ZIEAAYmVsn1rRGHW74Yd99RHD835A7iNZp81O-j3ZfdJXriL-5PoQMyKPfgliM2uYQaEd4FJz-z8_vc9LH2c5iVe6-ikkHI8tByaJkQFqmrADLX5NLQjMgUZjK/s1600/ScreenShot760.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg4A-s37bQ0XTpCH05ei7ZIEAAYmVsn1rRGHW74Yd99RHD835A7iNZp81O-j3ZfdJXriL-5PoQMyKPfgliM2uYQaEd4FJz-z8_vc9LH2c5iVe6-ikkHI8tByaJkQFqmrADLX5NLQjMgUZjK/s400/ScreenShot760.jpg" /></a></div>
<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi5MIA3V24MgEGuQxLU0o3d1MTeQZTihlxETv_59GnZ4uyg6xaejkM-KJlTGxjz5k2SecHbVT0JOdHqncFsDgKOANvh2wuZaL4lO_qJMeo3fiX1YJJ0h-u7rjqZ7gDYRrSVtRQSo5vFS2vz/s1600/ScreenShot761.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi5MIA3V24MgEGuQxLU0o3d1MTeQZTihlxETv_59GnZ4uyg6xaejkM-KJlTGxjz5k2SecHbVT0JOdHqncFsDgKOANvh2wuZaL4lO_qJMeo3fiX1YJJ0h-u7rjqZ7gDYRrSVtRQSo5vFS2vz/s400/ScreenShot761.jpg" /></a></div>
<br />
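If you prefer the command line over the GUI, the same mount can be done from SQL*Plus connected to the ASM instance (a sketch; connect as SYSASM):<br />
<br />
<pre class="brush: sql">
$ sqlplus / as sysasm
SQL> alter diskgroup all mount;
SQL> select name, state from v$asm_diskgroup;
</pre>
<br />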
<br />
<br />
Install the Oracle database software and create a parameter file in "$ORACLE_HOME/dbs" to start the database.<br />
<br />
<pre class="brush: sql">
$ export ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1
$ export ORACLE_SID=ora11gr2
$ cd $ORACLE_HOME/dbs
$ cat initora11gr2.ora
*.spfile='+DATA1/ora11gr2/spfileora11gr2.ora'
$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.1.0 Production on Wed Oct 29 14:29:37 2014
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup
ORACLE instance started.
Total System Global Area 668082176 bytes
Fixed Size 2216344 bytes
Variable Size 222301800 bytes
Database Buffers 436207616 bytes
Redo Buffers 7356416 bytes
Database mounted.
Database opened.
SQL>
SQL>
SQL> select name from v$datafile;
NAME
--------------------------------------------------------------------------------
+DATA1/ora11gr2/datafile/system.297.844627929
+DATA1/ora11gr2/datafile/sysaux.265.844627967
+DATA1/ora11gr2/datafile/undotbs1.266.844627991
+DATA1/ora11gr2/datafile/users.267.844628031
+DATA2/ora11gr2/datafile/marko.261.859213577
</pre>
<br />
<br />
The database opened successfully, and you can register the instance using the SRVCTL command.<br />
<br />
<pre class="brush: text">
$ srvctl add database -d $ORACLE_SID -o $ORACLE_HOME -p $ORACLE_HOME/dbs/initora11gr2.ora
$ srvctl start database -d $ORACLE_SID
</pre>
<br />
<br />
Final status.<br />
<br />
<pre class="brush: text">
$ ./crsctl stat res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA1.dg
ONLINE ONLINE asterix
ora.DATA2.dg
ONLINE ONLINE asterix
ora.FRA1.dg
ONLINE ONLINE asterix
ora.asm
ONLINE ONLINE asterix Started
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cssd
1 ONLINE ONLINE asterix
ora.diskmon
1 ONLINE ONLINE asterix
ora.ora11gr2.db
1 ONLINE ONLINE asterix Open
</pre>
<br />
<br />
Be aware that this demo was performed in a virtual environment on my notebook.<br />
<br />
</span>Marko Sutichttp://www.blogger.com/profile/08926232581329666732noreply@blogger.com2tag:blogger.com,1999:blog-2530682427657016426.post-31270572258309653892014-10-24T09:35:00.001+02:002014-10-24T09:50:21.067+02:00Increase disk space for VM running LinuxWhen I create virtual machines on my notebook I always create too small a disk for the root partition or the partition where I put Oracle binaries. After a while, when I want to perform an upgrade or install more Oracle software, there is not enough space. This time I want to note the steps for increasing disk free space.<br />
<br />
I can easily extend or shrink my logical volumes because I am using LVM in my virtual machines. Consider using LVM in production also, because it gives you more flexibility than using normal hard drive partitions.<br />
<br />
In this demo I'm using Oracle Linux 6.4.<br />
<br />
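Before touching anything it is worth getting an overview of the current LVM layout. These are the inspection commands used throughout this post:<br />
<br />
<pre class="brush: text">
# pvs     (physical volumes - disks/partitions handed over to LVM)
# vgs     (volume groups built from those physical volumes)
# lvs     (logical volumes carved out of the volume groups)
</pre>
<br />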
<br />
Check disk free space after OS installation.<br />
<br />
<pre class="brush: text">
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_linuxtest-lv_root
4.9G 2.8G 2.0G 59% /
tmpfs 770M 100K 770M 1% /dev/shm
/dev/sda1 485M 55M 405M 12% /boot
</pre>
<br />
<span id="fullpost">
<br />
Add "/u01" mount and assign some disk space for Oracle installation files.<br />
<br />
<br />
Shut down the VM and add a disk.<br />
<br />
<br />
Partition the new disk "/dev/sdb" using the fdisk command.<br />
<br />
<pre class="brush: text">
# fdisk /dev/sdb
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0xa07249dd.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
switch off the mode (command 'c') and change display units to
sectors (command 'u').
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-391, default 1):
Using default value 1
Last cylinder, +cylinders or +size{K,M,G} (1-391, default 391):
Using default value 391
Command (m for help): t
Selected partition 1
Hex code (type L to list codes): L
0 Empty 24 NEC DOS 81 Minix / old Lin bf Solaris
1 FAT12 39 Plan 9 82 Linux swap / So c1 DRDOS/sec (FAT-
2 XENIX root 3c PartitionMagic 83 Linux c4 DRDOS/sec (FAT-
3 XENIX usr 40 Venix 80286 84 OS/2 hidden C: c6 DRDOS/sec (FAT-
4 FAT16 <32M 41 PPC PReP Boot 85 Linux extended c7 Syrinx
5 Extended 42 SFS 86 NTFS volume set da Non-FS data
6 FAT16 4d QNX4.x 87 NTFS volume set db CP/M / CTOS / .
7 HPFS/NTFS 4e QNX4.x 2nd part 88 Linux plaintext de Dell Utility
8 AIX 4f QNX4.x 3rd part 8e Linux LVM df BootIt
9 AIX bootable 50 OnTrack DM 93 Amoeba e1 DOS access
a OS/2 Boot Manag 51 OnTrack DM6 Aux 94 Amoeba BBT e3 DOS R/O
b W95 FAT32 52 CP/M 9f BSD/OS e4 SpeedStor
c W95 FAT32 (LBA) 53 OnTrack DM6 Aux a0 IBM Thinkpad hi eb BeOS fs
e W95 FAT16 (LBA) 54 OnTrackDM6 a5 FreeBSD ee GPT
f W95 Ext'd (LBA) 55 EZ-Drive a6 OpenBSD ef EFI (FAT-12/16/
10 OPUS 56 Golden Bow a7 NeXTSTEP f0 Linux/PA-RISC b
11 Hidden FAT12 5c Priam Edisk a8 Darwin UFS f1 SpeedStor
12 Compaq diagnost 61 SpeedStor a9 NetBSD f4 SpeedStor
14 Hidden FAT16 <3 63 GNU HURD or Sys ab Darwin boot f2 DOS secondary
16 Hidden FAT16 64 Novell Netware af HFS / HFS+ fb VMware VMFS
17 Hidden HPFS/NTF 65 Novell Netware b7 BSDI fs fc VMware VMKCORE
18 AST SmartSleep 70 DiskSecure Mult b8 BSDI swap fd Linux raid auto
1b Hidden W95 FAT3 75 PC/IX bb Boot Wizard hid fe LANstep
1c Hidden W95 FAT3 80 Old Minix be Solaris boot ff BBT
1e Hidden W95 FAT1
Hex code (type L to list codes): 8e
Changed system type of partition 1 to 8e (Linux LVM)
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
</pre>
<br />
<br />
Notice that I identified the partition as "Linux LVM" by choosing the "8e" hex code. <br />
<br />
<br />
Using the pvcreate command, create a physical volume for later use by LVM.<br />
<br />
<pre class="brush: text">
# pvcreate /dev/sdb1
Physical volume "/dev/sdb1" successfully created
</pre>
<br />
Create a new volume group "vg_orabin". Later I can add or remove disks from this volume group.<br />
<br />
<pre class="brush: text">
# vgcreate vg_orabin /dev/sdb1
Volume group "vg_orabin" successfully created
</pre>
<br />
<br />
Information about volume group.<br />
<br />
<pre class="brush: text">
# vgdisplay vg_orabin
--- Volume group ---
VG Name vg_orabin
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 1
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 0
Open LV 0
Max PV 0
Cur PV 1
Act PV 1
VG Size 2.99 GiB
PE Size 4.00 MiB
Total PE 766
Alloc PE / Size 0 / 0
Free PE / Size 766 / 2.99 GiB
VG UUID h3N1o5-AlYF-9nkL-PXiB-P8HK-tGAa-GlXPa5
</pre>
<br />
<br />
Create a logical volume using disk space from the volume group.<br />
<br />
<pre class="brush: text">
# lvcreate --extents 766 -n lv_orabin vg_orabin
Logical volume "lv_orabin" created
</pre>
<br />
<br />
Create and mount filesystem.<br />
<br />
<pre class="brush: text">
# mkfs.ext4 /dev/mapper/vg_orabin-lv_orabin
mke2fs 1.41.12 (17-May-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
196224 inodes, 784384 blocks
39219 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=805306368
24 block groups
32768 blocks per group, 32768 fragments per group
8176 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912
Writing inode tables: done
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 25 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
# mkdir /u01
# mount /dev/mapper/vg_orabin-lv_orabin /u01
</pre>
<br />
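To make the mount persistent across reboots, an entry would normally be added to /etc/fstab as well (a sketch - adjust the options to your needs):<br />
<br />
<pre class="brush: text">
# /etc/fstab
/dev/mapper/vg_orabin-lv_orabin  /u01  ext4  defaults  1 2
</pre>
<br />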
Check disk space available.<br />
<br />
<br />
<pre class="brush: text">
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_linuxtest-lv_root
4.9G 2.8G 2.0G 59% /
tmpfs 770M 88K 770M 1% /dev/shm
/dev/sda1 485M 55M 405M 12% /boot
/dev/mapper/vg_orabin-lv_orabin
3.0G 69M 2.8G 3% /u01
</pre>
<br />
<br />
Hm, 2.8G is not enough free space for me. Let's extend this mount by adding another disk.<br />
<br />
<br />
<br />
Shut down the VM and add a disk.<br />
<br />
<br />
Partition the new disk and create a physical volume for LVM.<br />
<br />
<pre class="brush: text">
# fdisk /dev/sdc
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0x16953397.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
switch off the mode (command 'c') and change display units to
sectors (command 'u').
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-652, default 1):
Using default value 1
Last cylinder, +cylinders or +size{K,M,G} (1-652, default 652):
Using default value 652
Command (m for help): t
Selected partition 1
Hex code (type L to list codes): 8e
Changed system type of partition 1 to 8e (Linux LVM)
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
# pvcreate /dev/sdc1
Physical volume "/dev/sdc1" successfully created
</pre>
<br />
Check the current status of the volume group "vg_orabin".<br />
<br />
<pre class="brush: text">
# vgdisplay vg_orabin
--- Volume group ---
VG Name vg_orabin
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 2
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 1
Open LV 0
Max PV 0
Cur PV 1
Act PV 1
VG Size 2.99 GiB
PE Size 4.00 MiB
Total PE 766
Alloc PE / Size 766 / 2.99 GiB
Free PE / Size 0 / 0
VG UUID h3N1o5-AlYF-9nkL-PXiB-P8HK-tGAa-GlXPa5
</pre>
<br />
Extend the volume group by adding the physical volume "/dev/sdc1" using the vgextend command.<br />
<br />
<pre class="brush: text">
# vgextend vg_orabin /dev/sdc1
Volume group "vg_orabin" successfully extended
</pre>
<br />
<br />
Check the volume group size - it was extended from 2.99 GiB to 7.98 GiB.<br />
<br />
<pre class="brush: text">
# vgdisplay vg_orabin
--- Volume group ---
VG Name vg_orabin
System ID
Format lvm2
Metadata Areas 2
Metadata Sequence No 3
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 1
Open LV 0
Max PV 0
Cur PV 2
Act PV 2
VG Size 7.98 GiB
PE Size 4.00 MiB
Total PE 2044
Alloc PE / Size 766 / 2.99 GiB
Free PE / Size 1278 / 4.99 GiB
VG UUID h3N1o5-AlYF-9nkL-PXiB-P8HK-tGAa-GlXPa5
</pre>
<br />
<br />
Using the pvscan command, scan all disks and notice the physical volumes with free space.<br />
<br />
<pre class="brush: text">
# pvscan
PV /dev/sdb1 VG vg_orabin lvm2 [2.99 GiB / 0 free]
PV /dev/sdc1 VG vg_orabin lvm2 [4.99 GiB / 4.99 GiB free]
PV /dev/sda2 VG vg_linuxtest lvm2 [6.51 GiB / 0 free]
Total: 3 [14.49 GiB] / in use: 3 [14.49 GiB] / in no VG: 0 [0 ]
</pre>
<br />
<br />
With the lvdisplay command, display the logical volume properties.<br />
Notice LV Size = 2.99 GiB.<br />
<br />
<pre class="brush: text">
# lvdisplay /dev/vg_orabin/lv_orabin
--- Logical volume ---
LV Path /dev/vg_orabin/lv_orabin
LV Name lv_orabin
VG Name vg_orabin
LV UUID ypw9X1-vIsM-4rVF-NtVB-ACrf-f5nh-25p2sn
LV Write Access read/write
LV Creation host, time linuxtest.localdomain, 2014-10-23 13:19:56 +0200
LV Status available
# open 0
LV Size 2.99 GiB
Current LE 766
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 252:2
</pre>
<br />
<br />
I will add only 2G (of the 5G) using the lvextend command.<br />
<br />
<pre class="brush: text">
# lvextend -L +2G /dev/mapper/vg_orabin-lv_orabin /dev/sdc1
Extending logical volume lv_orabin to 4.99 GiB
Logical volume lv_orabin successfully resized
</pre>
<br />
<br />
Mount volume and check for free space.<br />
<br />
<pre class="brush: text">
# mount /dev/mapper/vg_orabin-lv_orabin /u01
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_linuxtest-lv_root
4.9G 2.8G 2.0G 59% /
tmpfs 770M 88K 770M 1% /dev/shm
/dev/sda1 485M 55M 405M 12% /boot
/dev/mapper/vg_orabin-lv_orabin
3.0G 69M 2.8G 3% /u01
</pre>
<br />
<br />
Resize filesystem using resize2fs command:<br />
<pre class="brush: text">
# resize2fs /dev/mapper/vg_orabin-lv_orabin
resize2fs 1.41.12 (17-May-2010)
Filesystem at /dev/mapper/vg_orabin-lv_orabin is mounted on /u01; on-line resizing required
old desc_blocks = 1, new_desc_blocks = 1
Performing an on-line resize of /dev/mapper/vg_orabin-lv_orabin to 1308672 (4k) blocks.
The filesystem on /dev/mapper/vg_orabin-lv_orabin is now 1308672 blocks long.
</pre>
<br />
<br />
Now I have 4.6G of free space on the "/u01" mount.<br />
<br />
<pre class="brush: text">
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_linuxtest-lv_root
4.9G 2.8G 2.0G 59% /
tmpfs 770M 88K 770M 1% /dev/shm
/dev/sda1 485M 55M 405M 12% /boot
/dev/mapper/vg_orabin-lv_orabin
5.0G 70M 4.6G 2% /u01
</pre>
<br />
<br />
<br />
===========================================
<br />
<br />
Now I will try to extend the root partition.<br />
<br />
Newer Oracle Linux releases use LVM by default during installation.<br />
Let’s see whether I can increase my root partition using the commands above.<br />
<br />
<br />
Display information about the logical volumes using the lvs command.<br />
<br />
<pre class="brush: text">
# lvs
LV VG Attr LSize Pool Origin Data% Move Log Cpy%Sync Convert
lv_root vg_linuxtest -wi-ao--- 4.97g
lv_swap vg_linuxtest -wi-ao--- 1.54g
lv_orabin vg_orabin -wi-a---- 4.99g
</pre>
<br />
Check free space.<br />
<pre class="brush: text">
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_linuxtest-lv_root
4.9G 2.8G 2.0G 59% /
tmpfs 770M 88K 770M 1% /dev/shm
/dev/sda1 485M 55M 405M 12% /boot
</pre>
<br />
<br />
Shut down the VM and add a new disk for extending the root partition.<br />
<br />
<br />
Partition the new disk and create a physical volume for LVM.<br />
<br />
<pre class="brush: text">
# fdisk /dev/sdd
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0xf0608435.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
switch off the mode (command 'c') and change display units to
sectors (command 'u').
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-652, default 1):
Using default value 1
Last cylinder, +cylinders or +size{K,M,G} (1-652, default 652):
Using default value 652
Command (m for help): t
Selected partition 1
Hex code (type L to list codes): 8e
Changed system type of partition 1 to 8e (Linux LVM)
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
# pvcreate /dev/sdd1
Physical volume "/dev/sdd1" successfully created
</pre>
<br />
<br />
Check information about the volume group.<br />
<br />
<pre class="brush: text">
# vgdisplay vg_linuxtest
--- Volume group ---
VG Name vg_linuxtest
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 3
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 2
Open LV 2
Max PV 0
Cur PV 1
Act PV 1
VG Size 6.51 GiB
PE Size 4.00 MiB
Total PE 1666
Alloc PE / Size 1666 / 6.51 GiB
Free PE / Size 0 / 0
VG UUID TXkKYl-PIxu-s2xk-LsEB-sgTZ-TdcO-8wapCV
</pre>
<br />
Extend the volume group using the new physical volume.<br />
<br />
<pre class="brush: text">
# vgextend vg_linuxtest /dev/sdd1
Volume group "vg_linuxtest" successfully extended
</pre>
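<br />
A quick sanity check (sketch): vgs prints a one-line summary per volume group, including the newly added free space.<br />
<br />
<pre class="brush: text">
# vgs vg_linuxtest
</pre>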
<br />
Logical volume status.<br />
<pre class="brush: text">
# lvdisplay /dev/vg_linuxtest/lv_root
--- Logical volume ---
LV Path /dev/vg_linuxtest/lv_root
LV Name lv_root
VG Name vg_linuxtest
LV UUID VNgeT7-4yhd-XqRi-2da1-XTqT-qTvm-oVK2pz
LV Write Access read/write
LV Creation host, time linuxtest.localdomain, 2014-10-23 10:30:21 +0200
LV Status available
# open 1
LV Size 4.97 GiB
Current LE 1272
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 252:0
</pre>
<br />
<br />
Extend the logical volume.<br />
<pre class="brush: text">
# lvextend /dev/mapper/vg_linuxtest-lv_root /dev/sdd1
Extending logical volume lv_root to 9.96 GiB
Logical volume lv_root successfully resized
</pre>
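<br />
Note that lvextend was called without an explicit size here, so it grew lv_root by the entire new physical volume (4.97 GiB + 4.99 GiB, roughly the 9.96 GiB shown above). Some lvm2 versions insist on a size argument; in that case a sketch like the following, which allocates all remaining free space in the volume group, should give the same result:<br />
<br />
<pre class="brush: text">
# lvextend -l +100%FREE /dev/mapper/vg_linuxtest-lv_root
</pre>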
<br />
Resize filesystem.<br />
<br />
<pre class="brush: text">
# resize2fs /dev/mapper/vg_linuxtest-lv_root
resize2fs 1.41.12 (17-May-2010)
Filesystem at /dev/mapper/vg_linuxtest-lv_root is mounted on /; on-line resizing required
old desc_blocks = 1, new_desc_blocks = 1
Performing an on-line resize of /dev/mapper/vg_linuxtest-lv_root to 2611200 (4k) blocks.
The filesystem on /dev/mapper/vg_linuxtest-lv_root is now 2611200 blocks long.
</pre>
<br />
<br />
Check disk free space. Notice that I now have 6.6G of free space on the root filesystem.<br />
<br />
<pre class="brush: text">
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_linuxtest-lv_root
9.9G 2.8G 6.6G 30% /
tmpfs 770M 88K 770M 1% /dev/shm
/dev/sda1 485M 55M 405M 12% /boot
</pre>
<br />
<br />
<br />
<b>WARNING!</b>
Be very careful when using the commands from this blog post on a production system. These are dangerous commands that can cause data loss and many other problems. I’ve used these commands in my test environment for educational purposes, and it is possible that I have made mistakes in this demo. After all, I am only a simple Oracle DBA, not a Linux SA :-)
<br />
<br />
<br />
REFERENCES<br />
<a href="http://www.linuxuser.co.uk/features/resize-your-disks-on-the-fly-with-lvm">http://www.linuxuser.co.uk/features/resize-your-disks-on-the-fly-with-lvm</a><br />
<a href="http://www.rootusers.com/how-to-increase-the-size-of-a-linux-lvm-by-adding-a-new-disk/">http://www.rootusers.com/how-to-increase-the-size-of-a-linux-lvm-by-adding-a-new-disk/</a><br />
<a href="https://wiki.archlinux.org/index.php/LVM">https://wiki.archlinux.org/index.php/LVM</a><br />
<br />
</span>
<br />
<br />
<b>Using Oracle Flex ASM with single instance database</b><br />
<br />
Oracle Flex ASM was introduced in the 12c release. In my opinion, this is one of the best features of the new version.<br />
<br />
I won’t go into detail about Flex ASM because you can find more information in the documentation. In this post I will concentrate on how Flex ASM handles a crash of an ASM instance.<br />
<br />
For this test I’ve created a 2-node cluster - 12c Grid Infrastructure with Flex ASM enabled.<br />
<br />
<pre class="brush: text">
$ asmcmd showclustermode
ASM cluster : Flex mode enabled
<br />
$ srvctl config asm
ASM home: /u01/app/12.1.0/grid_1
Password file: +OCRVOTE/ASM/PASSWORD/pwdasm.256.853771307
ASM listener: LISTENER
ASM instance count: ALL
Cluster ASM listener: ASMNET1LSNR_ASM
<br />
$ srvctl status asm
ASM is running on cluster1,cluster2
</pre>
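<br />
With Flex ASM the number of ASM instances in the cluster (the ASM cardinality) is configurable. As a sketch, assuming 12c srvctl syntax, something like this would change it from ALL to a fixed count of three instances:<br />
<br />
<pre class="brush: text">
$ srvctl modify asm -count 3
</pre>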
<br />
<br />
<span id="fullpost">
Install a single instance database on one of the nodes.<br />
<br />
<pre class="brush: text">
$ ./dbca -silent \
> -createDatabase \
> -templateName General_Purpose.dbc \
> -gdbName singl12 \
> -sid singl12 \
> -sysPassword oracle \
> -SystemPassword oracle \
> -emConfiguration none \
> -recoveryAreaDestination FRA \
> -storageType ASM \
> -asmSysPassword oracle \
> -diskGroupName DATA \
> -characterSet AL32UTF8 \
> -nationalCharacterSet AL16UTF16 \
> -totalMemory 768
<br />
Copying database files
1% complete
3% complete
10% complete
17% complete
24% complete
31% complete
35% complete
Creating and starting Oracle instance
37% complete
42% complete
47% complete
52% complete
53% complete
56% complete
58% complete
Registering database with Oracle Restart
64% complete
Completing Database Creation
68% complete
71% complete
75% complete
85% complete
96% complete
100% complete
Look at the log file "/u01/app/orcl12/cfgtoollogs/dbca/singl12/singl12.log" for further details.
</pre>
<br />
<br />
The single instance database is registered in the OCR.<br />
<br />
<pre class="brush: text">
$ srvctl config database -d singl12
Database unique name: singl12
Database name: singl12
Oracle home: /u01/app/orcl12/product/12.1.0/dbhome_1
Oracle user: orcl12
Spfile: +DATA/singl12/spfilesingl12.ora
Password file:
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: singl12
Database instance: singl12
Disk Groups: DATA
Mount point paths:
Services:
Type: SINGLE <<<<<-------
Database is administrator managed
</pre>
<br />
V$ASM_CLIENT shows that my database is connected to an Oracle ASM instance.<br />
<br />
<pre class="brush: sql">
SQL> select instance_name, db_name, status
2 from v$asm_client
3 where db_name='singl12';
INSTANCE_NAME DB_NAME STATUS
-------------------- -------- ------------
singl12 singl12 CONNECTED
</pre>
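<br />
The same client information is also available from asmcmd; lsct lists the clients connected through a given disk group (a sketch, run from the grid environment):<br />
<br />
<pre class="brush: text">
$ asmcmd lsct DATA
</pre>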
<br />
<br />
Check that ASM instances are running on both nodes.<br />
<br />
<pre class="brush: text">
$ ./crsctl status resource ora.asm
NAME=ora.asm
TYPE=ora.asm.type
TARGET=ONLINE , ONLINE
STATE=ONLINE on cluster2, ONLINE on cluster1
</pre>
<br />
<br />
My database is running on the cluster1 node.<br />
<br />
<pre class="brush: sql">
$ srvctl status database -d singl12
Instance singl12 is running on node cluster1
<br />
SQL> select instance_name, host_name from v$instance;
INSTANCE_NAME HOST_NAME
--------------- --------------------
singl12 cluster1.localdomain
</pre>
<br />
Now I will simulate a crash of the ASM instance on the cluster1 node where my database is running.<br />
<br />
<pre class="brush: text">
# ps -ef|grep asm_pmon|grep -v grep
oracle 3072 1 0 10:12 ? 00:00:01 asm_pmon_+ASM1
# kill -9 3072
</pre>
<br />
Without Flex ASM I would expect a crash of the ASM instance to take down the database instance as well, but with Flex ASM my database stays up and running.<br />
<br />
Check the alert log of the database instance:<br />
<pre class="brush: text">
...
NOTE: ASMB registering with ASM instance as client 0x10005 (reg:2156157897)
NOTE: ASMB connected to ASM instance +ASM2 (Flex mode; client id 0x10005)
NOTE: ASMB rebuilding ASM server state
NOTE: ASMB rebuilt 1 (of 1) groups
NOTE: ASMB rebuilt 13 (of 13) allocated files
NOTE: fetching new locked extents from server
NOTE: 0 locks established; 0 pending writes sent to server
SUCCESS: ASMB reconnected & completed ASM server state
</pre>
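<br />
To watch the reconnection happen live, one can tail the database alert log while killing the ASM instance. A sketch, with the path assumed from the default ADR layout:<br />
<br />
<pre class="brush: text">
$ tail -f $ORACLE_BASE/diag/rdbms/singl12/singl12/trace/alert_singl12.log
</pre>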
<br />
Note the line - "NOTE: ASMB connected to ASM instance +ASM2 (Flex mode; client id 0x10005)"<br />
<br />
As the +ASM1 instance crashed, ASMB connected to the +ASM2 instance.<br />
<br />
<br />
Check status:<br />
<pre class="brush: sql">
# ./crsctl status resource ora.asm
NAME=ora.asm
TYPE=ora.asm.type
TARGET=ONLINE , ONLINE
STATE=ONLINE on cluster2, INTERMEDIATE on cluster1
<br />
SQL> select instance_name, host_name from v$instance;
INSTANCE_NAME HOST_NAME
--------------- --------------------
singl12 cluster1.localdomain
</pre>
<br />
Oracle Clusterware restarted the crashed ASM instance and both instances were back up within a minute.<br />
<br />
<pre class="brush: text">
# ./crsctl status resource ora.asm
NAME=ora.asm
TYPE=ora.asm.type
TARGET=ONLINE , ONLINE
STATE=ONLINE on cluster2, ONLINE on cluster1
</pre>
<br />
Now to test a crash of the ASM instance on the second node.<br />
<br />
<pre class="brush: sql">
SQL> select instance_name from v$instance;
INSTANCE_NAME
----------------
+ASM2
SQL> shutdown abort;
ASM instance shutdown
</pre>
<br />
Excerpt from the alert log:<br />
<br />
<pre class="brush: text">
...
Fri Jul 25 12:44:33 2014
NOTE: ASMB registering with ASM instance as client 0x10005 (reg:4169355750)
NOTE: ASMB connected to ASM instance +ASM1 (Flex mode; client id 0x10005)
NOTE: ASMB rebuilding ASM server state
NOTE: ASMB rebuilt 1 (of 1) groups
NOTE: ASMB rebuilt 13 (of 13) allocated files
NOTE: fetching new locked extents from server
NOTE: 0 locks established; 0 pending writes sent to server
SUCCESS: ASMB reconnected & completed ASM server state
</pre>
<br />
<br />
Again, a user connected to the database instance didn’t even notice that something was happening with ASM.<br />
<br />
Flex ASM enables ASM instances to run on separate nodes from the database servers. If an ASM instance fails, the database fails over to another available ASM instance.<br />
<br />
If you are running pre-12c databases on your cluster you can still configure Flex ASM, but you are required to run local ASM instances on those nodes. ASM instance failover won’t work for 10g or 11g databases.<br />
<br />
Good reason to move towards 12c? ;-)<br />
<br />
<br />
<br />
</span>