Tuesday, January 10, 2012

Troubleshooting memory usage

Over the last few days we had significant load problems with one of our database servers. The server was choking under minor database load, which made the applications unresponsive.

This is an 8-core 32-bit OEL5 (2.6.18-92.el5PAE) server with 4 GB RAM running two Oracle 10g (10.2.0.4) SE databases. The SGAs of those databases were 1.2 GB and 700 MB, with pga_aggregate_target set to 64M and 128M.

As the databases were small (under 50 GB) and not very active, I didn't expect any performance problems.


Using vmstat, mpstat, top and sar I noticed that we were having slight issues with swapping, even though there was enough physical memory.
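
For the record, these are roughly the checks I was running (a sketch of the standard EL5 commands, not the exact output from this server):

vmstat 5                 # watch the si/so columns - pages swapped in/out per second
sar -W 1 10              # pswpin/s and pswpout/s - swapping activity over time
mpstat -P ALL 5          # per-CPU utilization, watch for high %sys and %iowait
free -m                  # overall memory and swap usage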

So I've started troubleshooting...

First I found that one of the top running processes was "kswapd0".
This is the kernel swap daemon, responsible for scanning memory to locate free pages and scheduling dirty pages to be written to disk. It is OK to see kswapd running occasionally, but it shouldn't continuously appear among the top processes. In some situations you can resolve kswapd issues using huge pages, but that wasn't the case here.
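
For what it's worth, a quick way to see whether huge pages are configured at all, and how busy kswapd is, is something like this (on a 32-bit PAE kernel the huge page size is 2 MB):

grep -i huge /proc/meminfo     # HugePages_Total: 0 means no huge pages are reserved
ps aux | grep kswapd           # kswapd0 should mostly sit idle, not burn CPU in top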

I decided to check several OS parameters related to swapping to find the possible cause of our issues.

The parameters below were specified in "/etc/sysctl.conf":
...
vm.overcommit_memory=1
vm.swappiness=0
vm.min_free_kbytes=262144
vm.page-cluster=5
...
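
The values in sysctl.conf and the values the kernel is actually running with can differ (until "sysctl -p" or a reboot), so it's worth checking the live settings too, for example:

sysctl vm.swappiness vm.overcommit_memory vm.min_free_kbytes vm.page-cluster
cat /proc/sys/vm/swappiness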

Since I didn't install this server, these parameters looked a little odd to me. All of them were custom parameters not required by the install guide.

The swappiness parameter bothered me the most. It tells the Linux kernel how aggressively it should move inactive pages to the swap area. The default value is 60, but in this case it was set to 0, which tells the kernel to avoid swapping as much as possible.
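
Swappiness can also be changed on the fly, without touching sysctl.conf, which is handy for testing; for example:

sysctl -w vm.swappiness=60
echo 60 > /proc/sys/vm/swappiness     # equivalent, writes the value directly to /proc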

As this was a production server which was still choking, we had to react quickly without prior testing. I couldn't see the purpose of the custom vm.* parameters on this server, so I decided to comment them out and apply only the default parameters required by the install guide. It's easy to add custom parameters later if there is a need.
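
For illustration, the vm.* section of "/etc/sysctl.conf" ended up looking something like this (and since commenting a line out doesn't reset the running kernel value by itself, the defaults, e.g. vm.swappiness=60 and vm.overcommit_memory=0, also have to be put back with "sysctl -w" or a reboot):

...
# vm.overcommit_memory=1
# vm.swappiness=0
# vm.min_free_kbytes=262144
# vm.page-cluster=5
...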

After I applied the changes with "sysctl -p", the load average suddenly started to fall. Several minutes later it dropped from 13 to 1. In vmstat I noticed a significant drop in swapping, and kswapd0 wasn't among the top processes anymore.

By setting swappiness back to 60 (the default) I told the kernel to find inactive pages more aggressively and swap them out to disk.
That probably helped the OS use its memory more effectively.

I will closely monitor the performance of this server over the next few days, but so far it is working very well.

Considerations for the future:
- enable Huge Pages (a rough sketch is below)
- migrate the OS and databases to 64-bit
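
Just to give an idea of what enabling huge pages could look like on this box (the numbers are illustrative and not tested here): with a 2 MB huge page size and roughly 2 GB of combined SGA, something like

vm.nr_hugepages = 1000

in "/etc/sysctl.conf" would cover both SGAs, together with a matching memlock limit (in KB) for the oracle user in "/etc/security/limits.conf":

oracle soft memlock 2097152
oracle hard memlock 2097152

After restarting the instances, comparing HugePages_Total and HugePages_Free in /proc/meminfo shows whether the SGAs really ended up in huge pages.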

When I'm installing database servers I tend to use the parameters specified in the install guide or on the Oracle Support site. Eventually I may add some custom parameters, but only if I have a proper reason for that. Even then it is wise to test the changes before applying them in production.

So my conclusion for this case would be: do not add custom parameters without a proper reason.
Google "Compulsive Tuning Disorder" :)


5 comments:

  1. Two DBs on the same server with 4 GB of RAM and you call it PRODUCTION?

    Come on, today even basic laptops have that ...

    :-)

  2. This server is powerful enough to host production databases for many small businesses (at least in Croatia) :)

    We're planning a migration for these databases, and then this server will become my dev/testing machine.
    Add some RAM and you have a decent server for various purposes.

    p.s.
    I would be very happy to have a laptop as strong as this machine :D

  3. When I mentioned a laptop I was thinking of the amount of RAM ... which was the cause of the problem in this topic ...

    And 4 GB is on the low end for today's laptops.
    :-)

  4. You are correct, 4 GB is not much, but it is more than enough for our DB server in this case.

    Maybe we could have solved our issues by adding more RAM, but in this post I tried to point out that removing the custom vm.* parameters solved our performance problems.

    Now the server is pretty underused :)

  5. Damir: you would be surprised if I told you how many small production databases we have - many of them using 4 GB RAM :)
    If you have a small database that doesn't change frequently, then 4 GB RAM is really enough - more than that would just be a waste of unused resources.
    It's OK when you have a couple of databases, but if you have to manage 200+ databases like we do, then you have to take care of every single resource, because the IT budget is not unlimited, especially if you are using expensive storage boxes or IBM boxes. :)
