Performance Tuning Postgresql 9.5

PostgreSQL is a solid datbase and functions very well under heavy load, it's critical to understand the load on your paticular system. There are many possible settings that can be tweaked to improve the peformance of PostgreSQL, and the sytem in general. A subset of those settings are mentioned here, along with pgbench and how it can be used to assist with tuning. It should be noted that there is no "magic bullet" here, performance is a concept and the system may require multiple tunings as it grows.

Performance Hardware

Database processes do complicated operations on lots of data. Complicated operations (like comparing, grouping, hashing, joining, sorting, etc) require lots of processor time. Give your database lots of processor juice, dual processor minimum, high end of the price/performance ration (most GHz for least USD). Lots of data requires lots of storage, lots of operations means lots of reading and writing. This means 36 GB (minimum) SCSI on 180 or 320 bus; use 15K RPM if possible. Of course storing lots of data means, operating on lots of data. Operations take place in memory so there had better be enough, swapping will degrade performance. At the time of this writing a solid database machine for moderate load was a dual 3.0 GHz, 4GB RAM and hardware mirrored 73GB/SCSI320/15K.

Databases have a tendency to store mission critical data, so one needs to take precautions when storing. The first line of defence (after quality hardware vendors) is the storage array or RAID. The reader is expected to know what RAID0, RAID1 and RAID5 are. RAID0 is simply not suitable, do not use this for a database filesystem. RAID1 is is favoured by many over RAID5 for the datbase system, the arguments go something like: parallel writes vs read+parity-calc+write. The options that many consider the best is RAID10 which has the reliabilty of RAID1 with performance like RAID0. RAID10 highly reccomended even though it is the most expensive option.

Performance Software

The choice of performance hardware will be wasted if the software cannot or will not use it. To this end the operating system should have some performance tuning done to it if possible, an entire discussion itself. PostgreSQL is highly configurable and has many options to improve its performance, obviously consuming resources to do so. Of course hardware, OS and RDBMS tuning will be useless if the database schema and design are not also performance oriented.

Tune the Operating System

All this relates to Linux, not specific to any distribution.

Shared Memory

PostgreSQL uses lots of this, view ipcs to prove it, the more shared memory the better as more data (tables) can be loaded. On a dedicated datbase server it's not uncommon to give half the memory to the database. The shared memory maximum value can be set in /proc/sys/kernel/shmmax. Say echo $((`cat /proc/sys/kernel/shmmax` / 1048576)) to see how many megabytes are permitted for shared memory. Set this to half of the physical memory (or some reasonable value) echo $((1024 * 1024 * 1024)) > /proc/sys/kernel/shmmax, thats 1G. PostgreSQL will also have to be told to use this, see below.

File Atime

Many of the file system choices have a file access time or atime attribute on the files. A slight performance increase can be found by turning this feature off for the database file system. For ext3 the entry in /etc/fstab should look something like:

/dev/sda4    /var    ext3    noatime    0    0

Tune PostgreSQL Performance

The values and suggestions expressed here are just that, it's basic rules that Edoceo uses, YMMV. As always one should experiment with what values work best for your environment.

max_connections = N

Set this to 140% of the average number of expected connections. 100 clients average means 140 max connections.

shared_buffers = N

Set anywhere from 1/4 to 1/2 physical memory, must set kernel shared memory max first. Will see noticeable difference. These are calculated as 8K blocks so 256M would be 32768 = 1024*1024*256/8192.

fsync = true|false

Setting this to false will speed up the file system but crashes or unexpeced stop will require a restore from backup, keep as fsync=true

work_mem = N

Default is 1M but 2M doesn't hurt, this can also be set per connection via SET command which is reccomended.

effective_cache_size = N

The assumption about cached files by the operating system, look in /proc/sys/fs/file-nr and file-max for clues.

log_statement = 'none'

On production system, where it can be avoided, turn off statement logging and minimize the other logging parameters.

syslog_facility = 'LOCAL2'
syslog_ident = 'postgres'
log_connections = false
log_disconnections = false
log_duration = false
log_statement = 'none'

autovacuum = off

Autovacuum can be disabled in postgresql.conf to slightly reduce the load there, vacuum will need to be manually run!! See 8.3 Auto-Vacuum.

Performance Schema and Design

Performance is also tied to the schema and design of the system and to the data that is stored. Clearly larger data sets will take longer, even with high-performance hardware and software. Keep in mind that varchar(50) primary keys with unique indexes will not perform as well as char(10). Both will be slower than integer based primary keys. Don't index columns which are never searched on, carefully choose the columns that are indexed.

ChangeLog