PostgreSQL is a solid database and functions very well under heavy load, but it's critical to understand the load on your particular system. There are many settings that can be tweaked to improve the performance of PostgreSQL and of the system in general. A subset of those settings is mentioned here, along with pgbench and how it can be used to assist with tuning. There is no "magic bullet" here: performance is a concept rather than a single setting, and the system may require multiple rounds of tuning as it grows.
Database processes do complicated operations on lots of data. Complicated operations (like comparing, grouping, hashing, joining, sorting, etc.) require lots of processor time. Give your database lots of processor juice: dual processor minimum, at the high end of the price/performance ratio (most GHz for least USD). Lots of data requires lots of storage, and lots of operations means lots of reading and writing. This means 36 GB (minimum) SCSI on a 160 or 320 bus; use 15K RPM if possible. Of course, storing lots of data means operating on lots of data. Operations take place in memory, so there had better be enough; swapping will degrade performance. At the time of this writing a solid database machine for moderate load was a dual 3.0 GHz, 4GB RAM and hardware mirrored 73GB/SCSI320/15K.
Databases have a tendency to store mission critical data, so one needs to take precautions when storing it. The first line of defence (after quality hardware vendors) is the storage array or RAID. The reader is expected to know what RAID0, RAID1 and RAID5 are. RAID0 has no redundancy and is simply not suitable; do not use it for a database filesystem. RAID1 is favoured by many over RAID5 for the database system; the argument goes something like: parallel writes vs read+parity-calc+write. The option that many consider the best is RAID10, which has the reliability of RAID1 with performance like RAID0. RAID10 is highly recommended even though it is the most expensive option.
The choice of performance hardware will be wasted if the software cannot or will not use it. To this end the operating system should have some performance tuning done to it if possible, which is an entire discussion in itself. PostgreSQL is highly configurable and has many options to improve its performance, though it consumes resources to do so. Of course hardware, OS and RDBMS tuning will be useless if the database schema and design are not also performance oriented.
Tune the Operating System
All of this relates to Linux and is not specific to any distribution.
- Shared Memory
PostgreSQL uses lots of this; run ipcs to prove it. The more shared memory the better, as more data (tables) can be loaded. On a dedicated database server it's not uncommon to give half the memory to the database. The shared memory maximum is controlled through /proc/sys/kernel/shmmax. Run
echo $((`cat /proc/sys/kernel/shmmax` / 1048576))
to see how many megabytes are permitted for shared memory. Set this to half of the physical memory (or some reasonable value), for example
echo $((1024 * 1024 * 1024)) > /proc/sys/kernel/shmmax
which is 1G. PostgreSQL will also have to be told to use this, see below.
- File Atime
Many file systems record a file access time, or atime, attribute on each file.
A slight performance increase can be found by turning this feature off for the database file system.
For ext3 the entry in /etc/fstab should look something like:
/dev/sda4 /var ext3 noatime 0 0
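The shared memory sizing described above can be sketched as a small script. This is only an illustration of the "half of physical memory" rule: suggest_shmmax is a hypothetical helper name, and the script assumes a Linux /proc/meminfo that reports MemTotal in kB. It prints a suggestion and changes nothing.

```shell
#!/bin/sh
# Sketch: suggest a shmmax value equal to half of physical memory.
# suggest_shmmax is a hypothetical helper, not a system command.

# Takes total memory in kB (the unit used by /proc/meminfo) and prints
# half of it in bytes: kB * 1024 / 2 = kB * 512.
suggest_shmmax() {
    echo $(( $1 * 512 ))
}

# Read MemTotal from /proc/meminfo when it is available (Linux only).
if [ -r /proc/meminfo ]; then
    mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
    if [ -n "$mem_kb" ]; then
        shmmax=$(suggest_shmmax "$mem_kb")
        echo "physical memory:  $(( mem_kb / 1024 )) MB"
        echo "suggested shmmax: $shmmax bytes ($(( shmmax / 1048576 )) MB)"
        # To apply it, as root:
        #   echo "$shmmax" > /proc/sys/kernel/shmmax
    fi
fi
```

A value written to /proc/sys/kernel/shmmax is lost on reboot; on most distributions a kernel.shmmax line in /etc/sysctl.conf keeps it.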
Tune PostgreSQL Performance
The values and suggestions expressed here are just that: basic rules that Edoceo uses, YMMV. As always, experiment to find the values that work best for your environment.
- max_connections = N
- Set this to 140% of the average number of expected connections. 100 clients average means 140 max connections.
- shared_buffers = N
- Set anywhere from 1/4 to 1/2 of physical memory; the kernel shared memory maximum must be raised first. Expect a noticeable difference. The value is counted in 8K blocks, so 256M would be 32768 = 1024*1024*256/8192.
- fsync = true|false
- Setting this to false speeds up writes to the file system, but a crash or unexpected stop may then require a restore from backup; keep fsync=true.
- work_mem = N
- Default is 1M but 2M doesn't hurt; this can also be set per connection via the SET command, which is recommended.
- effective_cache_size = N
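- The planner's assumption about how much data the operating system keeps in its file cache. Look in /proc/sys/fs/file-nr and file-max for clues about open files, and at the Cached line of /proc/meminfo for the size of the cache itself.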
- log_statement = 'none'
- On a production system, wherever possible, turn off statement logging and minimize the other logging parameters.
syslog_facility = 'LOCAL2'
syslog_ident = 'postgres'
log_connections = false
log_disconnections = false
log_duration = false
log_statement = 'none'
- autovacuum = off
- Autovacuum can be disabled in postgresql.conf to slightly reduce the load, but VACUUM will then need to be run manually! See 8.3 Auto-Vacuum.
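The arithmetic behind max_connections and shared_buffers can be sketched in a few lines of shell. Both function names are hypothetical helpers, and rounding the 140% rule up to a whole connection is an assumption of this sketch, not something stated above.

```shell
#!/bin/sh
# Sketch of the arithmetic behind two of the settings above.
# Both function names are hypothetical helpers.

# shared_buffers is counted in 8K (8192-byte) blocks:
# megabytes * 1024 * 1024 / 8192
shared_buffers_blocks() {
    echo $(( $1 * 1024 * 1024 / 8192 ))
}

# max_connections as 140% of the expected average number of
# connections, rounded up to a whole connection.
max_connections_for() {
    echo $(( ($1 * 140 + 99) / 100 ))
}

echo "shared_buffers = $(shared_buffers_blocks 256)"
echo "max_connections = $(max_connections_for 100)"
```

For 256M and an average of 100 clients this prints shared_buffers = 32768 and max_connections = 140, matching the figures in the list.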
Performance Schema and Design
Performance is also tied to the schema and design of the system and to the data that is stored. Clearly larger data sets will take longer, even with high-performance hardware and software. Keep in mind that varchar(50) primary keys with unique indexes will not perform as well as char(10), and both will be slower than integer based primary keys. Don't index columns that are never searched on; choose the indexed columns carefully.
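As a rough illustration of these design rules, the script below writes a small schema to a temporary file. The table, column and index names are invented for this example: it uses an integer primary key, keeps a short char(10) code as the only searched column, and indexes only that column.

```shell
#!/bin/sh
# Sketch: write a hypothetical schema illustrating the key choices above.
# The table, column and index names are invented for illustration.

schema_file=$(mktemp)

cat > "$schema_file" <<'SQL'
-- Integer based primary key instead of a varchar or char key.
CREATE TABLE customer (
    id    serial PRIMARY KEY,
    code  char(10) NOT NULL,  -- short code, searched often
    name  varchar(50),        -- displayed but never searched: no index
    notes text
);

-- Index only the column that queries actually search on.
CREATE INDEX customer_code_idx ON customer (code);
SQL

echo "schema written to $schema_file"
# Load it with: psql -f "$schema_file" yourdb   (yourdb is a placeholder)
```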
- PostgreSQL Benchmarking
- PostgreSQL Load Analysis
- Tuning Your PostgreSQL Server
- Descriptions of RAID - very good
- PostgreSQL Wiki on Performance
- On RevSys
- What LinuxJournal says about PostgreSQL
- 14 Jan 2006 - Created /djb