วันจันทร์ที่ 13 ธันวาคม พ.ศ. 2553

Postgresql เมื่อ index bloat (บวม)

เมื่อเราใช้ Postgres ไปนาน ๆ แล้ว สิ่งที่ติดตามมาคือ
  1. จำนวน record ที่เพิ่มมากขึ้น
  2. ขนาดของ index ที่มากขึ้น
  3. ขนาดของ dead rows ที่มากขึ้น
  4. ค่า free space map ดูว่าจะไม่พอแล้ว ในเวอร์ชั่นที่ต่ำกว่า 8.4
  5. มันสายไปแล้วถ้าคุณลืมเปิด autovacumm

หลาย ๆ คน อาจต้องยอมทำ VACUUM FULL แต่ปัญหาก็ตามมาคือ ระบบผมหยุดไม่ได้!


ส่ิงที่ต้องรีบดำเนินการโดยด่วนคือ
  1. เพิ่ม free space map
  2. เปิด auto vacuum
  3. จากเดิมที่ใช้ DELETE TABLE tablename; ก็เปลี่ยนเป็น TRUNCATE TABLE tablename;

อ้างอิงจากอันนี้ http://gkoenig.wordpress.com/2009/11/24/postgres-and-bloated-table/

postgres and bloated table
November 24, 2009 in open source, PostgreSQL
From time to time there are news/messages about bloated tables in postgres and a thereby decreased performance of the database.
The postgres-wiki contains a view (extracted from a script of the bucardo project) to check for bloat in your database here
For a quick reference you can check your table/index sizes regularly and check the no. of tuples to assume where bloat comes in.
e.g. check size of your tables and indexes:

SELECT relname AS table_name,
pg_size_pretty(pg_relation_size(oid)) AS table_size,
pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relkind in ('r','i')
ORDER BY pg_relation_size(oid) DESC;

check no. of tuples

SELECT relname, relkind, reltuples, relpages
FROM pg_class
ORDER BY relpages DESC;

If you’re facing such a situation you have to check your (auto)vacuum settings and if it’s working correctly.
Starting with Postgres 8.3.x the autovacuum daemon works reliable for most cases and should be enabled. On some workloads a manual vacuum is needed anyhow. In combination with setting the (auto)vacuum parameters you have to consider the Free Space Map (FSM) parameter.
Open your postgresql.conf and check it out (every parameter has a very good comment there):
* enable autovacuum and log every run > 1 sec.

autovacuum = on
log_autovacuum_min_duration = 1000

fine tune when autovacuum will check the tables. The number is the fraction of the table size which needs modified data to run an autovacuum.

Apart from the bloated-table/index stuff, for the query planner it is very important to have up-to-date statistics. If you need an analyze more often, shrink the number to e.g. autovacuum_analyze_scale_factor = 0.02 and the tables will be autoanalyzed more often

autovacuum_vacuum_scale_factor = 0.2
autovacuum_analyze_scale_factor = 0.1

prevent autovacuum from running too often

autovacuum_naptime = 1min #time between two runs

And don’t forget to check your postgresql.log file for any hints the database system provides.
Regarding bloated tables I detected something like:

WARNING: relation "public.table_one" contains more than "max_fsm_pages" pages with useful free space
HINT: Consider using VACUUM FULL on this relation or increasing the configuration parameter "max_fsm_pages"

A “normal” vacuum wouldn’t help in this case because it doesn’t release the pages to the os, they will only be marked as free in the FSM (Free Space Map).
If your FSM isn’t big enough your tables will be bloated even with running autovacuum and perhaps manual vacuum, because the pages with no longer needed data cannot be marked as free and therefore never be overwritten.
One solution would be to run a

vacuum full verbose analyze table_name

This “defragment”-command marks unused pages as “can be overwritten” and moves pages from the bottom into this marked pages. => The table size will shrink
! But be aware: vacuum full locks the table exclusive. Consider running this command in maintenance windows or during night.

You should also keep in mind to “defragment” bloated indexes with

REINDEX INDEX indexname # recreate index explicitly
REINDEX TABLE tablename # recreate all indexes for this table