Case Study · PostgreSQL Performance Tuning

PostgreSQL 300GB: Query Timeouts Fixed

How we brought a 340GB PostgreSQL database back from chronic query timeouts — reducing the primary reporting query from 4 minutes to 17 seconds, recovering 48GB of disk space, and cutting I/O by 60%.

14x
Faster Queries
48GB
Bloat Reclaimed
60%
Less Disk I/O
PostgreSQL 300GB Performance Tuning — Before and After Query Plans

Background

The Client

An operations platform that had grown steadily over five years to a 340GB PostgreSQL database, holding over 200 million rows across its primary events and activity tables. The system was running on a 32GB RAM server — a setup that had worked fine when the database was smaller, but had never been tuned as data volumes grew.

By the time DharmOps was engaged, the team was dealing with reporting queries that ran for 3–4 minutes and frequently timed out entirely. The database was consuming far more disk I/O than expected given the workload, and the server's storage was filling up despite no significant increase in new data. Indexes existed — but the problem kept getting worse.

The Problem

Four Issues Working Against Each Other

A DharmOps audit revealed the database was degraded by four compounding issues — none of them obvious from application logs, all of them fixable without schema changes or downtime:

  • shared_buffers left at PostgreSQL's 128MB default. PostgreSQL ships with shared_buffers set to 128MB — a conservative default designed for minimal hardware. On this 32GB RAM server, that meant less than 0.4% of available memory was allocated to the database buffer cache. The database was fetching pages from disk on almost every query instead of serving them from memory.
  • Autovacuum misconfigured on high-write tables. PostgreSQL's default autovacuum_vacuum_scale_factor of 0.2 means autovacuum doesn't trigger until 20% of a table's rows are dead. On the primary events table with 180 million rows, that threshold required 36 million dead tuples to accumulate before any cleanup ran. The result: 48GB of unreachable dead data inflating every sequential scan and degrading index efficiency.
  • Sequential scans across the full 180M-row events table. The primary reporting query — filtering by a date range and a status column — was performing full sequential scans on the events table on every execution. The required composite index existed on (status) alone, but the query's date-range filter meant the planner ignored it and scanned the entire table instead.
  • work_mem at 4MB default causing disk-based sort operations. With work_mem set to PostgreSQL's 4MB default, every sort or hash operation in the reporting queries exceeded available memory and spilled to disk — visible in EXPLAIN ANALYZE output as 'Sort Method: external merge Disk: 2048kB'. Each report was triggering multiple disk sorts in addition to the underlying seq scan.

The Solution

What We Fixed

Every change was applied live. No maintenance window, no application downtime, no schema migrations. DharmOps completed the full engagement over four working days.

  • Tuned shared_buffers from 128MB to 8GB, effective_cache_size to 24GB. Raised shared_buffers to 25% of available RAM — the standard starting point for a dedicated PostgreSQL server. Set effective_cache_size to reflect the OS page cache so the query planner makes accurate cost estimates. The database now serves frequently-accessed pages from memory instead of disk on the overwhelming majority of queries.
  • Set work_mem to 128MB for analytical query sessions. Configured work_mem to 128MB at the session level for reporting workloads, eliminating disk spill on sort and hash operations. The EXPLAIN ANALYZE 'external merge Disk' lines disappeared entirely from the reporting queries. For transactional workloads the default 4MB is preserved to avoid memory pressure under concurrency.
  • Reconfigured autovacuum per-table and ran manual VACUUM ANALYZE. Set autovacuum_vacuum_scale_factor to 0.01 on the three largest tables — triggering cleanup after 1% of rows turn dead rather than 20%. Ran a manual VACUUM ANALYZE immediately to reclaim 48GB of dead tuple bloat that had accumulated over months. Index scans on the affected tables now traverse live data only.
  • Created a covering partial index on (created_at, status) for the primary report query. Built a composite index tailored to the reporting access pattern — covering both the date-range filter and the status filter in the correct column order. The index was created CONCURRENTLY with zero table locking. The primary reporting query dropped from a full seq scan of 180M rows to an index scan returning the relevant 50,000-row subset.
  • Partitioned the events table by month using declarative partitioning. Migrated the events table to monthly range partitions, enabling partition pruning for any query that filters by date range. Queries that previously scanned five years of history now scan only the months they need. The reorganisation was completed with a dual-write cutover pattern — old table and new partitioned structure ran in parallel until the switch was confirmed stable.

The Results

Measurable Outcomes

14x
Faster Primary Report
4 min → 17s
Query Execution Time
48GB
Disk Space Recovered

Disk I/O during peak reporting hours dropped by 60% — the server stopped saturating its storage bandwidth, which had been causing latency spikes across unrelated queries. Storage consumption stabilised as autovacuum now keeps up with write volume instead of falling perpetually behind. The team can run ad hoc reports at any time without worrying about timeouts.

None of these changes required application code modifications, schema changes, or a maintenance window. Every fix was applied while the system was live and under its normal production load.

"We were running performance issues on a PostgreSQL database with over 300GB of data. DharmOps analysed the workload, found the bottlenecks, and handled the tuning at a scale that required genuine expertise. The scope of what they delivered at that data volume exceeded what we expected going in."

— Jalal, PostgreSQL Performance Tuning · 300GB+ · Upwork ★ 5.0

Technologies & Tools

PostgreSQLshared_bufferswork_memautovacuumVACUUM ANALYZEEXPLAIN ANALYZEpg_stat_statementsDeclarative PartitioningBRIN IndexesCovering IndexesCONCURRENT INDEX BUILDpg_stat_user_tables

Common Questions

PostgreSQL Performance at Scale

Why is my PostgreSQL database slow with 100GB, 200GB, or 300GB of data?

The most common causes at scale are: shared_buffers left at the 128MB default (meaning PostgreSQL reads almost everything from disk instead of memory); table bloat from misconfigured autovacuum — the default autovacuum_vacuum_scale_factor of 0.2 means 20% of a table's rows must be dead before cleanup runs, so a 180M-row table accumulates 36M dead tuples before any cleanup occurs; sequential scans because the index doesn't cover the query's full filter pattern; and work_mem at 4MB, causing every sort and hash operation to spill to disk.

What should shared_buffers be set to for a large PostgreSQL database?

25–40% of total RAM. PostgreSQL's default of 128MB is severely undersized for production workloads. A 32GB RAM server should use 8GB–12GB. Also set effective_cache_size to 50–75% of RAM so the query planner accurately estimates the cost of index scans versus sequential scans.

How do I fix autovacuum for a large PostgreSQL table?

Set autovacuum_vacuum_scale_factor to 0.01 at the table level for large tables. The default of 0.2 waits until 20% of rows are dead — on a 200M-row table that's 40M dead tuples. Use ALTER TABLE t SET (autovacuum_vacuum_scale_factor = 0.01, autovacuum_analyze_scale_factor = 0.005). Then run VACUUM ANALYZE manually to reclaim existing bloat immediately.

How do I stop PostgreSQL reports from timing out on a large database?

Four changes address the most common causes: raise shared_buffers to 25% of RAM; increase work_mem for reporting sessions (SET work_mem = '128MB') to eliminate disk sorts; create a partial composite index covering the full WHERE clause of your primary reporting query; and fix autovacuum so bloat stops inflating sequential scan costs. Table partitioning by time range provides a further 50–60% improvement for queries that only need recent data.

Can PostgreSQL handle a 300GB database, or do I need to migrate to a different database?

PostgreSQL handles 300GB and multi-terabyte databases well when tuned correctly. Most problems at this scale are configuration issues — not architectural limits. Migration to a different database is rarely the right answer before shared_buffers, autovacuum, and indexing have been addressed. Table partitioning is the right structural step for very large tables and can be done live without downtime.

Large PostgreSQL Database Slowing Down?

Tell us about your database — size, symptoms, what you've already tried. We'll identify the root cause and send you a clear action plan, usually within 48 hours.