Database backups are the last line of defence between a catastrophic data loss event and business continuity. Yet most organisations discover the weaknesses in their backup strategy during a recovery attempt — not before it. A backup that has never been tested is not a backup; it is a theoretical safety net. This guide covers everything a DBA, DevOps engineer, or IT manager needs to build a robust, tested backup and recovery strategy from the ground up. We walk through the three core backup types and when to use each, how to define RTO and RPO targets that match your business requirements, how to configure Point-in-Time Recovery in PostgreSQL, the three leading PostgreSQL backup tools compared side by side, backup automation strategies, the critical discipline of testing recovery procedures, cloud backup architectures, disaster recovery planning, and compliance retention requirements. Every section includes specific configuration examples you can apply to your own environment today.
Backup Fundamentals: Full, Incremental, and Differential
Every database backup strategy is built on three primitive backup types, and understanding each type's trade-offs is the foundation of a sound recovery plan. A full backup is a complete copy of the entire database at a specific point in time. It is the simplest to restore from — recovery requires only the single backup file — but it is the most expensive in storage space and the most time-consuming to produce. For a 2TB database, a full backup takes hours and generates 2TB of backup data. An incremental backup captures only the data that changed since the previous backup, whether that previous backup was a full or another incremental. Incremental backups are fast and storage-efficient but create a recovery dependency chain: restoring requires the last full backup plus every incremental since then, applied in sequence. A differential backup captures all changes since the last full backup — it is larger than an incremental but simpler to restore, since you only ever need the most recent full plus the most recent differential. Most production backup strategies combine weekly full backups with daily incremental or differential backups, supplemented by continuous WAL archiving for point-in-time recovery capability between scheduled backup events.
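The restore dependency chain described above can be illustrated with GNU tar's incremental mode. This is a throwaway sketch, not part of any real backup setup: the /tmp/demo paths and file names are illustrative, and GNU tar (with --listed-incremental support) is assumed.

```shell
# Illustration of the full + incremental dependency chain using GNU tar.
# All paths are disposable demo locations, not a production layout.
rm -rf /tmp/demo && mkdir -p /tmp/demo/data /tmp/demo/backup /tmp/demo/restore

echo "monday"  > /tmp/demo/data/orders.csv
# First run with a fresh snapshot file produces a FULL backup
tar -cf /tmp/demo/backup/full.tar -g /tmp/demo/backup/snapshot -C /tmp/demo data

echo "tuesday" > /tmp/demo/data/invoices.csv
# Second run against the same snapshot captures only what changed: an INCREMENTAL
tar -cf /tmp/demo/backup/inc1.tar -g /tmp/demo/backup/snapshot -C /tmp/demo data

# Restore requires the full backup AND every incremental, applied in order
tar -xf /tmp/demo/backup/full.tar -g /dev/null -C /tmp/demo/restore
tar -xf /tmp/demo/backup/inc1.tar -g /dev/null -C /tmp/demo/restore
ls /tmp/demo/restore/data   # both orders.csv and invoices.csv are present
```

Restoring the full archive alone would be missing invoices.csv, which is exactly the dependency-chain risk incremental strategies carry: lose one link and every later increment is unusable.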
RTO and RPO: Define Your Recovery Targets First
Before choosing backup tools or schedules, you must define your recovery targets — because those targets determine every technical decision that follows. Recovery Point Objective (RPO) defines the maximum acceptable data loss measured in time. An RPO of one hour means you can afford to lose at most one hour of transactions in a disaster scenario; your backup frequency and WAL archiving must ensure a one-hour-old recovery point is always available. Recovery Time Objective (RTO) defines the maximum acceptable time to restore service after a failure. An RTO of four hours means your full restoration process — data transfer, database startup, validation, and application failover — must complete within four hours. These two numbers are set by business stakeholders, not engineering teams, because they represent business impact tolerance: the cost of data loss or downtime per hour, weighed against the cost of the infrastructure needed to prevent it. Document your agreed RPO and RTO targets explicitly, then verify through scheduled timed recovery tests that your backup strategy actually achieves them — not through theoretical estimates made at planning time.
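Timed recovery tests can be scripted so the RTO comparison is mechanical rather than estimated. In this sketch, the four-hour RTO value and the sleep placeholder are illustrative assumptions; the placeholder stands in for your real restore pipeline:

```shell
# Sketch: timed recovery test against an agreed RTO target.
# RTO_SECONDS is an assumed four-hour target; the sleep is a placeholder
# for the actual restore procedure (data transfer, startup, validation).
RTO_SECONDS=$((4 * 3600))
START=$(date +%s)

sleep 1   # placeholder: run your real restore procedure here

ELAPSED=$(( $(date +%s) - START ))
if [ "$ELAPSED" -le "$RTO_SECONDS" ]; then
  echo "PASS: recovery took ${ELAPSED}s, within RTO of ${RTO_SECONDS}s"
else
  echo "FAIL: recovery took ${ELAPSED}s, exceeds RTO of ${RTO_SECONDS}s"
fi
```

Logging the elapsed figure from each scheduled test gives you a trend line: if restore time creeps toward the RTO as the database grows, you find out before the number is exceeded during a real incident.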
Point-in-Time Recovery (PITR) Explained
Point-in-Time Recovery is the most powerful recovery capability a production database can have. PITR allows you to restore a database to any specific moment — not just to the last backup timestamp, but to any second between your oldest retained backup and the most recently archived WAL segment. This is essential for recovering from logical data corruption events: if a developer accidentally executed DELETE FROM orders WHERE 1=1 at 2:15 PM, PITR lets you restore the database to its exact state at 2:14 PM and recover all deleted data without losing any other transactions from the rest of the day. PostgreSQL implements PITR through Write-Ahead Logging. Every database change is first written to the WAL before being applied to data files. By continuously archiving WAL segments to durable offsite storage and combining them with a periodic base backup, you can replay the WAL stream forward from any backup timestamp to any target recovery time. PITR requires three components working in concert: a base backup, a continuously maintained WAL archive, and a recovery configuration specifying the target recovery time.
# postgresql.conf: enable WAL archiving
wal_level = replica
archive_mode = on
archive_command = 'wal-g wal-push %p'   # or: 'cp %p /backup/wal/%f'
# Take a base backup (the starting point for PITR)
pg_basebackup -h localhost -U replicator -D /backup/base -Ft -z -P
# postgresql.conf: configure recovery target before starting
restore_command = 'wal-g wal-fetch %f %p'
recovery_target_time = '2026-02-18 14:14:00 UTC'
recovery_target_action = 'promote'
# Signal PostgreSQL to start in recovery mode
touch /var/lib/postgresql/data/recovery.signal
pg_ctl start -D /var/lib/postgresql/data
pg_basebackup: PostgreSQL's Built-in Backup Tool
pg_basebackup is the official, built-in utility for creating physical base backups of a running PostgreSQL instance. It connects using the replication protocol and streams a consistent copy of all data files while the database remains fully online — no maintenance window required. pg_basebackup requires a database user with REPLICATION privilege and pg_hba.conf configured to allow replication connections from the backup host. For single-server setups or small to medium databases, it is the simplest starting point: no additional software is required, and the compressed output can be stored locally or transferred to offsite storage using standard file transfer tools. Its limitations are worth understanding: pg_basebackup creates full backups only (PostgreSQL 17 added an --incremental option, which requires WAL summarization to be enabled), it does not manage WAL archiving automatically, and it has no built-in retention management or backup catalogue. For databases up to a few hundred gigabytes on a weekly full backup schedule with separate WAL archiving, pg_basebackup combined with a straightforward shell script and cron job is a fully adequate, battle-tested production backup solution.
# Create a compressed, streamed base backup with progress reporting
pg_basebackup \
--host=localhost \
--username=replicator \
--pgdata=/backup/$(date +%Y%m%d_%H%M%S) \
--format=tar \
--gzip \
--compress=9 \
--checkpoint=fast \
--progress \
--verbose
# Verify the backup archive is intact and readable
tar -tzf /backup/20260218_020000/base.tar.gz | head -20
WAL-G: Production-Grade Continuous Archiving
WAL-G is the modern, production-grade backup solution for PostgreSQL, originally developed by Citus Data and now widely adopted across the ecosystem. It handles both base backups and continuous WAL archiving in a single tool, with native integration for cloud object storage — Amazon S3, Google Cloud Storage, Azure Blob Storage, and any S3-compatible endpoint including MinIO for on-premises deployments. WAL-G supports delta incremental backups that copy only changed data file pages since the last base backup, significantly reducing backup duration and cloud storage costs for large databases. It compresses backups with lz4 by default (brotli, lzma, and zstd are also available), supports encryption at rest via GPG or libsodium, and provides parallel upload and download streams for fast operation on high-bandwidth connections. Restoring with WAL-G is a single backup-fetch command that downloads the appropriate base backup, after which PostgreSQL's restore_command fetches and replays WAL segments to the specified recovery target time. WAL-G's backup catalogue tracks metadata for all available backups, making retention enforcement, recovery point listing, and the full backup lifecycle manageable from a cron job or Kubernetes CronJob with no additional tooling.
# Configure WAL-G via environment variables (or .walg.json config file)
export WALG_S3_PREFIX=s3://your-backup-bucket/postgres
export AWS_REGION=us-east-1
export WALG_COMPRESSION_METHOD=lz4
export WALG_DELTA_MAX_STEPS=6 # up to 6 incremental steps per full backup
# postgresql.conf: delegate WAL archiving to WAL-G
# archive_command = 'wal-g wal-push %p'
# restore_command = 'wal-g wal-fetch %f %p'
# Take a base backup (delta if a recent full exists; full otherwise)
wal-g backup-push /var/lib/postgresql/data
# List all available backups with sizes and timestamps
wal-g backup-list --detail
# Restore latest backup and replay WAL to target time
wal-g backup-fetch /var/lib/postgresql/data LATEST
# Enforce retention: keep at minimum 3 full backups, delete the rest
wal-g delete retain FULL 3 --confirm
Barman: Centralised Management for Multiple Servers
Barman (Backup and Recovery Manager) is an open-source backup management platform from EDB designed for teams managing multiple PostgreSQL servers from a single centralised location. Unlike pg_basebackup and WAL-G — which run on or close to the database server itself — Barman runs on a dedicated backup server and pulls backups from your PostgreSQL instances over the network using the streaming replication protocol or rsync. This centralised model is particularly valuable for organisations managing tens or hundreds of PostgreSQL instances: backup metadata, schedules, retention policies, and recovery points for all servers are visible and manageable from a single control plane. Barman supports both streaming replication and rsync-based backup methods, provides a rich command-line interface for backup status checks, integrity validation, and point-in-time recovery orchestration, and integrates with monitoring and alerting systems through its check command exit codes. Its backup catalogue makes it straightforward to answer critical operational questions: when was the last successful backup for a specific server, what is the earliest available recovery point, and which backups are due for expiration under the configured retention policy.
# /etc/barman.conf: configure a PostgreSQL server to manage
[postgres-prod]
description = "Production PostgreSQL Server"
conninfo = host=db-prod user=barman dbname=postgres
streaming_conninfo = host=db-prod user=streaming_barman
backup_method = postgres
streaming_archiver = on
slot_name = barman
retention_policy = RECOVERY WINDOW OF 7 DAYS
minimum_redundancy = 2
# Barman operational commands
barman check postgres-prod # Validate connectivity and configuration
barman backup postgres-prod # Trigger an on-demand full backup
barman list-backups postgres-prod # List all catalogued backups
barman recover postgres-prod latest /restore/path \
  --target-time "2026-02-18 14:14:00"
Backup Automation: Schedule, Monitor, and Alert
A backup strategy that relies on manual execution is not a backup strategy — it is a process that will eventually be skipped during an incident, a holiday, a team transition, or simply a busy week. Every production database backup must be fully automated and monitored for failure with the same urgency you apply to production traffic monitoring. The simplest automation for smaller environments is a cron job or systemd timer that triggers the backup tool, uploads output to cloud storage, and sends an active notification — success or failure — to a monitoring channel or on-call system. For Kubernetes-based environments, a CronJob resource manages the same workflow with restart policies and resource limits. The critically overlooked element is monitoring backup success rather than just scheduling the backup command: a cron job that fails silently for two weeks leaves you with no valid backups precisely when you most need them. Send backup metrics — duration, size, and time elapsed since last successful completion — to your observability platform and alert when any metric falls outside expected bounds.
# cron: full backup at 2 AM daily (WAL archiving runs continuously).
# A crontab entry must be a single line — no backslash continuation —
# so the backup logic lives in a wrapper script.
0 2 * * * /usr/local/bin/pg-backup.sh
# /usr/local/bin/pg-backup.sh
#!/bin/bash
wal-g backup-push /var/lib/postgresql/data \
  && echo "Backup OK $(date)" >> /var/log/pg-backup.log \
  || echo "BACKUP FAILED $(date)" | mail -s "DB Backup Failed" ops@company.com
# Prune old backups after the retention window each morning
0 3 * * * wal-g delete retain FULL 7 --confirm >> /var/log/pg-backup-cleanup.log 2>&1
# Systemd timer (more reliable than cron — catches missed runs after downtime)
# /etc/systemd/system/pg-backup.timer
# [Timer]
# OnCalendar=02:00:00
# Persistent=true
Testing Recovery Procedures: The Most Critical Step
The single most dangerous assumption in backup management is that a backup is valid because the backup tool reported success. Backup tools can produce corrupted, incomplete, or unrestorable output without triggering a non-zero exit code. The only way to know your backup is recoverable is to restore it to a separate server and verify the data. Recovery testing must be a scheduled, documented operational process — not something that happens only during an actual emergency, by which point discovering a broken backup strategy is a catastrophe rather than a near miss. At minimum, restore your most recent full backup to an isolated test server monthly, start the database, run validation queries against known data points, and confirm it reaches a consistent state without errors. For PITR-enabled environments, additionally test a point-in-time restore to a specific historical timestamp and verify the expected data state at that moment. Quarterly, simulate a full disaster recovery scenario by following your documented runbook from scratch, measure the actual end-to-end recovery time, and compare it against your defined RTO target.
# Monthly recovery validation — run on an isolated test environment
# 1. Restore the latest backup (never to the production data directory)
wal-g backup-fetch /tmp/pg-restore-test LATEST
# 2. Start PostgreSQL on the test server
pg_ctl start -D /tmp/pg-restore-test -l /tmp/pg-restore.log
# 3. Run data integrity validation
psql -d restored_db -c "SELECT COUNT(*) FROM orders;"
psql -d restored_db -c "SELECT MAX(created_at) FROM orders;"
psql -d restored_db -c "SELECT pg_database_size(current_database());"
# 4. Record: restore duration, latest record timestamp, any errors
# If actual restore time > RTO, revise your strategy before the next testCloud Backup Solutions
Cloud object storage has become the standard destination for production database backups in 2026 because it provides virtually unlimited capacity, eleven-nines durability guarantees, built-in geographic redundancy, and automated lifecycle policies for retention management — all at a cost significantly below equivalent on-premises tape or disk backup infrastructure. Amazon S3, Google Cloud Storage, and Azure Blob Storage all offer multi-region replication for mission-critical backup repositories. Key configurations to enforce: enable object versioning to protect against accidental overwrite or deletion of backup objects; configure lifecycle rules to automatically expire objects older than your retention window; use server-side encryption with customer-managed keys for databases containing sensitive or regulated data; and enable cross-region replication for disaster recovery scenarios where the primary cloud region becomes unavailable. Fully managed database services — AWS RDS, Google Cloud SQL, and Azure Database for PostgreSQL — include automated backup management as a native feature, handling PITR storage, retention enforcement, and cross-region replication with no additional tooling required. For self-managed PostgreSQL instances, WAL-G's native cloud storage integration is the most direct path to cloud-backed, production-grade PITR.
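As a sketch of the key configurations above, the AWS CLI can enable versioning and apply a lifecycle expiry rule to a backup bucket. The bucket name, the postgres/ prefix, and the 35-day window are illustrative assumptions; substitute values that match your own retention policy.

```shell
# Sketch: protect backup objects with versioning and automatic expiry.
# Bucket name, prefix, and day counts are illustrative assumptions.
aws s3api put-bucket-versioning \
  --bucket your-backup-bucket \
  --versioning-configuration Status=Enabled

cat > /tmp/lifecycle.json <<'EOF'
{
  "Rules": [{
    "ID": "expire-old-backups",
    "Status": "Enabled",
    "Filter": { "Prefix": "postgres/" },
    "Expiration": { "Days": 35 },
    "NoncurrentVersionExpiration": { "NoncurrentDays": 7 }
  }]
}
EOF

aws s3api put-bucket-lifecycle-configuration \
  --bucket your-backup-bucket \
  --lifecycle-configuration file:///tmp/lifecycle.json
```

Versioning plus NoncurrentVersionExpiration gives a seven-day window to recover a backup object that was overwritten or deleted by mistake, while the Expiration rule enforces the overall retention window without any scheduled cleanup job.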
Disaster Recovery Planning
A backup is a technical artefact. A disaster recovery plan is the documented process that turns that backup into a running, validated database when everything else has failed. These are fundamentally different things, and many teams have the first without the second. A complete disaster recovery plan documents: the decision chain for declaring a recovery event and authorising a restore; the exact, ordered commands to execute for each failure scenario — single table corruption, full server failure, and data centre outage; where backup credentials and encryption keys are stored and who holds access to them during an incident; the validation procedure for confirming the restored database is correct before routing production traffic to it; the stakeholder communication plan for incident status and expected recovery timeline; and the post-incident review process to address root cause and prevent recurrence. Store the disaster recovery runbook somewhere accessible when your primary infrastructure is offline — a printed copy in a secure physical location, a separate Git repository hosted outside your primary cloud account, or a document in a cloud platform that does not depend on your own infrastructure. Review and update the runbook every time your backup infrastructure changes, not only when something goes wrong.
Compliance and Retention Policies
Backup retention is not solely a technical decision — for many organisations it is a compliance obligation governed by regulation, and the required retention period varies significantly by industry and jurisdiction. GDPR requires that personal data not be retained beyond its lawful processing purpose, which means your backup retention policy must address how data subject deletion requests are handled in backup archives — because restoring a backup after a deletion request could re-introduce data that was lawfully erased. HIPAA requires that compliance documentation (policies, procedures, authorisations, and risk assessments) be retained for a minimum of six years; retention of the medical records themselves is governed by state law. PCI DSS requires that cardholder data environment audit trails be maintained for 12 months with the most recent three months immediately available. SOC 2 auditors will ask for evidence that your backup and recovery controls are implemented, tested, and documented on a regular schedule. The retention policy you implement must satisfy the maximum retention requirement across all applicable regulations while also satisfying your operational minimum needed to meet your RPO in a worst-case recovery scenario. Automate retention enforcement through cloud storage lifecycle rules or your backup tool's retention management, and generate auditable evidence — backup completion logs, recovery test records, retention enforcement confirmations — that you can present to auditors without manual reconstruction.
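One way to generate the auditable evidence described above is to wrap retention enforcement so every run appends a before-and-after catalogue listing to a dated log. This is a sketch: the log path and the seven-full retention count are illustrative assumptions, and the wal-g calls are guarded so the script degrades gracefully where wal-g is not installed.

```shell
# Sketch: retention enforcement with auditable evidence logging.
# EVIDENCE_DIR and the retention count (7 fulls) are illustrative assumptions.
EVIDENCE_DIR=/tmp/pg-backup-evidence
mkdir -p "$EVIDENCE_DIR"
EVIDENCE_LOG="$EVIDENCE_DIR/retention-$(date +%Y%m%d).log"

{
  echo "=== Retention run $(date -u +%Y-%m-%dT%H:%M:%SZ) ==="
  if command -v wal-g >/dev/null 2>&1; then
    wal-g backup-list --detail            # catalogue state BEFORE enforcement
    wal-g delete retain FULL 7 --confirm  # enforce the retention policy
    wal-g backup-list --detail            # catalogue state AFTER enforcement
  else
    echo "wal-g not installed; skipping enforcement (sketch only)"
  fi
} >> "$EVIDENCE_LOG" 2>&1
```

A dated log per run, showing what existed before and after each enforcement pass, is exactly the kind of artefact an auditor can accept without you reconstructing history by hand.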
A robust database backup and recovery strategy is not a one-time configuration task — it is an ongoing operational discipline that demands the same sustained attention as production monitoring and incident response. Define your RPO and RTO targets with business stakeholders before selecting tools or schedules, because those targets determine every technical choice that follows. Implement continuous WAL archiving alongside periodic base backups to achieve true point-in-time recovery capability. Choose your backup tooling based on operational scale and team capability: pg_basebackup for simplicity on smaller databases, WAL-G for production-grade cloud archiving with delta incremental support, or Barman for centralised management across multiple PostgreSQL servers. Automate every backup job, monitor for failure with active alerting, and treat a missed or failed backup with the same urgency as a production incident. Test recovery procedures on a documented schedule, measure actual end-to-end recovery time against your RTO, and update your disaster recovery runbook whenever infrastructure changes. The measure of a backup strategy is not how reliably backups are created — it is how quickly and completely you can restore from them when the moment arrives.