Database Health Checklist

42 items covering performance, security, reliability, and operations. Run through this checklist quarterly—or before any major release.

Critical — fix immediately

Important — fix within 30 days

Best practice — schedule soon

Performance

No queries with a mean execution time above 1 second in pg_stat_statements (PostgreSQL) / sys.dm_exec_query_stats (SQL Server)
Critical
Cache hit ratio above 95% (shared_buffers / buffer pool usage)
Critical
No sequential scans on tables over 100K rows for frequent queries
Critical
All foreign key columns have a corresponding index
Important
No index bloat above 30% on frequently updated tables
Important
Autovacuum / auto-statistics update running successfully on all tables
Important
Connection pool utilisation below 80% during peak hours
Important
No N+1 query patterns detected in application query logs
Best Practice
Covering indexes in place for top 10 most frequent query patterns
Best Practice
Partitioning strategy in place for tables over 50M rows
Best Practice
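
Several of the performance thresholds above can be checked mechanically. The following is a minimal Python sketch, not a definitive implementation: the function names and the shape of the inputs are assumptions made for illustration, and the counters are assumed to have been fetched already (on PostgreSQL, blks_hit and blks_read are columns of pg_stat_database).

```python
# Thresholds taken from the checklist items above.
CACHE_HIT_MIN = 0.95    # cache hit ratio should stay above 95%
POOL_UTIL_MAX = 0.80    # pool utilisation should stay below 80% at peak
SLOW_QUERY_MS = 1000.0  # flag statements averaging more than 1 second


def cache_hit_ratio(blks_hit: int, blks_read: int) -> float:
    """Fraction of block requests served from cache.

    On PostgreSQL the inputs could come from:
    SELECT sum(blks_hit), sum(blks_read) FROM pg_stat_database;
    """
    total = blks_hit + blks_read
    # With no block requests at all there is nothing to miss.
    return blks_hit / total if total else 1.0


def pool_utilisation(in_use: int, pool_size: int) -> float:
    """Share of the connection pool currently checked out."""
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    return in_use / pool_size


def slow_statements(mean_ms_by_query: dict[str, float]) -> list[str]:
    """Names of statements whose mean execution time exceeds 1 second."""
    return sorted(q for q, ms in mean_ms_by_query.items() if ms > SLOW_QUERY_MS)


def performance_findings(blks_hit, blks_read, in_use, pool_size, mean_ms_by_query):
    """Return one human-readable finding per failed check (hypothetical helper)."""
    findings = []
    ratio = cache_hit_ratio(blks_hit, blks_read)
    if ratio <= CACHE_HIT_MIN:
        findings.append(f"cache hit ratio {ratio:.1%} is not above 95%")
    util = pool_utilisation(in_use, pool_size)
    if util >= POOL_UTIL_MAX:
        findings.append(f"pool utilisation {util:.0%} is not below 80%")
    for query in slow_statements(mean_ms_by_query):
        findings.append(f"slow statement: {query}")
    return findings
```

An empty list from performance_findings means these three checks passed; the remaining performance items (sequential scans, index bloat, N+1 patterns) need their own data sources and are not covered here.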

Security

No default vendor accounts (SYS, SA, postgres superuser) accessible from application
Critical
All database connections encrypted in transit (TLS 1.2+)
Critical
No application account has DBA / superuser privileges
Critical
Data at rest encrypted (TDE on SQL Server / Oracle; filesystem or volume encryption, or pgcrypto for column-level encryption, on PostgreSQL)
Critical
All user accounts follow principle of least privilege
Critical
Audit logging enabled for all DDL and privileged DML operations
Important
Database reachable only from the application tier, not directly from the internet
Critical
Password policy enforced: minimum length, complexity, and expiry
Important
No hard-coded credentials in application code or automation scripts
Critical
Database patched to latest minor version within 90 days of release
Important
Unused database accounts disabled or removed
Important
Row-level security or Oracle VPD in place where multi-tenant data exists
Best Practice
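
Three of the security items above (TLS 1.2+, no superuser application accounts, no hard-coded credentials) lend themselves to a short illustration. The Python sketch below is only one way to approach them: the helper names, the APP_DB_PASSWORD variable, and the (rolname, rolsuper) row shape are assumptions made for the example, with the row shape mirroring the pg_roles columns on PostgreSQL.

```python
import os
import ssl


def tls12_client_context() -> ssl.SSLContext:
    """Client-side TLS context that refuses anything older than TLS 1.2."""
    ctx = ssl.create_default_context()  # certificate verification stays on
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def superuser_violations(roles, app_roles):
    """Application accounts that hold superuser privileges.

    `roles` is an iterable of (rolname, rolsuper) rows, e.g. from:
    SELECT rolname, rolsuper FROM pg_roles;
    `app_roles` is the set of account names the application connects as.
    """
    return sorted(name for name, is_super in roles if is_super and name in app_roles)


def db_password_from_env(var: str = "APP_DB_PASSWORD") -> str:
    """Read the credential from the environment instead of hard-coding it.

    The variable name is hypothetical; a secrets manager would serve equally well.
    """
    value = os.environ.get(var)
    if not value:
        raise RuntimeError(f"{var} is not set")
    return value
```

Whether a given database driver accepts an ssl.SSLContext directly depends on the driver, so treat tls12_client_context as a starting point and confirm the server side enforces the same minimum version.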

Reliability

Full backup taken daily and verified restorable (restore test performed monthly)
Critical
Transaction log / WAL backups taken every 15 minutes or less
Critical
Backup retention policy defined and enforced (minimum 30 days)
Critical
Backups stored offsite or in a separate cloud region from production
Critical
Recovery Time Objective (RTO) and Recovery Point Objective (RPO) documented
Important
Replication lag below 10 seconds on all replicas
Important
Failover procedure documented and tested in the last 6 months
Important
Disk space usage below 75% with growth projections reviewed quarterly
Important
Tablespace / filegroup usage monitored and alerted
Best Practice
Database health check alerts sent to on-call channel (not just email)
Best Practice
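
The numeric reliability thresholds above (daily full backup, log backups every 15 minutes, replica lag below 10 seconds, disk usage below 75%) can be evaluated together. This Python sketch assumes the timestamps and metrics have already been collected by your monitoring; the function name and argument shapes are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Thresholds taken from the checklist items above.
FULL_BACKUP_MAX_AGE = timedelta(days=1)
LOG_BACKUP_MAX_AGE = timedelta(minutes=15)
REPLICA_LAG_MAX_S = 10.0
DISK_USED_MAX = 0.75


def reliability_findings(
    now: datetime,
    last_full_backup: datetime,
    last_log_backup: datetime,
    replica_lag_s: dict[str, float],
    disk_used_fraction: float,
) -> list[str]:
    """Return one finding per failed check; an empty list means all passed."""
    findings = []
    if now - last_full_backup > FULL_BACKUP_MAX_AGE:
        findings.append("last full backup is older than 24 hours")
    if now - last_log_backup > LOG_BACKUP_MAX_AGE:
        findings.append("last transaction log / WAL backup is older than 15 minutes")
    # Report every lagging replica, in a stable order.
    for name, lag in sorted(replica_lag_s.items()):
        if lag >= REPLICA_LAG_MAX_S:
            findings.append(f"replica {name} lag is {lag:.0f}s")
    if disk_used_fraction >= DISK_USED_MAX:
        findings.append(f"disk usage is {disk_used_fraction:.0%}")
    return findings
```

A check like this only confirms that backups are recent. It does not replace the monthly restore test, which is the only evidence that a backup is actually restorable.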

Operations

Index rebuild / reorganise schedule in place for fragmented indexes
Important
Statistics updated automatically and verified current
Important
Bloated tables identified and VACUUM FULL / shrink schedule in place
Best Practice
Long-running transactions monitored and killed after threshold (e.g. 30 min)
Important
Blocking locks alerted in real time
Important
Database version documented and EOL date tracked
Best Practice
Schema change process (migrations) peer-reviewed before production deployment
Best Practice
Capacity planning reviewed quarterly against growth trend
Best Practice
Database parameter/configuration baseline documented
Best Practice
Runbook available for common incident scenarios (high CPU, lock contention, disk full)
Important
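
Two of the operations items above reduce to simple date arithmetic: finding transactions older than the threshold (30 minutes in the example given) and tracking how far away a version's EOL date is. The Python sketch below shows one possible shape; the function names are hypothetical, and the pid to start-time mapping is assumed to come from a source such as the pid and xact_start columns of pg_stat_activity on PostgreSQL.

```python
from datetime import date, datetime, timedelta, timezone

LONG_TXN_THRESHOLD = timedelta(minutes=30)  # example threshold from the checklist


def transactions_over_threshold(
    xact_starts: dict[int, datetime],
    now: datetime,
    threshold: timedelta = LONG_TXN_THRESHOLD,
) -> list[int]:
    """Backend pids whose open transaction has run longer than `threshold`.

    `xact_starts` maps pid -> transaction start time, e.g. from:
    SELECT pid, xact_start FROM pg_stat_activity WHERE xact_start IS NOT NULL;
    This function only selects candidates. Terminating them (for example with
    pg_terminate_backend(pid) on PostgreSQL) is left to the caller.
    """
    return sorted(pid for pid, started in xact_starts.items() if now - started > threshold)


def days_until_eol(eol: date, today: date) -> int:
    """Days remaining before the documented end-of-life date (negative if past)."""
    return (eol - today).days
```

Keeping selection separate from termination lets the same list feed an alert first and an automated kill later, once the threshold has been agreed with application owners.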

Book a free diagnostic call. We'll review your specific findings and provide a prioritised remediation plan.