Database Health Checklist
42 items covering performance, security, reliability, and operations. Run through this checklist quarterly—or before any major release.
Critical — fix immediately
Important — fix within 30 days
Best practice — schedule soon
Performance
- No queries averaging more than 1 second per execution in pg_stat_statements / sys.dm_exec_query_stats [Critical]
- Cache hit ratio above 95% (shared_buffers / buffer pool usage) [Critical]
- No sequential scans on tables over 100K rows for frequent queries [Critical]
- All foreign key columns have a corresponding index [Important]
- No index bloat above 30% on frequently updated tables [Important]
- Autovacuum / auto-statistics update running successfully on all tables [Important]
- Connection pool utilisation below 80% during peak hours [Important]
- No N+1 query patterns detected in application query logs [Best Practice]
- Covering indexes in place for top 10 most frequent query patterns [Best Practice]
- Partitioning strategy in place for tables over 50M rows [Best Practice]
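
Several of the performance items are plain numeric thresholds, so they lend themselves to an automated check. The sketch below shows one way to express three of them in Python, assuming the raw numbers have already been collected (for example `blks_hit` and `blks_read` from PostgreSQL's `pg_stat_database`). The function names and the keys of the `metrics` dictionary are illustrative assumptions, not part of any library.

```python
def cache_hit_ratio(blks_hit: int, blks_read: int) -> float:
    """Fraction of block requests served from cache (0.0 to 1.0)."""
    total = blks_hit + blks_read
    # With no block activity at all there is nothing to miss, so report 1.0.
    return blks_hit / total if total else 1.0


def performance_findings(metrics: dict) -> list[str]:
    """Return the checklist items that the supplied metrics fail.

    `metrics` is a hypothetical dict with keys: blks_hit, blks_read,
    pool_in_use, pool_size, index_bloat_pct.
    """
    findings = []
    if cache_hit_ratio(metrics["blks_hit"], metrics["blks_read"]) <= 0.95:
        findings.append("cache hit ratio not above 95%")
    if metrics["pool_in_use"] / metrics["pool_size"] >= 0.80:
        findings.append("connection pool utilisation not below 80%")
    if metrics["index_bloat_pct"] > 30:
        findings.append("index bloat above 30%")
    return findings


sample = {
    "blks_hit": 9_400, "blks_read": 600,   # 94% hit ratio
    "pool_in_use": 45, "pool_size": 100,   # 45% utilisation
    "index_bloat_pct": 12,
}
print(performance_findings(sample))  # ['cache hit ratio not above 95%']
```

Keeping the thresholds in one function makes it straightforward to run the same check quarterly and compare results over time.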
Security
- No default vendor accounts (SYS, SA, postgres superuser) accessible from the application [Critical]
- All database connections encrypted in transit (TLS 1.2+) [Critical]
- No application account has DBA / superuser privileges [Critical]
- Data at rest encrypted (TDE on SQL Server / Oracle; filesystem or volume encryption, or pgcrypto for column-level encryption, on PostgreSQL) [Critical]
- All user accounts follow the principle of least privilege [Critical]
- Audit logging enabled for all DDL and privileged DML operations [Important]
- Database accessible only from the application tier, not directly from the internet [Critical]
- Password policy enforced: minimum length, complexity, and expiry [Important]
- No hard-coded credentials in application code or automation scripts [Critical]
- Database patched to latest minor version within 90 days of release [Important]
- Unused database accounts disabled or removed [Important]
- Row-level security or Oracle VPD in place where multi-tenant data exists [Best Practice]
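
The hard-coded credentials item is one of the easier ones to check mechanically. As a rough sketch, the Python below flags lines that assign a quoted literal to a password-like name or that embed a password in a connection URI. The two patterns and the function name are illustrative assumptions, and a dedicated secret scanner will catch far more cases than this does.

```python
import re

# Illustrative patterns only; real secret scanners use far larger rule sets.
ASSIGNMENT = re.compile(
    r"""(password|passwd|pwd|secret)\w*\s*[:=]\s*["'][^"']+["']""",
    re.IGNORECASE,
)
URI_WITH_PASSWORD = re.compile(r"://[^/\s:@]+:[^@\s/]+@")


def find_hardcoded_credentials(text: str) -> list[int]:
    """Return 1-based line numbers that look like they embed a credential."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if ASSIGNMENT.search(line) or URI_WITH_PASSWORD.search(line):
            hits.append(number)
    return hits


script = '''\
DB_PASSWORD = "hunter2"
password = os.environ["DB_PASSWORD"]
dsn = "postgresql://app:s3cret@db.internal/prod"
dsn = "postgresql://app@db.internal/prod"
'''
print(find_hardcoded_credentials(script))  # [1, 3]
```

Line 2 of the sample is not flagged because the value is read from the environment rather than written as a literal, which is the pattern this checklist item is steering towards.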
Reliability & Backups
- Full backup taken daily and verified restorable (restore test performed monthly) [Critical]
- Transaction log / WAL backups taken every 15 minutes or less [Critical]
- Backup retention policy defined and enforced (minimum 30 days) [Critical]
- Backups stored offsite or in a separate cloud region from production [Critical]
- Recovery Time Objective (RTO) and Recovery Point Objective (RPO) documented [Important]
- Replication lag below 10 seconds on all replicas [Important]
- Failover procedure documented and tested in the last 6 months [Important]
- Disk space usage below 75% with growth projections reviewed quarterly [Important]
- Tablespace / filegroup usage monitored and alerted [Best Practice]
- Database health check alerts sent to on-call channel (not just email) [Best Practice]
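
For the disk space item, a growth projection can be as simple as dividing the remaining headroom by the observed daily growth. The Python sketch below does that under a linear-growth assumption, which is a simplification; the function name and the sample figures are made up for illustration.

```python
import math


def days_until_threshold(used_gb: float, total_gb: float,
                         daily_growth_gb: float,
                         threshold: float = 0.75) -> float:
    """Days until usage reaches `threshold` of capacity, assuming linear growth.

    Returns 0.0 if usage is already at or past the threshold, and
    math.inf if the data is not growing.
    """
    headroom_gb = total_gb * threshold - used_gb
    if headroom_gb <= 0:
        return 0.0
    if daily_growth_gb <= 0:
        return math.inf
    return headroom_gb / daily_growth_gb


# 600 GB used on a 1,000 GB volume, growing 5 GB per day:
# headroom to 75% is 750 - 600 = 150 GB, so 150 / 5 = 30 days.
print(days_until_threshold(used_gb=600, total_gb=1000, daily_growth_gb=5))  # 30.0
```

If the result is shorter than the interval between reviews (about 90 days for a quarterly review), the volume is likely to cross the 75% line before anyone looks at it again.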
Maintenance & Operations
- Index rebuild / reorganise schedule in place for fragmented indexes [Important]
- Statistics updated automatically and verified current [Important]
- Bloated tables identified and VACUUM FULL / shrink schedule in place (run in a maintenance window, since VACUUM FULL takes an exclusive lock on the table) [Best Practice]
- Long-running transactions monitored and killed after threshold (e.g. 30 min) [Important]
- Blocking locks alerted in real-time [Important]
- Database version documented and EOL date tracked [Best Practice]
- Schema change process (migrations) peer-reviewed before production deployment [Best Practice]
- Capacity planning reviewed quarterly against growth trend [Best Practice]
- Database parameter/configuration baseline documented [Best Practice]
- Runbook available for common incident scenarios (high CPU, lock contention, disk full) [Important]
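
The long-running transaction item comes down to comparing each session's transaction start time against a threshold. The Python sketch below assumes the rows have already been fetched as `(pid, xact_start)` pairs, which are the column names PostgreSQL exposes in `pg_stat_activity`. The function name and sample data are hypothetical, and the step of actually terminating a session is deliberately left out.

```python
from datetime import datetime, timedelta, timezone


def transactions_over_threshold(rows, now, threshold=timedelta(minutes=30)):
    """Return pids whose transaction has been open longer than `threshold`.

    `rows` is an iterable of (pid, xact_start) pairs; xact_start is None
    for sessions that have no open transaction.
    """
    return [
        pid for pid, xact_start in rows
        if xact_start is not None and now - xact_start > threshold
    ]


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)  # arbitrary sample time
rows = [
    (101, now - timedelta(minutes=45)),  # open for 45 minutes
    (102, now - timedelta(minutes=5)),   # open for 5 minutes
    (103, None),                         # idle, no open transaction
]
print(transactions_over_threshold(rows, now))  # [101]
```

Reporting candidates first and terminating them as a separate, reviewed step keeps a monitoring bug from killing legitimate batch jobs.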
Found Issues in Your Checklist?
Book a free diagnostic call. We'll review your specific findings and provide a prioritised remediation plan.