Case Study · High Availability Architecture

Healthcare Platform: 99.99% Uptime

A HIPAA-covered telemedicine platform was running in a single datacenter with a 4-hour manual failover process. We rebuilt their database infrastructure so that a full region outage becomes a 28-second event patients never notice.

99.99%
Uptime SLA
<30s
Automatic Failover
3
Geographic Regions

Background

The Client

A HIPAA-covered telehealth company serving 120,000 registered patients across three US states. Their platform handled patient record management, appointment scheduling, real-time video consultations, and prescription routing — all backed by a single-region PostgreSQL cluster hosted on AWS in us-east-1.

They were preparing for a state-level contract that required them to demonstrate a 99.99% uptime SLA and documented disaster recovery capability. Their existing architecture could not meet that standard. A single datacenter failure would take patient records offline for 4+ hours while engineers manually restored from backup.
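
To put the SLA in concrete terms, the short calculation below converts an availability percentage into an annual downtime budget and compares it with a single 4-hour manual restore. It is an illustrative sketch only: it assumes a 365-day year and that the SLA is measured annually, which the contract terms may define differently.

```python
# Downtime budget implied by an availability target.
# Assumption: a non-leap, 365-day measurement year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_budget_minutes(availability_pct: float) -> float:
    """Minutes of downtime per year permitted by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


budget = downtime_budget_minutes(99.99)  # roughly 52.56 minutes per year
manual_restore = 4 * 60                  # one 4-hour manual restore, in minutes

print(f"Annual budget at 99.99%: {budget:.2f} min")
print(f"One 4-hour restore uses {manual_restore / budget:.1f}x the annual budget")
```

Under these assumptions, a single manual restore consumes more than four years' worth of the 99.99% budget, which is why the existing process could not satisfy the contract.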

The Problem

Where the Architecture Was Failing

The DharmOps architecture review identified four critical gaps — any one of which could have resulted in a HIPAA-reportable breach of availability:

  • Single-region PostgreSQL with no standby replica. The entire platform ran on a single RDS primary instance in us-east-1 with no hot standby. Any hardware failure, AZ outage, or AWS incident would immediately take the platform offline with no automatic recovery path.
  • Manual disaster recovery process taking 4+ hours. Their documented DR procedure involved an engineer manually restoring from an S3 backup, reconfiguring DNS, and restarting application servers. The process had never been tested end-to-end, let alone under pressure at 3am during a real incident.
  • No cross-region data replication. All backups were stored in the same AWS region as production. A region-level event — such as the AWS us-east-1 disruptions of 2021 and 2023 — would take down both production and the backup simultaneously.
  • HIPAA audit log gaps during failover events. In a manual failover scenario, the audit trail required under HIPAA for access to protected health information showed gaps of 15–45 minutes. Any gap in audit coverage during a failover was a compliance liability.
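
One way to quantify audit gaps like the ones described in the last item is to scan record timestamps for periods of silence. The sketch below is hypothetical and not the tooling used in the review: it assumes the audit pipeline writes at least one record (for example a heartbeat) every minute, so any longer silence indicates missing coverage. The function name `find_audit_gaps` and the sample data are illustrative.

```python
from datetime import datetime, timedelta


def find_audit_gaps(timestamps, max_silence=timedelta(minutes=1)):
    """Return (start, end) pairs where consecutive records are more than max_silence apart."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_silence]


# Hypothetical sample: a record every 30 seconds, followed by a 20-minute hole.
start = datetime(2024, 1, 1, 3, 0, 0)
records = [start + timedelta(seconds=30 * i) for i in range(4)]
records.append(records[-1] + timedelta(minutes=20))

gaps = find_audit_gaps(records)
for gap_start, gap_end in gaps:
    print(f"gap of {gap_end - gap_start} starting at {gap_start.isoformat()}")
```

The threshold is a parameter because the acceptable silence depends on how often the real system is expected to emit audit records.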

The Solution

How We Rebuilt Their Foundation

DharmOps designed and implemented a multi-region high-availability architecture over a 6-week engagement. The entire migration was executed live — no downtime, no maintenance window. The new architecture was designed to be self-healing: no engineer needs to be paged for a datacenter failure.

  • Deployed 3-region PostgreSQL with synchronous streaming replication. Established a primary in us-east-1 with synchronous standbys in us-west-2 and eu-west-1. With synchronous replication, the primary confirms a write only after at least one standby has received and durably stored it, so failing over to that standby loses no confirmed writes.
  • Implemented Patroni for automatic leader election. Patroni monitors cluster health continuously and completes leader election within 28 seconds of a primary failure, with no human intervention required. DNS then updates automatically, rerouting application traffic without any code change.
  • Set up WAL-G for continuous archiving with point-in-time recovery. Write-Ahead Log segments are shipped to S3 buckets in all three regions every 60 seconds. This gives the team point-in-time recovery to any second in the last 30 days — not just to the last nightly snapshot.
  • Implemented continuous audit log replication and quarterly DR drills. Audit logs are replicated to all three regions in real time, closing the HIPAA compliance gap. We also created automated quarterly DR drill scripts that simulate a full region failure in staging — generating a documented test report for the compliance team.
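
The failover measurement in a DR drill can be sketched as a simple polling loop. This is a minimal illustration, not the drill scripts delivered in the engagement. It assumes the cluster state is available as a document shaped like the response of Patroni's REST API `/cluster` endpoint (a `members` list with `name`, `role`, and `state` fields; check the field values against the deployed Patroni version). The node names `pg-east` and `pg-west`, the helper names, and the simulated responses are all hypothetical.

```python
import time
from typing import Callable, Optional


def leader_of(cluster: dict) -> Optional[str]:
    """Name of the running leader in a Patroni-style cluster document, or None."""
    for member in cluster.get("members", []):
        if member.get("role") == "leader" and member.get("state") == "running":
            return member.get("name")
    return None


def measure_failover(
    fetch_cluster: Callable[[], dict],
    old_leader: str,
    timeout_s: float = 60.0,
    poll_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Seconds until a leader other than old_leader is running, or None on timeout."""
    started = clock()
    while clock() - started <= timeout_s:
        new_leader = leader_of(fetch_cluster())
        if new_leader is not None and new_leader != old_leader:
            return clock() - started
        sleep(poll_s)
    return None


class FakeClock:
    """Deterministic clock so the simulation does not wait in real time."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


# Simulated drill: three polls see no leader, then the standby is promoted.
no_leader = {"members": [{"name": "pg-west", "role": "replica", "state": "running"}]}
promoted = {"members": [{"name": "pg-west", "role": "leader", "state": "running"}]}
responses = iter([no_leader, no_leader, no_leader, promoted])

fake = FakeClock()
elapsed = measure_failover(
    lambda: next(responses), "pg-east", clock=fake, sleep=fake.sleep
)
print(f"new leader elected after {elapsed:.0f}s (simulated)")
```

In a real drill, `fetch_cluster` would be replaced by an HTTP call to a surviving node, and the measured time could be written into the compliance report alongside the 30-second target. Injecting the clock and sleep functions keeps the logic testable without a live cluster.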

The Results

Measurable Outcomes

99.99%
Uptime SLA — Contractually Verified
<30s
Automatic Failover Time
3
Geographic Regions — Zero Data Loss

The client secured the state contract within 30 days of the architecture going live; the 99.99% SLA documentation was a key differentiator in the procurement process. The first quarterly DR drill passed without any manual intervention. Audit log continuity now covers 100% of access events across all three regions.

"Patient data availability is non-negotiable in our industry. Before DharmOps, we were one bad night away from a HIPAA incident. Now a full datacenter could go dark and our clinicians would never know it happened. That's the standard we needed — and they delivered it."

— CTO, Telehealth Platform (120K patients, HIPAA-covered)

Need High Availability?

Whether you're chasing an SLA, preparing for compliance, or just tired of 3am pages when a server goes down — we can design an architecture that keeps your data available automatically.