Migrating Legacy Databases for a Multi-Specialty Healthcare Platform
AI
Healthcare
AI consulting
Engineering
5
Specialized Airflow pipelines, each handling a different data domain
49
Additional cross-database tables with complex foreign key dependencies
32
Missing columns between source systems due to schema drift
100%
Validation pass rate, 5-level checks run after every single table
About the Client
A US-based multi-specialty healthcare platform had grown over the years by operating across three separate legacy databases on MS SQL Server — one for clinical records, one for pharmaceutical data, and one for pharmacy inventory.
As the platform scaled, maintaining these siloed systems became a critical operational and compliance liability.
Data inconsistencies between sources, schema drift, and the inability to run unified reporting across all three systems put both care quality and audit readiness at risk.
Why they reached out to us
- Three isolated MS SQL Server databases were creating data inconsistencies across clinical, pharmaceutical, and inventory workflows — there was no single source of truth
- Schema differences between sources (including up to 32 missing columns per table), integer-based legacy IDs, and cross-database foreign key dependencies made off-the-shelf migration tools inadequate
- HIPAA compliance required a tamper-proof audit trail of every data transformation, with rollback capability at every stage
- Any production downtime would directly disrupt patient care and clinical operations — a hard constraint.
As the platform scaled, maintaining these siloed systems became a critical operational and compliance liability.
Data inconsistencies between sources, schema drift, and the inability to run unified reporting across all three systems put both care quality and audit readiness at risk.
Why they reached out to us
- Three isolated MS SQL Server databases were creating data inconsistencies across clinical, pharmaceutical, and inventory workflows — there was no single source of truth
- Schema differences between sources (including up to 32 missing columns per table), integer-based legacy IDs, and cross-database foreign key dependencies made off-the-shelf migration tools inadequate
- HIPAA compliance required a tamper-proof audit trail of every data transformation, with rollback capability at every stage
- Any production downtime would directly disrupt patient care and clinical operations — a hard constraint.
3
Legacy MS SQL Server databases with incompatible schemas consolidated into one
90+
Tables with cross-database dependencies requiring orchestrated migration
32
Missing columns per table due to schema drift between source systems
Zero
Production downtime — legacy systems stayed live throughout the entire migration
100% HIPAA
Full compliance with audit trail on every single data transformation
10x
Faster data loading via PostgreSQL COPY vs. standard INSERT statements
The Problems
1
Heterogeneous Legacy Sources
Three databases with different schemas, validation rules, and up to 32 missing columns between sources made direct table-to-table migration impossible. Each source required custom transformation logic before a single row could land in the target system.
2
Cross-Database Foreign Key Dependencies
Tables from Database A referenced tables in Database B. Without a controlled migration order and a shared ID-resolution layer, loading any table out of sequence would result in orphaned records and broken relationships.
3
Integer-to-UUID Primary Key Transformation
All legacy primary keys used integer IDs. The target system required globally unique UUIDs — meaning every ID needed to be converted, every foreign key resolved to the new UUID, and every cross-table relationship rebuilt without losing a single link.
4
Zero Downtime Requirement
The platform couldn't afford any interruption to live operations. The legacy systems had to stay fully operational and serving production traffic throughout the entire migration window — ruling out the standard "stop, dump, restore" approach.
The Solutions
Apache Airflow Orchestration - 5 Specialized Pipelines
We built the entire ETL backbone on Apache Airflow, with 5 dedicated DAGs — one per data domain (clinical normalization, core clinical tables, pharmaceutical directories, prescriptions, pharmacy inventory). Each DAG visualized task dependencies in real time, auto-retried failed steps, and logged every operation for audit. The client had live access to the Airflow UI throughout the migration.
4-Stage ETL Architecture
Every table passed through the same four-stage pipeline:
- Extract: Chunked reads from MS SQL Server (10–50K rows per chunk) with cursor-based pagination and auto-reconnect on failure
- Transform: Type conversion (e.g. TINYINT → BOOLEAN), field renaming (PascalCase → camelCase), UUID generation for primary keys, and FK resolution via UUID mapping
- Load: PostgreSQL COPY command (10x faster than INSERT), with adaptive chunk sizing and automatic transaction management
- Validate: 5-level post-load checks — row count match, NULL primary key detection, zero orphaned FK records, type correctness, and legacy ID preservation
- Extract: Chunked reads from MS SQL Server (10–50K rows per chunk) with cursor-based pagination and auto-reconnect on failure
- Transform: Type conversion (e.g. TINYINT → BOOLEAN), field renaming (PascalCase → camelCase), UUID generation for primary keys, and FK resolution via UUID mapping
- Load: PostgreSQL COPY command (10x faster than INSERT), with adaptive chunk sizing and automatic transaction management
- Validate: 5-level post-load checks — row count match, NULL primary key detection, zero orphaned FK records, type correctness, and legacy ID preservation
Configuration-Driven Migration Engine
Adding a new table to the migration required zero code changes — only a config entry describing the table name, field mappings, type conversions, chunk size, and load strategy. This reduced onboarding time for each new table from hours to minutes and made the system fully reproducible across dev, test, and production environments.
Cross-Database UUID Mapping via Airflow XCom
To resolve foreign keys across three separate databases, we built a shared UUID mapping store passed between tasks using Airflow's XCom mechanism. This allowed Task B to look up UUIDs generated by Task A — without any additional database calls — enabling accurate cross-DB FK resolution even when tasks ran in separate processes.
Many-to-Many Relationship Reconstruction
When legacy data stored relationships implicitly (e.g., a location record with num_chairs: 5 rather than 5 chair records), the pipelines automatically expanded and denormalized these into the correct target schema — creating normalized relationship tables from aggregated source data.
Zero-Downtime Migration with Full Rollback
Legacy systems continued serving production traffic throughout the entire migration. Every DAG was designed to be re-runnable: truncate strategy preserved indexes for fast reruns, and the full XCom mapping state made rollback possible at any pipeline stage without data loss.
Achieved results
The results
90+ tables
migrated with full schema normalization (PascalCase → camelCase, INT → UUID)
49 additional
cross-database tables handled with full FK resolution.
Zero data loss
100% validation pass rate across all 5 check levels
10x load
performance via PostgreSQL COPY; additional 30–50% boost from truncate vs. drop/recreate strategy.
Full HIPAA-compliant
audit trail — every record preserves its legacy identifier.
Zero
production downtime - legacy systems remained live throughout.
Book a live demo to see it in action
RELEVANT CASES
Check our more healthcare stories
AI-powered athletic footwear solution for professional athletes
6
Months to MVP release
Strava & Nike
EU competitor
Growing patient acquisition and revenue in virtual OB-GYN care
462
New patients acquired
$7M funding
Raised to scale nationwide
Cutting care delivery costs by 40% with AI-driven virtual-first healthcare
96% reduction
In cardiovascular disease amputations
Contact Us
Please provide your info and inquiry details below.