Article

From Variation to Value

How CitiusTech uses Databricks to turn surgical data into actionable clinical insights

Priyanka Lakhani, MD, MBA
AVP Consulting, Clinical SME,
CitiusTech

Tomy Alexander
Associate Director – AI and Analytics,
CitiusTech

5-May-2026

The hidden cost of clinical variation

Every year, U.S. health systems spend billions on care that deviates from evidence-based standards, not because patients are more complex, but because care delivery differs across patient demographics, physicians, facilities, and operating environments. These variations directly affect length of stay, operating room utilization, and post-operative complications. In hysterectomy care, for example, a difference of just 23 hours in average length of stay (3.7 vs. 4.3 days) can add approximately $865 per case, creating a significant financial impact when scaled across thousands of annual procedures. The scale of the opportunity is also evident in another high-volume procedure, coronary artery bypass grafting (CABG), where addressing variation in anemia management and infection control has been shown to represent over $45M in annual savings potential per hospital. Together, these examples highlight how systematically identifying and reducing clinical variation unlocks both clinical and economic value at scale.

Clinical variation is the difference in healthcare processes or outcomes compared with peers or evidence-based guideline recommendations. Some variation is warranted (explained): a patient with severe comorbidities will naturally require more complex treatment, which may result in a longer stay. But a significant portion is unwarranted (unexplained), driven by inconsistent practices rather than patient need, and it is entirely preventable. The literature suggests that 10-20% of observed CABG cost variation is unexplained.

Despite decades of investment in data platforms and analytics, most health systems still struggle to distinguish warranted variation from unwarranted variation at scale. The reasons are deeply rooted in how clinical insights have traditionally been delivered:

Manual chart review bottleneck: Identifying outlier cases today requires clinical SMEs to manually review patient records one by one. This process is expensive, slow, and cannot scale beyond a handful of procedures at a time.
Siloed data across the care journey: Meaningful variation analysis requires a 360-degree longitudinal view spanning pre-operative status, intra-operative techniques, post-operative complications, and outcomes. In practice, this data lives in separate EHR modules, claims systems, and facility databases that are rarely connected.
No systematic benchmarking: Comparing a patient's outcome to "similar patients" sounds straightforward, but defining clinically and demographically comparable cohorts requires sophisticated clustering techniques that go well beyond simple averages or peer groupings.
Insight-to-action gap: Even when variation is identified, translating a statistical finding into a clinical intervention requires rapid iteration between data teams and clinical leaders. Traditional BI dashboards simply cannot support this back-and-forth at the speed care decisions demand.

These challenges play out across high-impact surgical and medical scenarios every day.

Surgical Length of Stay (LOS) remains one of the largest drivers of cost variation. Extended stays increase facility costs, reduce throughput, and can signal preventable post-operative complications. Yet most health systems benchmark LOS using nationwide averages rather than facility-specific and patient-specific cohort comparisons.

Cost per case optimization requires understanding not just what was spent, but why. Two patients with the same diagnosis may have vastly different costs based on surgical approach, antibiotic protocols, or complication rates. Without granular, factor-level analysis, cost reduction initiatives remain blunt instruments.

Quality score management under value-based care models depends on reducing unwarranted variation. Payers and CMS increasingly tie reimbursement to demonstrable adherence to evidence-based protocols like ACOG and ERAS. Health systems that cannot systematically identify and address variation risk significant financial penalties.

How CitiusTech's CVR solution leverages the Databricks Data Intelligence Platform

CitiusTech's Clinical Variation Reduction (CVR) solution was purpose-built to solve this problem. Rather than replacing clinical judgment, CVR augments it using machine learning to surface the cases, factors, and patterns that matter most, so clinical leaders can focus their expertise where it has the greatest impact.

The solution runs end-to-end on the Databricks Data Intelligence Platform, leveraging its core components at every stage of the pipeline:

Data Foundation: Delta Lake and Unity Catalog. CVR begins by ingesting clinical, claims, provider, financial, supply chain and facility data into a medallion architecture (Bronze, Silver, Gold) on Delta Lake. Unity Catalog enforces fine-grained access controls and data lineage — critical for HIPAA-compliant environments where patient-level surgical data must be governed rigorously. This foundation ensures that every downstream model and dashboard operates on a single, trusted source of truth.
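The Bronze, Silver, and Gold layering described above can be illustrated with a minimal sketch. It uses plain Python lists and dictionaries as stand-ins for Delta tables, and every field name and record is hypothetical; it shows only the shape of the refinement steps, not CitiusTech's actual pipeline or any Databricks API.

```python
from datetime import date

# Bronze: raw records exactly as ingested (hypothetical fields, mixed quality).
bronze = [
    {"patient_id": "P1", "facility": "north ", "admit": "2024-01-02",
     "discharge": "2024-01-05", "cost": "10450.50"},
    {"patient_id": "P2", "facility": "North", "admit": "2024-01-03",
     "discharge": "2024-01-08", "cost": "15200"},
    {"patient_id": "P2", "facility": "North", "admit": "2024-01-03",
     "discharge": "2024-01-08", "cost": "15200"},   # duplicate row
    {"patient_id": "P3", "facility": "South", "admit": "2024-02-10",
     "discharge": None, "cost": "9000"},            # incomplete row
]

def to_silver(rows):
    """Clean and conform: drop incomplete rows, deduplicate, type the columns."""
    seen, silver = set(), []
    for r in rows:
        if r["discharge"] is None:
            continue
        key = (r["patient_id"], r["admit"])
        if key in seen:
            continue
        seen.add(key)
        admit = date.fromisoformat(r["admit"])
        discharge = date.fromisoformat(r["discharge"])
        silver.append({
            "patient_id": r["patient_id"],
            "facility": r["facility"].strip().title(),
            "los_days": (discharge - admit).days,
            "cost": float(r["cost"]),
        })
    return silver

def to_gold(rows):
    """Aggregate to an analytics-ready table: one summary row per facility."""
    by_facility = {}
    for r in rows:
        by_facility.setdefault(r["facility"], []).append(r)
    return {
        facility: {
            "cases": len(group),
            "avg_los_days": sum(x["los_days"] for x in group) / len(group),
            "avg_cost": sum(x["cost"] for x in group) / len(group),
        }
        for facility, group in by_facility.items()
    }

silver = to_silver(bronze)
gold = to_gold(silver)
```

In a real deployment each layer would be a governed Delta table rather than an in-memory structure; the point here is only that downstream models read the cleaned and aggregated layers, never the raw feed.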

ML Pipeline: Clustering and Outlier Detection. The core of CVR is a multi-stage machine learning pipeline. First, clustering algorithms (K-Means, DBSCAN) group patients into clinically meaningful cohorts based on diagnosis, severity, demographics, and surgical approach. In a recent deployment at a leading U.S. health system, which analyzed over 6,000 hysterectomy records, the pipeline identified three distinct clusters: patients with benign uterine disease and short LOS (3,025 cases), obese patients with longer stays (1,499 cases), and older patients with malignant disease undergoing laparoscopic procedures (1,222 cases). Next, regression models (Linear Regression, Random Forest, XGBoost) compute expected outcomes for each cluster, and patients whose actual LOS or cost significantly exceeds the prediction are flagged as outliers.
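The two-stage idea (cluster first, then compare each patient with the expectation for their cluster) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the CVR implementation: the patient data is invented, K-Means is written out in plain Python with fixed starting centroids, and a per-cluster median with a median-absolute-deviation cutoff stands in for the regression models named above.

```python
import math
from statistics import median

# Hypothetical patients: (age, BMI, length of stay in days). Not real data.
patients = [
    (38, 24, 1.0), (41, 26, 1.2), (36, 23, 0.9), (44, 25, 1.1), (40, 24, 4.0),
    (52, 39, 2.0), (49, 41, 2.2), (55, 38, 1.9), (50, 40, 2.1), (53, 42, 6.5),
]
features = [(age, bmi) for age, bmi, _ in patients]
los = [stay for _, _, stay in patients]

def kmeans(points, initial_centroids, iterations=20):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    centroids = list(initial_centroids)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels

def flag_outliers(labels, outcomes, k=3.0):
    """Flag cases whose outcome exceeds the cluster median by more than
    k scaled MADs (a simple stand-in for 'actual minus predicted')."""
    flagged = []
    for c in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        expected = median(outcomes[i] for i in idx)
        mad = median(abs(outcomes[i] - expected) for i in idx)
        cutoff = k * 1.4826 * mad  # 1.4826 scales MAD to a std-dev equivalent
        flagged += [i for i in idx if outcomes[i] - expected > cutoff]
    return sorted(flagged)

labels = kmeans(features, [features[0], features[5]])
outliers = flag_outliers(labels, los)
```

With this toy data, the two long stays (4.0 and 6.5 days) are flagged relative to their own cohorts rather than against a single global average, which is the benefit of benchmarking within clusters.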

Model Lifecycle: MLflow. Every model version, hyperparameter set, and evaluation metric is tracked in MLflow's model registry. This ensures full reproducibility and allows clinical teams to audit exactly which model produced a given outlier flag, a non-negotiable requirement for clinical governance and regulatory compliance.

Causality Analysis and SME Validation. Once outliers are identified, CVR performs factor-level causality analysis to surface the specific clinical drivers. For example, a high white blood cell (WBC) count combined with low antibiotic coverage may signal infection risk: we found that 50% of patients with elevated WBC counts had no antibiotic administered and developed infection as a post-operative complication. These findings are presented to clinical SMEs through an intuitive validation interface, creating a closed feedback loop between data science and clinical expertise. This loop is what transforms a statistical model into a clinically actionable tool.
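A factor-level comparison of this kind can be sketched as follows. The records are invented, and the calculation is a plain association measure (relative risk between two groups), not the causality method CVR uses; a result like this is a prompt for SME review, not proof of cause.

```python
# Hypothetical flagged cases: (elevated_wbc, antibiotic_given, infection).
# Invented for illustration; not drawn from any deployment.
cases = [
    (True, False, True), (True, False, True), (True, False, False), (True, False, True),
    (True, True, False), (True, True, False), (True, True, True), (True, True, False),
    (False, True, False), (False, False, False), (False, True, False), (False, False, False),
]

def infection_rate(rows):
    """Fraction of rows with a recorded post-operative infection."""
    return sum(1 for _, _, infected in rows if infected) / len(rows) if rows else 0.0

# Group of interest: elevated WBC and no antibiotic administered.
exposed = [c for c in cases if c[0] and not c[1]]
others = [c for c in cases if not (c[0] and not c[1])]

rate_exposed = infection_rate(exposed)
rate_others = infection_rate(others)

# Relative risk: how many times more often infection occurs in the exposed group.
relative_risk = rate_exposed / rate_others if rate_others else float("inf")
```

In this toy data set, infection occurs in 3 of 4 exposed cases versus 1 of 8 others, a relative risk of 6, which is the kind of signal that would be routed to clinical SMEs for validation.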

Natural Language Querying: Databricks Genie. For clinical leaders who need to explore variation data without writing SQL, Databricks Genie provides a natural language interface directly against curated Gold-layer tables. A chief quality officer can ask, "Which surgeons have the highest outlier rates for laparoscopic hysterectomy?" and receive an immediate, data-backed answer. This democratizes access to variation insights well beyond the analytics team, putting the power of the entire pipeline into the hands of decision-makers.
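To make the example question concrete, the sketch below shows the kind of aggregate that answers it. It does not use Genie or represent the SQL Genie would generate; it runs an equivalent query against an in-memory SQLite table, and the table name gold_cases, its columns, and the rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gold_cases (surgeon_id TEXT, approach TEXT, is_outlier INTEGER)"
)

# Hypothetical case-level rows; is_outlier is 1 when the pipeline flagged the case.
rows = (
    [("S1", "laparoscopic", flag) for flag in (1, 1, 0, 0)]
    + [("S2", "laparoscopic", flag) for flag in (0, 0, 0, 0)]
    + [("S2", "abdominal", 1)]  # excluded by the approach filter below
    + [("S3", "laparoscopic", flag) for flag in (1, 0, 0, 0)]
)
conn.executemany("INSERT INTO gold_cases VALUES (?, ?, ?)", rows)

# "Which surgeons have the highest outlier rates for laparoscopic hysterectomy?"
query = """
    SELECT surgeon_id,
           COUNT(*)        AS cases,
           AVG(is_outlier) AS outlier_rate
    FROM gold_cases
    WHERE approach = 'laparoscopic'
    GROUP BY surgeon_id
    ORDER BY outlier_rate DESC, surgeon_id
"""
outlier_rates = conn.execute(query).fetchall()

for surgeon_id, n_cases, rate in outlier_rates:
    print(f"{surgeon_id}: {rate:.0%} of {n_cases} cases flagged")
```

Reporting the case count alongside the rate matters in practice, since a high rate over a handful of cases carries much less weight than the same rate over hundreds.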

Proven impact: Leading US Health System – Hysterectomy use case

The CVR solution has been deployed in a real-world setting at one of the largest health systems in the southeastern United States. Analyzing 6,646 outpatient hysterectomy records, the solution delivered measurable results:

Large‑scale outcome‑focused analysis:
The analysis covered 6,600+ patients over 2 years, evaluating 44 clinical and operational factors against three key outcomes: length of stay, cost per case, and 30-day return to the emergency department (ED).
Meaningful patient risk stratification:
The approach identified three clinically meaningful patient clusters (Mild, Moderate, and Severe) to provide a base risk score for each patient.
ML analysis revealed that approximately 3% of cases were true outliers, associated with ~700% higher cost than the inlier population, demonstrating a highly concentrated and addressable value opportunity.
Practical ML modelling grounded in data readiness:
ML models were developed using 12 high‑quality clinical and operational variables, selected based on data completeness and consistency, enabling reliable pattern detection while reflecting real‑world healthcare data constraints.
Actionable levers for care standardization, not just insight:
Insights converged on concrete, operational domains—including anemia management, WBC‑linked infection control adherence, and OR time standardization—equipping quality and operational teams with specific levers to reduce unwarranted variation rather than abstract risk signals.
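The concentration of value in the outlier group can be made concrete with a short calculation. This is a sketch under one stated assumption: "~700% higher cost" is read as an average outlier cost of roughly 8 times the average inlier cost. The function name is illustrative only.

```python
def outlier_cost_share(outlier_fraction, cost_multiple):
    """Share of total spend attributable to outliers.

    cost_multiple is the average outlier cost divided by the average
    inlier cost (inlier cost is normalized to 1.0).
    """
    outlier_spend = outlier_fraction * cost_multiple
    inlier_spend = (1.0 - outlier_fraction) * 1.0
    return outlier_spend / (outlier_spend + inlier_spend)

# Assumption: ~700% higher cost is taken to mean about 8x the inlier cost.
share = outlier_cost_share(outlier_fraction=0.03, cost_multiple=8.0)
print(f"Outliers account for about {share:.1%} of total spend")
```

Under that reading, 3% of cases would account for roughly one fifth of total spend, which is why a small outlier population can still represent a large savings opportunity.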

What previously required weeks of manual chart review by clinical analysts is now automated, repeatable, and scalable to any surgical procedure.

Key highlights of CitiusTech's CVR solution

End-to-end ML pipeline on Databricks: Re-usable and deployable pipelines that cluster clinically similar patients, detect outliers using robust regression models, and surface causality factors — all orchestrated through Databricks Workflows and tracked in MLflow.
Natural language access with Databricks Genie: Clinical leaders can query variation data, benchmark cohorts, and explore outlier drivers in plain English — without depending on data engineering teams or waiting for custom reports.
Evidence-Based Medicine (EBM) aligned: The solution maps over 50 clinical factors from Epic across the entire surgical journey (pre-operative, intra-operative, post-operative), aligning with established protocols such as ACOG and ERAS to ensure that variation analysis is grounded in clinical standards, not just statistical patterns.
80% reusable across surgical use cases: The data pipelines, clustering models, and outlier detection framework are use-case agnostic by design. Only 20% of the configuration requires domain-specific customization — making CVR rapidly deployable for new procedures, new facilities, and new health systems.

Clinical variation is not a new problem, but the tools to solve it at scale are finally here. CitiusTech's CVR solution on the Databricks Data Intelligence Platform gives health systems the ability to move from retrospective, manual analysis to proactive, ML-driven insight, reducing costs, improving outcomes, and aligning care with evidence-based standards. For clinical leaders ready to turn surgical data into measurable value, the journey starts here.

To learn more about CitiusTech's Clinical Variation Reduction solution or to schedule a demo, contact CitiusTech at https://www.citiustech.com/contact-us.