Big data, real-time analytics and machine learning have high growth potential in healthcare. Real-time analytics help derive actionable insights that can improve care by identifying gaps in care, quality issues, risks and operational issues. Machine learning is already demonstrating value by combing through massive datasets of individual patient health records, genomic data, data from wearable health monitors, online reviews of physicians and medical imagery, and by efficiently predicting patient outcomes. Real-time analytics and machine learning require large volumes of data to be parsed and processed in real time, and this processing requires better tools. One such open source tool that we prefer at CitiusTech is Apache NiFi. While working with the tool, the big data team at CitiusTech found that a parser for C-CDA files would make a strong addition to NiFi's parsers and would make adopting NiFi much smoother for healthcare developers. This blog post shares some insights about the parser.
Healthcare is generating and sharing more data than ever
Hospitals are sharing large amounts of healthcare data every day. According to the American Health Association, 75 percent of hospitals employ at least a basic EHR, but only 40 percent of hospitals can use the data they receive. There has also been a surge in HIEs (Health Information Exchanges), which have enabled hospitals to overcome their interoperability barriers. Together, these trends show that an increasing number of providers are adopting electronic health record systems.
What is the C-CDA?
The HL7 Consolidated CDA (C-CDA) is a standard for the exchange of clinical documents and an essential part of meaningful use certification requirements for electronic health record (EHR) vendors. C-CDA conforms to the HL7 V3 implementation technology specification (ITS), is based on the HL7 reference information model (RIM), and uses HL7 V3 data types. C-CDA is widely used for the exchange of healthcare data. It uses XML, making it interpretable by both humans and machines. A C-CDA XML document starts with a header section containing patient, author and custodian information. The header is followed by sections for encounters, vital signs, problems, allergies, medications and procedures. C-CDA supports nine document types, such as the Continuity of Care Document (CCD), the consultation note and the history and physical (H&P) note.
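To make the header-plus-sections layout concrete, here is a minimal Python sketch that reads a heavily trimmed, hand-written CDA-style document and pulls out the patient name and the section titles. The sample XML is an assumption made for illustration (a real C-CDA document carries many more required elements, template IDs and coded values), and the `summarize` helper is a hypothetical name, not part of any library.

```python
import xml.etree.ElementTree as ET

# CDA documents use the HL7 V3 XML namespace.
NS = {"cda": "urn:hl7-org:v3"}

# Hand-written, heavily simplified sample; not a conformant C-CDA document.
SAMPLE = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <recordTarget>
    <patientRole>
      <patient>
        <name><given>Jane</given><family>Doe</family></name>
      </patient>
    </patientRole>
  </recordTarget>
  <component>
    <structuredBody>
      <component><section><title>Vital Signs</title></section></component>
      <component><section><title>Medications</title></section></component>
    </structuredBody>
  </component>
</ClinicalDocument>"""


def summarize(xml_text):
    """Return the patient name from the header and the body section titles."""
    root = ET.fromstring(xml_text)
    # Header: patient details live under recordTarget/patientRole/patient.
    name = root.find("cda:recordTarget/cda:patientRole/cda:patient/cda:name", NS)
    given = name.findtext("cda:given", default="", namespaces=NS)
    family = name.findtext("cda:family", default="", namespaces=NS)
    # Body: each clinical section sits inside structuredBody/component/section.
    sections = root.findall(
        "cda:component/cda:structuredBody/cda:component/cda:section", NS
    )
    titles = [s.findtext("cda:title", default="", namespaces=NS) for s in sections]
    return {"patient": f"{given} {family}".strip(), "sections": titles}


summary = summarize(SAMPLE)
print(summary)
```

Even in this toy form, reaching one value takes a namespace-qualified path several levels deep, which is the kind of navigation a dedicated parser is meant to spare developers.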
What is NiFi?
NiFi is a 100% open source platform that makes streaming analytics faster and easier. It enables accelerated data collection, curation, analysis and delivery in real time, on-premises or in the cloud, through an integrated solution with Apache NiFi, Kafka and Storm. Apache NiFi is a dataflow system based on the concepts of flow-based programming. It supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. NiFi has a web-based user interface for design, control, feedback, and monitoring of data flows. It is highly configurable along several dimensions of quality of service, such as loss-tolerant versus guaranteed delivery, low latency versus high throughput, and priority-based queuing. NiFi provides fine-grained data provenance for all data received, forked, modified, sent, and ultimately dropped upon reaching its configured end-state.
C-CDA parser for NiFi
Based on our experience working with NiFi and C-CDA, we decided to contribute the C-CDA parser as an open source processor to the Apache NiFi project. The C-CDA processor is a NiFi processor that can be easily plugged into a NiFi processing pipeline. The processor accepts a C-CDA document and produces a flattened structure, for example:
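As a rough illustration of what a flattened structure means, here is a minimal Python sketch that turns a small nested XML fragment into dotted key-value pairs. It is not the processor's implementation: the `flatten` and `local_name` helpers, the simplified element names in the sample, and the key naming (dot-separated parents, with a numeric suffix for repeated elements) are all assumptions made for this example. The processor's documentation is the authority on its actual attribute names.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def flatten(element, prefix=""):
    """Flatten child elements into {dotted.key: text} pairs.

    Repeated sibling elements get a 1-based, zero-padded index suffix
    (an illustrative convention chosen for this sketch).
    """
    pairs = {}
    counts = Counter(local_name(child.tag) for child in element)
    seen = Counter()
    for child in element:
        name = local_name(child.tag)
        if counts[name] > 1:
            seen[name] += 1
            name = f"{name}_{seen[name]:02d}"
        key = f"{prefix}.{name}" if prefix else name
        if len(child) == 0:
            # Leaf element: keep its text if it has any.
            text = (child.text or "").strip()
            if text:
                pairs[key] = text
        else:
            # Nested element: recurse and extend the key path.
            pairs.update(flatten(child, key))
    return pairs


# Simplified, hypothetical fragment; real C-CDA entries are more deeply nested.
SAMPLE = """<section xmlns="urn:hl7-org:v3">
  <title>Vital Signs</title>
  <observation><name>Heart rate</name><value>72</value></observation>
  <observation><name>Systolic BP</name><value>150</value></observation>
</section>"""

flat = flatten(ET.fromstring(SAMPLE), "section")
for key, value in flat.items():
    print(f"{key}={value}")
```

With keys like these, downstream steps can look up a single value by name instead of walking the XML tree.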
The processor is based on the relevant C-CDA schema definitions. Developers do not have to read, parse and process C-CDA as an XML document. Instead, they can plug the processor into the data processing pipeline as depicted in the image below. Real-time analytics and processing can then be performed by other processors, such as UpdateAttribute, which can apply rules written in the NiFi expression language. For instance, rules can be configured to identify a chronic condition based on a lab result or a patient's vital signs. Based on the rule outcomes, the NiFi PutEmail processor can send messages in real time to providers and facilities.
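The rule step described above can be sketched outside NiFi as plain logic over flattened attributes. The Python below is only an illustration of the idea: in a real flow the rules would be written in the NiFi expression language, and the attribute keys, rule labels and thresholds here are hypothetical values chosen for the example, not clinical guidance.

```python
def to_float(value):
    """Convert an attribute value to float, or return None if it is not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def evaluate_rules(attributes, rules):
    """Return an alert message for every rule whose threshold is met or exceeded."""
    alerts = []
    for rule in rules:
        reading = to_float(attributes.get(rule["key"]))
        if reading is not None and reading >= rule["threshold"]:
            alerts.append(f"{rule['label']}: {rule['key']}={reading:g}")
    return alerts


# Hypothetical rules: keys and thresholds are placeholders for illustration.
RULES = [
    {"label": "Possible hypertension", "key": "vitals.systolic_bp", "threshold": 140.0},
    {"label": "Possible fever", "key": "vitals.temperature_c", "threshold": 38.0},
]

# Flattened attributes as they might arrive from an upstream parsing step.
attributes = {
    "patient.name": "Jane Doe",
    "vitals.systolic_bp": "150",
    "vitals.temperature_c": "36.8",
}

alerts = evaluate_rules(attributes, RULES)
for alert in alerts:
    print(alert)
```

Because the input is a flat dictionary, each rule reduces to a key lookup and a comparison, and the resulting alerts could be handed to a notification step.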
The processor is available in NiFi release 1.2.0 and was contributed by the CitiusTech Big Data Practice at https://github.com/apache/nifi/tree/master/nifi-nar-bundles/nifi-ccda-bundle
The processor is configuration-driven and developers can add mappings for sections such as:
Apache NiFi contributors can contribute sections to the processor by submitting patches to the file:
These mappings are defined as per the HL7 implementation guide for CDA® Release 2.
Considering the increasing use of EHRs and high-volume data exchange using standards like C-CDA, this contribution enables NiFi users to easily parse huge C-CDA documents into flattened key-value pairs without getting involved with the complicated hierarchical structure of C-CDA. Having C-CDA data as simple key-value pairs in conjunction with other NiFi processors helps support interesting use cases in healthcare like real-time analytics, which enables an easier transition from volume- to value-based care.
About the CitiusTech Big Data Practice
The CitiusTech Big Data Practice helps healthcare organizations develop and execute their big data strategy and manage and analyze the ever-growing volume, velocity and variety of data from disparate sources. The Big Data Practice leverages H-Scale and other Hadoop ecosystem technologies to effectively store, process and query large healthcare datasets, and to support big data use cases such as real-time clinical alerting. The CitiusTech Big Data Practice also enables organizations to build custom big data solutions that can be integrated with existing applications, helping them accelerate their big data initiatives. The CitiusTech Big Data Practice capabilities include:
- Multi-skilled practice team that includes Hadoop and Spark developers, infrastructure and data engineers and architects, Hadoop administrators, etc.
- Valuable out-of-the-box accelerators, e.g., data quality measurement, data virtualization
- Expertise with Hadoop distributions from Hortonworks, Cloudera, IBM and Microsoft, and with NoSQL databases
- Experience leveraging the cloud (Amazon Web Services, Microsoft Azure) for healthcare big data