The Importance of Structured Data and Context in Healthcare
Healthcare information captured by EHRs and other systems is of greatest value if it is “structured.” Structured data is information that is contained within a specific field (e.g., Last Name) or that can be assigned a code. “Unstructured data” refers generally to information stored as free text in applications. Codified data allows computer applications to process information with a much higher degree of accuracy. Healthcare IT systems can currently capture some codified data, but the value of this data is still compromised by a number of factors:
- Claims data (e.g., ICD, CPT, HCPCS) is the primary type of codified data in healthcare systems. Claims data is designed to classify information for the purpose of billing, and relying on it alone is not safe in clinical care or research, because it provides only an “approximation” of what actually occurred during patient care.
- Even when a more clinically specific terminology (e.g., SNOMED CT) is used to codify data, a number of challenges remain. Healthcare information is not made of single concepts in isolation. Some current systems have the potential to assign a SNOMED CT code to some primary concepts in the record. However, additional information is needed to fully understand the meaning of the information that has been captured by the health IT system, including:
  - Where it occurs in the record (e.g., problem list vs. past medical history)
  - Temporal context (e.g., when did it occur or start)
  - Authorship reliability (e.g., primary care provider entered, specialist entered, or patient entered information)
  - Completeness of the clinical expression (e.g., modifying information about a diagnosis)
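As an illustration of the contextual items listed above, the following minimal Python sketch stores a code together with its record section, onset, author role, and modifiers. Every name here (`CodedEntry`, its fields) is hypothetical, and `MS-PLACEHOLDER` is a stand-in rather than a real SNOMED CT identifier. It is not drawn from any particular EHR product.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class CodedEntry:
    """One codified concept plus the context needed to interpret it (hypothetical model)."""
    concept_code: str                 # placeholder, not a real SNOMED CT identifier
    record_section: str               # e.g. "problem list" or "past medical history"
    onset: Optional[date]             # temporal context, if known
    author_role: str                  # e.g. "primary care provider", "patient"
    modifiers: Tuple[str, ...] = ()   # e.g. ("doubt",)


# Two entries that share a concept code but differ in context.
confirmed = CodedEntry("MS-PLACEHOLDER", "problem list", date(2010, 3, 1), "specialist")
doubted = CodedEntry("MS-PLACEHOLDER", "assessment", None,
                     "primary care provider", ("doubt",))

same_code = confirmed.concept_code == doubted.concept_code
same_meaning = confirmed == doubted  # dataclass equality compares every field
print(same_code, same_meaning)  # prints: True False
```

Comparing only `concept_code` treats the two entries as identical. Comparing the full records does not, which is the distinction the contextual items are meant to preserve.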
A provider who sees a patient for vague neurologic symptoms may state in the assessment section of the visit note: “doubt multiple sclerosis.” Current healthcare IT systems are fairly good at capturing and assigning a code to the concept “multiple sclerosis.” However, virtually no systems have the ability to capture the concept “doubt.” If the concept code for multiple sclerosis alone is shared with another system, and the modifier term “doubt” is removed, another provider who relies on the health IT system for information may be misinformed. The patient does not actually carry the diagnosis of multiple sclerosis, but if symptoms recur and they are truly being caused by another, as yet unidentified process (e.g., Lyme disease), the next provider to see the patient may assume that the symptoms are being caused by the patient’s multiple sclerosis.
The full context supporting clinical data (often referred to as “metadata,” i.e., the “data about the data”) is necessary for accurate clinical decision making and for clinical research.
Various code sets (e.g., SNOMED CT) have the ability to represent this type of knowledge by assigning two codes (i.e., a code for “doubt” and a code for “multiple sclerosis”). Putting two distinct concept codes together to form a more complex and accurate clinical concept is referred to as “post-coordination.” While this seems fairly straightforward, the ability to post-coordinate clinical expressions is not supported by the vast majority of healthcare systems currently in operation, even within the EHR system that is capturing the data. When this information can be captured, systems often have difficulty storing it in databases in a way that allows accurate retrieval or use in applications such as clinical decision support tools.
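Post-coordination can be pictured as pairing a modifier concept with a focus concept and keeping them together as one unit. The Python sketch below is only an illustration: the class names, the `render` notation, and the codes `DOUBT-PLACEHOLDER` and `MS-PLACEHOLDER` are all hypothetical, and the output format is not SNOMED CT's own expression syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    code: str   # placeholder identifier, not a real SNOMED CT code
    term: str   # human-readable name


@dataclass(frozen=True)
class PostCoordinated:
    """A modifier and a focus concept kept together as a single expression."""
    modifier: Concept
    focus: Concept

    def render(self) -> str:
        # Illustrative notation only; not an official expression grammar.
        return f"{self.modifier.term}({self.focus.term})"


doubt = Concept("DOUBT-PLACEHOLDER", "doubt")
ms = Concept("MS-PLACEHOLDER", "multiple sclerosis")

expression = PostCoordinated(modifier=doubt, focus=ms)
print(expression.render())  # prints: doubt(multiple sclerosis)
```

Because the pair is a single immutable value, a database or decision support rule that stores and queries `PostCoordinated` objects cannot see the focus concept without also seeing its modifier.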
Compounding matters, if a system were able to send two connected codes as a “code phrase” (i.e., the SNOMED CT code for “doubt” and the SNOMED CT code for “multiple sclerosis”) to another health IT system, the receiving system would, in nearly every case, have no method for receiving and storing this information as codified data.
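The exchange problem can be sketched in a few lines of Python. Both receiver functions, the field names, and the placeholder codes are assumptions made for illustration. They do not describe any real interface standard or product.

```python
from typing import Dict, Tuple

# (modifier_code, focus_code); both values are placeholders, not real codes.
CodePhrase = Tuple[str, str]


def receive_single_code(phrase: CodePhrase) -> Dict[str, str]:
    """Hypothetical receiver whose schema has one code slot per diagnosis."""
    _modifier_code, focus_code = phrase
    return {"diagnosis_code": focus_code}  # the modifier has nowhere to go


def receive_code_phrase(phrase: CodePhrase) -> Dict[str, str]:
    """Hypothetical receiver that can store both halves of the phrase."""
    modifier_code, focus_code = phrase
    return {"modifier_code": modifier_code, "diagnosis_code": focus_code}


sent: CodePhrase = ("DOUBT-PLACEHOLDER", "MS-PLACEHOLDER")

lossy = receive_single_code(sent)
faithful = receive_code_phrase(sent)
print("modifier_code" in lossy, "modifier_code" in faithful)  # prints: False True
```

The first receiver ends up holding a bare multiple sclerosis code, which is exactly the situation in which the next provider may be misinformed.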
Solutions: Mechanisms have been developed and tested that would allow post-coordinated clinical expressions to be shared between two different healthcare IT systems. However, these have been largely limited to research settings. Further work is needed to fully vet these standards and, ideally, to include them as part of mandatory healthcare IT requirements (e.g., Stage 3 Meaningful Use).
Until the ability to include the full context of codified information, including the ability to manage post-coordinated expressions, is ubiquitous in healthcare IT, providers and researchers are strongly encouraged to use data that is “abstracted” from clinical records as a “pointer” towards the source document, where the full context is normally preserved. Failure to do so may lead to compromises in patient safety.
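One way to picture the “pointer” approach is to carry a source document identifier with every abstracted item, so the full note can always be consulted. The sketch below is a minimal Python illustration. The names `AbstractedItem`, `SOURCE_DOCUMENTS`, and `full_context`, the document ID, and the placeholder code are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class AbstractedItem:
    concept_code: str         # placeholder, not a real SNOMED CT identifier
    source_document_id: str   # pointer back to the note the code came from


# Stand-in for a document store; a real system would query the record itself.
SOURCE_DOCUMENTS: Dict[str, str] = {
    "note-001": "Assessment: doubt multiple sclerosis.",
}


def full_context(item: AbstractedItem, documents: Dict[str, str]) -> str:
    """Follow the pointer and return the source text for human review."""
    return documents[item.source_document_id]


item = AbstractedItem("MS-PLACEHOLDER", "note-001")
print(full_context(item, SOURCE_DOCUMENTS))  # prints: Assessment: doubt multiple sclerosis.
```

The abstracted code alone suggests a diagnosis, while the source text it points to shows that the diagnosis was doubted.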
The information contained in this article represents the opinions of its author: Michael Stearns, MD
©Michael Stearns, all rights reserved