Big Data Challenges and the Value of Codified Terminologies

The New York Times article titled “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights” is a good read. It provides an excellent overview of the challenges of making sense of the massive amounts of data generated by disparate information systems. The dramatic increase in the use of EHRs and related technologies in healthcare has fueled an unprecedented data explosion. An avalanche of information is coming from thousands of HIT systems that use different languages. Despite efforts going back over 30 years, the vast majority of data attached to any type of code is claims data (e.g., ICD-9-CM codes). This type of structured data is notoriously unreliable for clinical care and represents only a small fraction of the data needed to make a difference in healthcare research and population management.

SNOMED CT, LOINC, and other reference terminologies/ontologies attempt to give structure and meaning to information at the level of the concept. The challenges faced by big data scientists would be markedly reduced if more healthcare data were stored as codes from these types of terminologies. Progress has been slow, but Stage 2 Meaningful Use does offer hope that the adoption of SNOMED CT will increase and that EHRs and other HIT systems will begin to adopt codified terminologies as the core language of their information systems. These efforts have the potential to transform healthcare and genomic research.
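To make the point about concept-level coding concrete, here is a minimal Python sketch of what the "janitor work" looks like when systems record the same condition under different local labels. The lookup table, function names, and sample records are hypothetical and exist only for illustration. The two SNOMED CT concept identifiers are included as examples and should be checked against the current SNOMED CT release before any real use.

```python
from collections import Counter

# Hypothetical lookup table: local free-text labels -> SNOMED CT concept IDs.
# The IDs are illustrative; verify them against the official SNOMED CT release.
LOCAL_TO_SNOMED = {
    "htn": "38341003",                    # hypertensive disorder
    "high blood pressure": "38341003",
    "hypertension": "38341003",
    "mi": "22298006",                     # myocardial infarction
    "heart attack": "22298006",
    "myocardial infarction": "22298006",
}


def normalize(label):
    """Return the concept ID for a local label, or None if it is unmapped."""
    return LOCAL_TO_SNOMED.get(label.strip().lower())


def count_concepts(labels):
    """Count records per concept; unmapped labels are tallied under None."""
    return Counter(normalize(label) for label in labels)


# Six records from imaginary source systems, each using its own wording.
records = [
    "HTN",
    "High blood pressure",
    "heart attack",
    "Hypertension ",
    "MI",
    "chest pain NOS",
]
counts = count_concepts(records)
```

Once labels resolve to a shared concept, counting or querying across systems becomes a simple lookup. The entries that land under `None` are the residue that still needs manual cleanup, and that residue shrinks as more data is captured as codes at the source.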

The article can be found here: “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights”

Author of this post: Michael Stearns, MD
