Fair Share: Can You Ever Have TOO MUCH Privacy?

By Ruby Raley, Director, Healthcare Solutions, Axway

Craig Mundie’s comments in this GigaOM piece were provocative, even audacious, and while I have no comment on his views on payers, profits and the uninsured, I do have some comments on the technical issues surrounding those views.

I can actually attest to Mundie’s assertion that there is value in sharing specific information as opposed to keeping all of it under lock and key. I’ve noticed in recent years that solutions to problems involving data exchange, aggregation, normalization and standardization in other industries can be quite usefully applied to the healthcare industry as well.

In the healthcare industry, however, not all collected data is amenable to aggregation (a key difference from other industries), so the dimensions of the analysis (i.e., exactly what we want to mine and aggregate) must be rigorously defined. “Dimension” is a business intelligence (BI) term for the way data is organized and the way users access it. Defining dimensions appropriately is the key to storing (i.e., warehousing) data efficiently and to retrieving it quickly.
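To make the idea of a dimension concrete, here is a minimal Python sketch. It assumes a handful of made-up claim records; the field names (payer, service_year, diagnosis_group, paid_amount) and the aggregate helper are hypothetical and chosen only for illustration, not taken from any real claims format.

```python
from collections import defaultdict

# Made-up claim records; every field name and value is illustrative only.
claims = [
    {"payer": "plan_a", "service_year": 2011, "diagnosis_group": "diabetes", "paid_amount": 120.0},
    {"payer": "plan_a", "service_year": 2011, "diagnosis_group": "diabetes", "paid_amount": 80.0},
    {"payer": "plan_b", "service_year": 2011, "diagnosis_group": "diabetes", "paid_amount": 50.0},
    {"payer": "plan_b", "service_year": 2012, "diagnosis_group": "asthma", "paid_amount": 30.0},
]


def aggregate(records, dimensions, measure):
    """Sum `measure` for each distinct combination of the chosen dimensions."""
    totals = defaultdict(float)
    for record in records:
        # The tuple of dimension values identifies the bucket for this record.
        key = tuple(record[d] for d in dimensions)
        totals[key] += record[measure]
    return dict(totals)


# The same records, sliced along two different sets of dimensions.
by_diagnosis = aggregate(claims, ["diagnosis_group"], "paid_amount")
by_payer_year = aggregate(claims, ["payer", "service_year"], "paid_amount")
```

The point of the sketch is that the records never change; only the choice of dimensions does. If a dimension is poorly defined (for example, if two sources mean different things by the same diagnosis group), every total built on it is suspect.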

And that’s my point: health data has significant issues with dimension quality, and these issues must be sorted out before we can capitalize on big data in healthcare.

For example: Multiple entities (e.g., patient, insurer, government agency, provider) often pay portions of a single bill; no one entity owns the data. But since we have a defined structure (HIPAA 4010, soon to migrate to the even better HIPAA 5010) for claims submission across large and small healthcare providers, health plans, and agencies, we could use this structure to collect data from multiple sources and build a useful, holistic view of patient needs.

It’s important to remember that many medical conditions are complex and may involve multiple specialists, each working with and reporting on a subset of the patient’s problems, which is another point where healthcare is unlike other industries. Claims data is the most standardized structure used in healthcare and has clearly defined dimensions. By contrast, the HL7 and IHE formats commonly implemented in EMR/EHR systems suffer from two fundamental barriers: lack of adoption and lack of consistent structure (because the format is often modified to meet organizational needs during implementation).

In addition to usefully defined dimensions, there must also be consistent and aligned data values, because when you have multiple identifiers and multiple code sets among multiple providers for a single patient, it’s impossible to align data sets into a common format and capitalize on precedents that could help the patient.
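As a rough illustration of what aligning values involves, the sketch below maps each provider's local diagnosis code to a shared code through a crosswalk table. Everything in it is an assumption made for the example: the source names, the local codes, the "COMMON-..." codes and the align helper are hypothetical and do not come from any real code set.

```python
# Hypothetical crosswalk: (source system, local code) -> shared code.
# None of these codes belong to a real code set.
CROSSWALK = {
    ("clinic_a", "DM2"): "COMMON-DIABETES-T2",
    ("hospital_b", "4471"): "COMMON-DIABETES-T2",
    ("clinic_a", "MALB"): "COMMON-MICROALBUMINURIA",
}


def align(records):
    """Split records into those with a shared code and those left unmapped."""
    aligned, unmapped = [], []
    for record in records:
        common = CROSSWALK.get((record["source"], record["local_code"]))
        if common is None:
            # Keep unmapped records visible rather than guessing a code.
            unmapped.append(record)
        else:
            aligned.append({**record, "common_code": common})
    return aligned, unmapped


sample = [
    {"source": "clinic_a", "local_code": "DM2"},
    {"source": "hospital_b", "local_code": "4471"},
    {"source": "hospital_b", "local_code": "9999"},
]
aligned, unmapped = align(sample)
```

Here the first two records end up with the same shared code even though the providers recorded them differently, while the third is set aside because no mapping exists. Returning unmapped records separately, instead of dropping or guessing them, is a deliberate choice in keeping with the principle of doing no harm.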

Barriers to aligning, refining and mining data values are significant, and unique to healthcare. No one channel master or service provider sees the entire spectrum of services related to a given patient at a micro level, or to a given medical condition at a macro level. We have yet to solve the puzzle of how to create a unique identifier for each patient and each provider, and that makes it extremely difficult to anonymize data. Diagnosis codes do not even use a common code set (or language), even though we are moving toward that goal, so currently we cannot equate a diagnosis code to a retail inventory item code and assume big data will work for healthcare as it has for retail. This is partly because people’s health issues are very complex, and symptoms and illnesses (diagnoses) can be affected by age, gender, general health, genetics and other conditions. All of this complexity makes aggregation of diagnosis codes a real challenge, and is at the root of why many practitioners are skeptical when it comes to computer-assisted diagnosis and analysis of patients. Our first principle in healthcare is to do no harm.

Imagine, for example, that a physician wants to know which treatments would be most effective for a woman suffering from type 2 diabetes and microalbuminuria. With defined dimensions and consistent, aligned values, the knowledge gleaned from aggregating data from other anonymous cases could be accessible to the physician.

And since shared data must be transmitted, but typically won’t be represented the same way at its destination as at its origin, the data must be transformed into a common format that can be unpacked and consumed anywhere. That transformation requires standardized, structured data interchange, a standard practice in many industries already, and one that readily applies to healthcare.
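The transformation step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source layouts, their field names (pt_id, dos, amt_cents, patientRef, serviceDate, billed) and the common format are all hypothetical, and JSON stands in for whatever interchange standard the parties actually agree on.

```python
import json
from datetime import datetime


def from_source_a(record):
    """Hypothetical source A: MM/DD/YYYY dates and amounts in cents."""
    return {
        "patient_ref": record["pt_id"],
        "service_date": datetime.strptime(record["dos"], "%m/%d/%Y").date().isoformat(),
        "billed_amount": record["amt_cents"] / 100,
    }


def from_source_b(record):
    """Hypothetical source B: YYYYMMDD dates and amounts as decimal strings."""
    return {
        "patient_ref": record["patientRef"],
        "service_date": datetime.strptime(record["serviceDate"], "%Y%m%d").date().isoformat(),
        "billed_amount": float(record["billed"]),
    }


# One transform per origin; every transform emits the same common fields.
TRANSFORMS = {"a": from_source_a, "b": from_source_b}


def to_common_json(source, record):
    """Convert a source-specific record into the shared JSON representation."""
    return json.dumps(TRANSFORMS[source](record), sort_keys=True)


msg_a = to_common_json("a", {"pt_id": "p1", "dos": "03/15/2012", "amt_cents": 12550})
msg_b = to_common_json("b", {"patientRef": "p1", "serviceDate": "20120315", "billed": "125.50"})
```

Both messages describe the same visit and come out identical once transformed, which is what lets a receiver consume them without knowing where they originated.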

So, to summarize, there are three critical issues that need to be addressed:

1. Standardization of data structure across all the data we want to share, since no one entity owns all the data.
2. Standardization and alignment of data values (semantics), including determining how individual data elements (such as patient demographics and/or multiple medical conditions) impact analysis of care.
3. Enabling data sharing by those at the front lines of healthcare, so they can use the data to improve care and reduce waste.

Clearly, the sharing of specific kinds of data has the potential to be tremendously valuable not only for an individual but for a community, a country and perhaps even the world; but it raises the question: Who owns the data? The patient? The provider? The government? This important question ultimately extends beyond the realm of technology and into the realm of politics and law (and far beyond the scope of this comment!).

Still, I believe that once these non-technological issues are sorted out, our systems will serve us better than ever before, transcending the obvious organizational, resource and cost advantages to actually empower medical practitioners to fulfill their mission: improving and saving as many lives as possible.