The Fundamental Law of Databases: Garbage In = Garbage Out

Posted by Matthew Bruns on Mar 14, 2016 11:14:44 AM

In Data Management, CMMS, biomedical engineering, clinical engineering


The following is an insightful and entertaining client email to Mainspring from Matthew Bruns, Business Systems Analyst for Hartford Healthcare Corporation. He wrote it after being asked for a high-level overview of his expectations for accessing a library database of medical device manufacturers and models.

Like any card-carrying geek, I abhor anything that resembles actual work. So our conversation today is about an especially onerous task that I am stuck with, and the best way to find someone else to do it for me.

The fundamental law of databases is, of course: garbage in = garbage out. There is little point in spending time and money creating detailed records in sexy software that are so poorly normalized that they can’t be analyzed, or even trusted. In small hospitals, this is often a small problem. They may only have one model EKG machine in service, so it is less likely they will end up with multiple, varying descriptions of it in their CMMS. Which may actually be a spreadsheet. Or a notebook. Or what Charlie remembers.

However, the problem grows dramatically the larger (and older) a hospital system is. When a new device comes in the door, it becomes very difficult to create a reliable asset record if there's confusion about who made this thing, exactly what they named it, and where it fits into existing models and device categories. As is often the case, the fastest solution is the worst: give everyone permission to create new MFG/Model/DevCat values. In my case, the slowest solution is pretty bad too. In a desperate effort to control the chaos, I have restricted rights to create new reference data to a very limited group. And they are all sitting in this chair.

But we continue to grow, adopt new technologies, and deal with sometimes capricious regulators. These data have to be trustworthy, and I’m failing at that. The rigor required to keep a clean dataset for multiple hospitals now consumes far more time than I have. So I need a library of pre-normalized, continuously updated MFG/Model/DevCat values that my staff can select from. For years, we’ve had promises from ECRI they would offer one. The FDA has begun work on something similar, and while I think it will ultimately bear fruit, I need a solution three years ago. (By the way, I would suggest that any library you create allow for some sort of future cross-referencing with both ECRI and the FDA.)

I am the last one to say any of this is easy. Even if Mainspring did have such a library available today, accurately linking your records with the rubbish already in existing databases would require a lot of work by very knowledgeable people. That sort of thing tends to get expensive, but once you have built a master library and fixed a given customer’s data, maintaining its integrity going forward shouldn’t be too hard.

Getting Clean

I was assuming Mainspring would do the initial nasty and tedious data normalization process in-house, and then charge their clients truly impressive sums of money for it, which the client will likely pay. However, as a client, I would need to be convinced that the research and cross-matching of my data was being done by experienced people who understand medical technology and whose work was being validated somehow. (If you plan to get into this, you will only get one shot at it, so offshoring and interns won’t cut it.)

One option might be to “crowd-source” normalization projects among existing Mainspringers, and let them work on anonymized datasets in exchange for dollars, or donuts, or something. Carefully select geeks like me (or even better, CEs and BMETs) to do the cross-referencing via an interface from home (not from work). Pay them, I dunno, $10 per correction and $50 per corrected correction, so they would be motivated to spot-check each other’s work. Sort of a Wikipedia with some greed tossed in. (While you’re at it, build the interface generic enough to suit other datasets from other industries. It's a poor solution that only solves one problem.)

Staying Clean

Perhaps offer a subscription to your master library of reference values as either an enabled “Not set up” list, or add some sort of specialized look-up form based on licensure, à la the ECRI interface. That approach is probably better, since you would need to protect your library from bulk extraction somehow, or else your competitors will exploit it. (Or look into licensing the data using encryption?) Another option might be to move the actual adding of the correct MFG/Model/DevCat attributes of new assets upstream to Mainspring staff, and let them do it asynchronously "behind the curtain". (Or crowd-source that action to your Mainspringers, too?) Customers could use placeholder values in the meantime, such as the way I do it with my Temp Model approach. You’d have to guarantee fast turnaround, though.
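The placeholder idea above can be sketched in a few lines of Python. This is only a guess at the shape of such a workflow: the email does not describe how the Temp Model approach actually works, so the `TEMP_MODEL` value, the `Asset` fields, and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical placeholder stored on an asset until someone with library
# rights assigns the real reference values.
TEMP_MODEL = "TEMP MODEL"


@dataclass
class Asset:
    asset_id: str
    raw_description: str          # whatever the technician typed at intake
    mfg: Optional[str] = None
    model: str = TEMP_MODEL
    devcat: Optional[str] = None


def pending(assets: List[Asset]) -> List[Asset]:
    """Assets still carrying the placeholder, i.e. the upstream work queue."""
    return [a for a in assets if a.model == TEMP_MODEL]


def resolve(asset: Asset, mfg: str, model: str, devcat: str) -> None:
    """Replace the placeholder with reviewed, normalized reference values."""
    asset.mfg, asset.model, asset.devcat = mfg, model, devcat
```

The point of the design is that intake never blocks: a new asset gets a record immediately, and `pending()` gives whoever maintains the library (in-house staff or reviewers upstream) a clear queue to work through asynchronously.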

Anyway. You were foolish enough to ask, and I was undisciplined enough to waste an hour answering. If something above was useful, I assure you it was an accident.

