I recently discovered a programming error I had made that wrecked something I had been working on for months. Not fun. But it helped me see the ethical issues underlying data and data analysis in a new light.
There is great interest in the ethical concerns surrounding big data, that is, large-scale data sets collected from ongoing business or medical operations. These discussions focus on the risks and social consequences of big data, including security, privacy, equality, and access, and the possible uses of large data sets for social control. These are critical discussions.
But those are not the issues I want to discuss. As odd as this sounds, I want to describe a change in how I relate, ethically and emotionally, to data. Here’s the distinction. I’m not asking which ways of handling data sets are ethically right. Rather, I’m asking what motivates one to handle data in the right way.
Health services research studies the functioning of medical systems and is, therefore, a classical big data discipline. I have been working for some time on a data set involving many data points on each of thousands of physicians and hundreds of thousands of patients. I recently discovered that I had made an error in a program that assembled these data, such that it generated a substantial proportion of missing values for a variable critical to the analyses. This error likely had a domino effect on the calculations downstream of it, making them inaccurate. Months of work will have to be redone (although, having traversed the path once, it will not take months to redo it).
What’s most distressing to me as a professional is not that I made a programming error. The first point of wisdom in programming is that you are going to make a lot of errors. The second point of wisdom is that because you are inescapably fallible, one of your core practices must be to build traps to catch those errors. I’m distressed that I did not construct the proper tests to catch my error. I built some, but not the right ones. The practical lesson is that when the data come with massive amounts of missing information, it’s quite difficult to detect when you have created additional missing data.
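The kind of trap I should have built can be sketched in a few lines: count the missing values for a critical variable before and after an assembly step, and fail loudly if the step itself created new ones. This is a minimal illustration in plain Python, with hypothetical field names, not the actual program.

```python
# A sketch of a trap for self-inflicted missing data: compare missingness
# in a critical field before and after an assembly step. Field names and
# data are illustrative, not drawn from any real study.

def count_missing(records, field):
    """Number of records where `field` is absent or None."""
    return sum(1 for r in records if r.get(field) is None)

def assert_no_new_missing(before, after, field):
    """Raise if the assembly step increased missingness in `field`."""
    b, a = count_missing(before, field), count_missing(after, field)
    if a > b:
        raise ValueError(f"assembly created {a - b} new missing "
                         f"values in '{field}' ({b} -> {a})")

# The raw data already have some legitimate missingness (id 3), which is
# exactly what makes newly created gaps hard to spot by eye.
raw = [
    {"id": 1, "cost": 100.0},
    {"id": 2, "cost": 250.0},
    {"id": 3, "cost": None},
]

# Simulated buggy assembly: a key mismatch silently nulls out id 2's value.
assembled = [dict(r, cost=(r["cost"] if r["id"] != 2 else None)) for r in raw]

try:
    assert_no_new_missing(raw, assembled, "cost")
    caught = False
except ValueError:
    caught = True  # the trap fires: 1 missing before, 2 after
```

The point of the comparison is that an absolute count of missing values tells you little when the source data are already full of gaps; only the before-and-after delta isolates the damage your own code did.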
Our motivations to be moral have an emotional dimension. Here are the emotional states I experienced when I discovered the error. First, there was a double-take moment when I noticed that the data were missing. Then there was an “oh shit” moment, when I measured the scope of the problem and knew that I would have to tell my collaborators. Then there was a feeling of relief, when I realized that it could have been far worse. We could easily have gone to print with erroneous results. I felt lucky that I just had to write my co-authors and did not have to write a retraction letter to a journal.
Then it struck me that I shouldn’t have been thinking about me. I don’t mean to suggest that professional reputations, or the standards they are based on, are unimportant. But concern with reputation is not the best motivation for ethical behaviour. After all, I could have protected my reputation very well by ignoring the error. The chance that someone else would have found it would have been vanishingly small.
If this is not about me, though, what’s it about? I think I was relating to the data in the wrong way. Talk of ‘relating’ to data sounds odd, because on one view data are technical artifacts, inert, immaterial, just points in an abstract space.
But on another view, health services data are the residue of the touches of living persons against the health care system. As such, they reflect the experience of those patients, even if that experience is often obscure to the analyst. The data are lit from within by the experience of patients, even if only faintly. Medical data are the relics of human suffering, recovery, and death. We wouldn’t be looking at them if there weren’t a signal there.
Not seeing that light, and the connection to human experience it carries, was where I fell short. If you see the humanity beneath the numbers, the data command serious, curatorial attention. We should treat patient data like ancient scrolls recovered from jars in a cave.
The Jesuits talk about “finding God in all things.” The lesson I take from this experience is that I need to pierce the veil surrounding data and ask, when I handle it, whether I am serving the patients who generated it, the families to whom they are connected, and the invisible others who share their afflictions.