On Piketty and spreadsheets

The following appeared on The Upshot (copyright 2014, The New York Times Company).

Like Carmen Reinhart and Kenneth Rogoff before him, Thomas Piketty has had questions raised about his analysis, in his case his work on wealth inequality. Though I can't knowledgeably comment on the questions or the analysis, I can comment on the technology that Ms. Reinhart, Mr. Rogoff and Mr. Piketty chose for their work: the spreadsheet. This choice can increase the chance of error in complex analysis, but it can also make it easier for nonexperts to find errors.

Roughly speaking, one can think of economic analysis as taking one of two forms: It's either descriptive or multivariate. Descriptive work is simple. That is not a criticism; simple work can also be correct and powerful. It's basically what can be easily illustrated and understood with a chart. Multivariate work can be very complex, though no less powerful to those who can understand it. As the name suggests, it's analysis that involves many variables simultaneously. Its complexity doesn't make it right, of course. But there is a right way to do it, and it's not with a spreadsheet.

The process of going from original data to the conclusions of a multivariate analysis is not easily conveyed graphically. It is, instead, essentially algorithmic. That is, conclusions are reached by starting with data and then applying a sequence of steps to arrive at answers to questions of interest. These steps can and should be written down clearly and unambiguously, and for a computer to follow them, they must be. If this sounds like computer programming, it is. Modern applied social science relies heavily on programming. It should.

But it can't do that with a spreadsheet (like Excel), because a spreadsheet isn't primarily designed to be used that way. Its strength is that it makes visualizing and manipulating numbers easy, with little training. It's sort of a glorified standard calculator, the kind you undoubtedly have at home and use to balance your checkbook. This is also its weakness, because its simplicity has a cost: spreadsheets hide the details. They don't make the sequence of steps in an analysis as transparent as they could. The steps are there, but they're not front and center. That makes them hard to discern, and it invites error.

Try this puzzle: With a standard calculator, I started with the number 6, did some analysis to answer a specific question, and ended up with 28 as my result. What sequence of steps did I take to get there? If you think you know, you're almost certainly wrong. There are infinitely many ways to get from 6 to 28, and a standard calculator doesn't reveal which one I used. To be sure, the steps exist. But they're in my head, and you'd have to do more work (like interview me) to discover them. I'm also likely to forget them. This might seem unimportant, because I have the answer: 28. But how do you or I know it's the correct one? The best way to convince ourselves is to look at the sequence of steps and check that they make sense. But we can't do that easily. They're hidden from view.
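To make the puzzle concrete, here is a minimal Python sketch of two made-up step sequences that both turn 6 into 28. They are hypothetical illustrations, not the steps the author actually used. Written out this way, each sequence is visible and checkable, which is exactly what the calculator hides.

```python
# Two hypothetical step sequences; neither is claimed to be the author's.

def steps_a(x):
    x = x + 1  # add 1: 6 -> 7
    x = x * 4  # multiply by 4: 7 -> 28
    return x

def steps_b(x):
    x = x * 5  # multiply by 5: 6 -> 30
    x = x - 2  # subtract 2: 30 -> 28
    return x

# Same input, same answer, different recipes.
print(steps_a(6), steps_b(6))

# Feed them different data and the recipes disagree.
print(steps_a(10), steps_b(10))
```

Both sequences give 28 from 6, but from 10 they give 44 and 48. The final number alone can't tell you which recipe produced it, and the recipe determines whether the answer holds up on other data.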

A spreadsheet is only slightly better than this at revealing the process of analysis. You can make it out, but barely. You have to really work at it. That not only makes it hard for others to assess what one does to data, it makes it hard for even the creator of that spreadsheet to keep track of what he or she has done and to see and fix errors.

For complex analysis, what social scientists usually do instead is write the analysis steps in a statistical programming language, of which there are many. Such a program is like a recipe that anyone familiar with the language can read. It says precisely how you go from raw ingredients (the data) to final product (the answer). Moreover, one can annotate such programs with plain-language descriptions of each step, making them even easier to understand. Analysis written out this way makes plain what has been done and why, and errors are far easier to find and fix than they would be in a spreadsheet.
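As a rough illustration of such an annotated recipe, here is a short Python sketch. The data, the household labels, and the question (average income) are all invented for the example, and Python stands in for whichever statistical language an analyst might choose.

```python
# A hypothetical "recipe": every step from raw data to answer is written down.
# The records and the question are invented for illustration.

raw = [
    {"household": "A", "income": 52000},
    {"household": "B", "income": None},  # missing value
    {"household": "C", "income": 61000},
    {"household": "D", "income": 47000},
]

def average_income(records):
    # Step 1: drop records with missing income (a choice a reader can question).
    complete = [r for r in records if r["income"] is not None]
    # Step 2: convert dollars to thousands of dollars.
    in_thousands = [r["income"] / 1000 for r in complete]
    # Step 3: average what remains.
    return sum(in_thousands) / len(in_thousands)

print(average_income(raw))  # about 53.33 (thousand dollars)
```

Anyone who disagrees with a step, say dropping household B rather than estimating its income, can see exactly where that decision is made and change it. In a spreadsheet, the same choice might sit in a single cell formula that few readers would ever open.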

But Mr. Piketty’s work is not complex and multivariate. It’s fairly simple. And for that, a spreadsheet is a reasonable choice. Moreover, because advanced training is not required to examine a spreadsheet, by working in one, and sharing it, Mr. Piketty made it possible for more people to check his work. That’s praiseworthy.

If the allegations hold up, Mr. Piketty may have made some errors in his spreadsheet. But the choice of that tool is not to blame for them. Were his work more complex, he’d likely have been better off using a statistical programming language. But it isn’t, and a spreadsheet is just fine.

@afrakt
