1) What is a power calculation?

It’s a calculation that tells you, given sample sizes, assumed baseline risk, and treatment effect size, with what probability you can reject the hypothesis that the treatment had no effect (the “null hypothesis”). In the context of my post, the sample sizes are the number of people in the control group (~6000) and the number who received Medicaid (~1500). The baseline risk of elevated GH was 5.1% and the treatment effect size was a reduction in that risk of 0.93 of a percentage point.
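As a concrete illustration, here is one common way to run those numbers: a two-sided, two-proportion z-test using a normal approximation. The post doesn’t specify its exact method, so treat this as a sketch, not a reproduction of the original calculation.

```python
from math import sqrt, erf

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def two_proportion_power(p1, p2, n1, n2):
    """Approximate power of a two-sided two-proportion z-test at alpha = 0.05."""
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_alpha = 1.959964  # two-sided 5% critical value
    z_effect = abs(p1 - p2) / se
    return norm_cdf(z_effect - z_alpha)

# Figures from the post: ~6000 controls, ~1500 with Medicaid,
# 5.1% baseline risk of elevated GH, 0.93-point reduction.
power = two_proportion_power(0.051, 0.051 - 0.0093, 6000, 1500)
print(f"approximate power: {power:.2f}")
```

Under this approximation the power comes out to roughly a third, well under the conventional 80% target, which is consistent with the post’s conclusion that the sample was too small for this outcome.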

2) What do you use a power calculation for?

One thing I use it for, as do others, is to estimate how big a sample one would need to achieve 95% probability of rejecting the null hypothesis. That’s how I used it in my post.

3) Yeah, but what does that mean?

How big the study has to be so the 95% confidence interval on the estimate doesn’t overlap zero. If it overlaps zero then one cannot, conventionally, say that the result is statistically significantly different from no effect.
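The required size can be sketched with the same normal approximation (my assumption; the post doesn’t show its formula). For equal-size groups, the standard sample-size expression inverts the power calculation:

```python
from math import ceil

Z_ALPHA = 1.959964  # two-sided 5% critical value

def n_per_group(p1, p2, z_power):
    """Equal-size groups needed for a two-proportion z-test (normal approx.)."""
    var_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((Z_ALPHA + z_power) ** 2 * var_sum / (p1 - p2) ** 2)

p1, p2 = 0.051, 0.051 - 0.0093  # baseline and post-treatment risk from the post
# z_power is the normal quantile of the desired power (0.8416 for 80%, 1.6449 for 95%)
for label, z_power in [("80% power:", 0.8416), ("95% power:", 1.6449)]:
    print(label, n_per_group(p1, p2, z_power), "per group")
```

Under these illustrative assumptions the required group sizes come out several times larger than the study’s ~1500 Medicaid recipients; the exact figures depend on the formula and the allocation ratio between groups.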

4) But the results in the Medicaid study on physical health did overlap zero. So, huh?

Right. The point of my post was: how big would the study have had to be so that didn’t happen, assuming the baseline risk and average effect size the paper reported? In a variation, I considered how big the baseline risk would have had to be, holding the sample sizes constant.
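One way to sketch that variation is to hold the relative reduction fixed (0.93/5.1 ≈ 18% — my assumption; the post doesn’t spell out how the effect scales with the baseline) and scan across baseline risks with the study’s group sizes:

```python
from math import sqrt, erf

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def power_at_baseline(p1, n1=6000, n2=1500, rel_reduction=0.0093 / 0.051):
    """Power at baseline risk p1, holding the study's relative effect fixed."""
    p2 = p1 * (1 - rel_reduction)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return norm_cdf(abs(p1 - p2) / se - 1.959964)  # two-sided, alpha = 0.05

# Power rises with the baseline rate: rarer outcomes are harder to detect.
for p1 in (0.05, 0.10, 0.20, 0.30):
    print(f"baseline {p1:.0%}: power {power_at_baseline(p1):.2f}")
```

The pattern, not the exact numbers, is the point: with these group sizes, adequate power requires a much more common outcome than 5.1%.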

5) Oh, so you’re saying that if they had found physical health effects to be statistically significant you’d be OK with that, but since they didn’t you’re saying that the study is flawed. How is that fair?

No. The study is not flawed. It just has limitations, as do all studies. I am saying that when a result is statistically significant, it means that the sample size was adequate. There’s no need to do a power calculation in that case. When it isn’t, it’s possible the sample size was not adequate. One has to do a power calculation to check. One has to answer the question, could this study have ever found a statistically significant result for a reasonably sized effect? If the answer is “no” then using the study to show the intervention has no effect is not very persuasive.

6) Why don’t you do power calculations for other studies?

You can search the blog, but I make a habit of focusing only on statistically significant findings, which means power is sufficient. I don’t generally take the time to examine statistically insignificant ones. The issue of power does cross my mind from time to time. When I think a study is underpowered (too small a sample size) I generally don’t discuss it at all.

7) Why are you talking about it now, then?

Because this is a hugely important study and tons of other people are talking about it. Moreover, they’re talking about the physical health findings as if the study had enough sample to detect reasonably sized, clinically significant differences. Unless I made an error, it doesn’t seem to me that it did. My interest in properly interpreting the study is why I did the power calculation publicly. I shared it with the authors too.

8) So the whole study is invalid! Why did you hype it a year ago?

The study is not invalid. It doesn’t have enough sample to answer some questions, and it does have enough sample to answer others. Remember, a power calculation pertains to a specific analysis, to one measure. It depends on the baseline rate and treatment effect size. Those are different for each measure examined.

9) But you only looked at elevated GH. What about the other measures?

See my next post. Spoiler alert: The conclusion is the same. The sample was too small.

10) This is all very nice, but I’m not convinced you aren’t just trying to explain away a result you don’t like.

If I can’t convince you with math, I doubt I can with words. Do I dislike the results? I confess, I don’t feel very emotional about them. Do you?

@afrakt

• “If I can’t convince you with math, I doubt I can with words”

By not posting respectful opposing opinion you have made it impossible for anyone with a difference of opinion to find you very convincing.

• I don’t understand what you mean. In fact, I’m not aware that anyone is claiming the study was sufficiently powered, except in a vague hand-wave of something akin to, “How can a study of 12,000 people not be big enough?”

My respectful answer is, you have to do some math. I’ve done it. People think that’s weaseling out. It’s not. Show me the math.

• I don’t think the blog is open enough. That means the blog can exclude some valid alternative opinions. I was just over at the ncpa site and noted a blog on this same subject with a completely different opinion and some other opinions that seem to highlight my concerns.

I can’t be expert at everything so I listen to diverse opinions. From that diversity I form an opinion recognizing that my opinion without expertise will never be complete. Open blogs though they have some distractions are open to all opinions not fearful of criticism. Closed blogs make one suspicious with regard to what has not been posted. Open vs closed changes the value of the blog.

• What opinion are we excluding here? This is an evidence-based blog, by the way. So, we explicitly avoid non-evidence based claims.

• You recognize that when things are censored it means they are not published for review by others. I have had posts that have not been posted. That may have been due to technical problems rather than censorship, but I have no way of knowing and prefer to attribute the lack of posting to technical problems. However, from reading other comments along with other blogs one gets the idea that the censor is determining what is or is not evidence-based.

I have noted at times, without commenting, factual mistakes being made by the blogger and not even recognized as a mistake when pointed out by another. I have noted some discussions that ended abruptly with the blogger having the last remark even though the remark in my belief was wrong.

I am interested in diverse opinions, but when the comments are censored for content then much of the diversity disappears and doubt sets in.

• Sorry, I do not recall ever censoring your posts. Exceedingly few are. If they followed the comment policy, there should be no problem.

Keep in mind, comment threads on most other blogs are horrible. Most thoughtful people hate them. We aim higher.

• You are not the only blogger on the list that I have posted to, though I have seen incidents that appear as if some others might have been censored. Others have reported similar experiences, but once again technology can be the culprit rather than censorship the villain. Remember it is a short hop from “we explicitly avoid non-evidence based claims” to censorship of ideas one doesn’t agree with by calling those ideas non-evidence based, especially when the censor is the blogger.

I recognize how annoying some comments can be, but I am sure they are very few in number and your readers have the ability to separate such comments from the rest. The alternative is not knowing and not trusting. Which is worse? To one that is non-discerning, convenience might be preferable, but those truly interested in ideas will learn to live with the inconvenience.

• OK, here’s where I draw the line. Your comments are noted. But this blog is run for and by us. Don’t like what we do, no need to read or comment. Our house, our rules. That’s just the way it is. And, I accept whatever perception that creates.

Just to explain: the purpose is not to freeze out variety of opinion. The purpose is to retain some usefulness in the comments to us, the authors. We like it when comments are informational and move toward light, not heat. If you want another experience, you’ll find it elsewhere.

• I didn’t mean to press your buttons since the house rules are abundantly clear along with the perception those rules create. That of course is your choice not anyone else’s. Unfortunately that pushes a group that has informational comments away and creates less than a complete mindset. We are both aware of that so nothing else need be said.

All this became dramatically clear recently as the Oregon Medicaid study was hotly debated on multiple sites such as yours, the Atlantic, the NYTimes, NCPA, Mother Jones, Forbes, etc. A lot of accusations were flying in all different directions over what is or is not evidence-based. Everyone seems to have at least one legitimate point, so I decided to see what the commenters had to say. Some of the commenters on some of these blogs have incredible credentials and many others work in a specific portion of the healthcare field, so they have specific knowledge that might be totally unknown to most others, including the experts. That is what prompted these replies.

Enough, you have been kind in making your points and I have made mine.

• Interestingly, I am aware of no analyst or commenter who actually went to the trouble of examining the power of the study. This is a fairly straightforward calculation, which I’ve done on this blog. It ought to settle matters as to what the study was capable of showing.

We have blocked no comments pertaining to this study (that I’m aware of anyway). The comments are open. I’m delighted that we get so few that violate the policy.

• That is a pity and something that would have made an interesting and enlightening discussion. Chris Conover did discuss a few of the numeric features comparing them favorably to the Rand Study. Some of the other articles I read made some type of numeric reference as well, but detail and discussion on this particular subject was lacking.

The problem I see is that we focus on many studies that frequently use disputed data from sources that had used the same disputed data earlier. Thus in order to actually check the validity of a study one has to go back generations of studies to find out how that data was originally created. It is not pretty. Add to that all the editorialization of those studies that are also quoted as gospel and our brains fry.

Another problem I note is that many of the researchers assume they are familiar with all the variables and thus assume the conclusion to be based upon the variable under study when other unknown variables are actually the cause or part of the cause. This is seen in the health care literature all the time.

With specific regard to the Oregon Medicaid story I think it is important because we have politicized our thinking and must believe certain things to be true to be faithful to our beliefs. Empirically one would believe that insurance would lead to reduced mortality, but the study showed how insurance didn’t do that. That has created an argument where everyone used their favorite techniques to prove themselves correct.

Assume the study was true and valid. What does that really mean? It means that insurance does not equal medical care. Now many people will jump down one’s throat for such a statement because it appears incredible. Look at all those people in India, they say, dying for lack of insurance that would permit them the treatment they cannot otherwise afford.

We are not India. We have to look deeper and see what happens to those that have no insurance. Are they treated? Yes! EMTALA and a few other laws. Yes! Charity. Yes! Cash. That is explanation enough to demonstrate why Medicaid may not affect mortality rates. Then again, let us continue to assume the study was perfect and your numbers came out twice as good as you were looking for. Maybe the duration of the study wasn’t long enough and with a prolonged lack of Medicaid the death rate would increase in a logarithmic fashion. We saw that in Ware’s study on HMOs some decades ago. At two years he reported no difference between FFS and HMO care, but at four years a distinct and important difference was noted.

I personally think our concept of insurance is, in general, all wrong; it confuses the public and creates desires politicians try to satisfy. However, because the concept is so poorly understood, most of the legislation that is created causes more unintended consequences than benefit.