Answering some questions about power calculations:
1) What is a power calculation?
It’s a calculation that tells you, given sample sizes, an assumed baseline risk, and a treatment effect size, with what probability you would reject the hypothesis that the treatment had no effect (the “null hypothesis”), assuming the treatment effect is real. In the context of my post, the sample sizes are the number of people in the control group (~6000) and the number who received Medicaid (~1500). The baseline risk of elevated GH was 5.1% and the treatment effect size was a reduction in that risk of 0.93 of a percentage point.
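For concreteness, here’s a rough sketch of such a calculation in Python, plugging in the numbers above and using a simple two-proportion z-test with a normal approximation. The study’s actual analysis was more involved than this, so treat it as illustrative, not a replication:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_power(p_control, p_treated, n_control, n_treated, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test
    (normal approximation; illustrative only)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    se = sqrt(p_control * (1 - p_control) / n_control
              + p_treated * (1 - p_treated) / n_treated)
    z = abs(p_control - p_treated) / se
    return NormalDist().cdf(z - z_alpha)

# Numbers from the post: 5.1% baseline risk, a 0.93-percentage-point
# reduction, ~6000 controls, ~1500 Medicaid recipients.
power = two_proportion_power(0.051, 0.051 - 0.0093, 6000, 1500)
print(f"approximate power: {power:.0%}")
```

By this back-of-the-envelope version, the power comes out on the order of 35%, i.e., even if the true effect were exactly the one reported, a study this size would usually fail to find it statistically significant.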
2) What do you use a power calculation for?
One thing I use it for, as do others, is to estimate how big a sample one would need to achieve 95% probability of rejecting the null hypothesis. That’s how I used it in my post.
3) Yeah, but what does that mean?
How big the study has to be so the 95% confidence interval on the estimate doesn’t overlap zero. If it overlaps zero then one cannot, conventionally, say that the result is statistically significantly different from no effect.
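Under the same simplified two-proportion setup (my approximation, not the paper’s exact model), one can solve for how much both groups would have to be scaled up to hit that target:

```python
from math import sqrt, ceil
from statistics import NormalDist

def required_scale(p_c, p_t, n_c, n_t, target=0.95, alpha=0.05):
    """Smallest multiplier k on both group sizes such that a two-sided
    two-proportion z-test reaches the target power (normal approximation)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(target)
    # variance of the difference in proportions at the original sizes
    var_unit = p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t
    # power hits the target when |diff| / sqrt(var_unit / k) = z_a + z_b
    return (z_a + z_b) ** 2 * var_unit / (p_c - p_t) ** 2

k = required_scale(0.051, 0.051 - 0.0093, 6000, 1500)
print(f"scale factor ~{k:.1f}: "
      f"~{ceil(6000 * k):,} controls and ~{ceil(1500 * k):,} treated")
```

In this rough version the required scale-up is around fivefold, which gives a sense of how far short the actual sample fell for this particular measure.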
4) But the results in the Medicaid study on physical health did overlap zero. So, huh?
Right. The point of my post was: how big would the study have had to be so that didn’t happen, assuming the baseline risk and average effect size the paper reported? In a variation, I considered how big the baseline risk would have had to be, holding the sample sizes constant.
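That variation can be sketched too. Here I assume the relative reduction stays fixed at the reported ~18% (0.93/5.1) as the baseline risk varies; that assumption is mine for illustration, and the same two-proportion normal approximation applies:

```python
from math import sqrt
from statistics import NormalDist

def power(p_c, p_t, n_c, n_t, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    return NormalDist().cdf(abs(p_c - p_t) / se - z_a)

rel = 0.0093 / 0.051  # hold the relative reduction (~18%) fixed -- my assumption
p = 0.051             # start at the reported baseline risk
while power(p, p * (1 - rel), 6000, 1500) < 0.95:
    p += 0.001        # step the baseline risk up until power reaches 95%
print(f"a baseline risk of about {p:.1%} would have sufficed")
```

In this sketch the baseline risk would have to be several times the observed 5.1% before a study of this size could reliably detect an effect of that relative magnitude.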
5) Oh, so you’re saying that if they had found physical health effects to be statistically significant you’d be OK with that, but since they didn’t you’re saying that the study is flawed. How is that fair?
No. The study is not flawed. It just has limitations, as do all studies. I am saying that when a result is statistically significant, it means that the sample size was adequate. There’s no need to do a power calculation in that case. When it isn’t, it’s possible the sample size was not adequate. One has to do a power calculation to check. One has to answer the question, could this study have ever found a statistically significant result for a reasonably sized effect? If the answer is “no” then using the study to show the intervention has no effect is not very persuasive.
6) Why don’t you do power calculations for other studies?
You can search the blog, but I make a habit of focusing only on statistically significant findings, which means power is sufficient. I don’t generally take the time to examine statistically insignificant ones. The issue of power does cross my mind from time to time. When I think a study is underpowered (too small a sample size), I generally don’t discuss it at all.
7) Why are you talking about it now, then?
Because this is a hugely important study and tons of other people are talking about it. Moreover, they’re talking about the physical health findings as if the study had enough sample to detect reasonably sized, clinically significant differences. Unless I made an error, it doesn’t seem to me that it did. My interest in properly interpreting the study is why I did the power calculation publicly. I shared it with the authors too.
8) So the whole study is invalid! Why did you hype it a year ago?
The study is not invalid. It doesn’t have enough sample to answer some questions, and it does have enough sample to answer others. Remember, a power calculation pertains to a specific analysis, to one measure. It depends on the baseline rate and treatment effect size. Those are different for each measure examined.
9) But you only looked at elevated GH. What about the other measures?
See my next post. Spoiler alert: The conclusion is the same. The sample was too small.
10) This is all very nice, but I’m not convinced you aren’t just trying to explain away a result you don’t like.
If I can’t convince you with math, I doubt I can with words. Do I dislike the results? I confess, I don’t feel very emotional about them. Do you?