• Methods: P values

    JAMA is running a guide to statistics and methods series. I come across methods tutorials in other journals from time to time as well. I think I’ll start excerpting and pointing readers to them.

    Let’s start with P values, as discussed recently in The BMJ. The setting is an examination of birth weight of infants whose mothers had been randomized to receipt of a certain diet (low glycemic index, but that doesn’t matter) or not.

    The P value for the statistical test of birth weight was P=0.449. The P value represents the proportion of the theoretical infinite number of samples—that is, 0.449 [44.9%]—that have a mean difference in birth weight equal to, or greater than, that observed in the trial above. This is irrespective of whether the mean birth weight was higher or lower for the intervention group than for the control group. More formally, the P value is the probability of obtaining the observed difference between treatment groups in mean birth weight (or a larger one), irrespective of the direction, if there was no difference between treatment groups in mean birth weight in the population, as specified by the null hypothesis. The P value for the statistical test of the primary outcome of birth weight was P=0.449, which was larger than the critical level of significance (0.05). Hence there was no evidence to reject the null hypothesis in favour of the alternative. The inference is that there was no evidence that the intervention and control treatments differed in mean birth weight in the population.

    Usually when I read a P value (or any statistic), I try to get my mind to interpret it according to the definition. I try not to let my mind wander into other (false) characterizations. For instance, I would read P=0.449 as, “Assuming the null hypothesis to be true (i.e., no effect of the diet), the probability of obtaining at least the observed weight difference between diet and control groups is 0.449.” (Secret: Because of the wishy-washy language in most papers, I’m often confused by the reporting of hypothesis tests and the only reliable way I’ve found to understand them is to go back to the definition.)
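
    To make that reading concrete, here is a minimal simulation sketch in Python. Everything in it (group sizes, standard deviation, observed difference) is invented for illustration, since the trial's raw data aren't reproduced here, though the numbers are chosen so the P value lands near the paper's 0.449. The idea is exactly the BMJ's: draw many samples in a world where the null hypothesis is true and count how often the mean difference between arms is at least as large as the one observed, in either direction.

        # Minimal sketch: the P value as the proportion of null-hypothesis
        # samples at least as extreme as the observed difference.
        # All numbers below are invented for illustration.
        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)

        n_per_group = 100      # hypothetical infants per arm
        sd_grams = 450.0       # hypothetical SD of birth weight (grams)
        observed_diff = 48.2   # hypothetical observed mean difference (grams)

        # Simulate many trials in which the diet truly has no effect,
        # recording the mean difference between arms each time.
        n_sims = 20_000
        null_diffs = (
            rng.normal(0, sd_grams, (n_sims, n_per_group)).mean(axis=1)
            - rng.normal(0, sd_grams, (n_sims, n_per_group)).mean(axis=1)
        )

        # Two-sided: count differences at least as large as the observed
        # one, irrespective of direction.
        p_sim = np.mean(np.abs(null_diffs) >= observed_diff)

        # The analytic two-sample t test gives essentially the same answer.
        t_stat = observed_diff / (sd_grams * np.sqrt(2 / n_per_group))
        p_analytic = 2 * stats.t.sf(abs(t_stat), df=2 * n_per_group - 2)

        print(f"simulated P = {p_sim:.3f}, analytic P = {p_analytic:.3f}")
        # Both come out near 0.45: assuming no true effect of the diet, a
        # difference this large would arise in roughly 45% of samples.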

    I could write a lot more here, and I nearly did. But I’m going to try to keep these methods posts very short and focused. So, I’ll stop. Read The BMJ paper for more on P values, though you can find information elsewhere I’m sure. Also, I’m opening up comments to discuss P values and to solicit pointers to other good, simple methods papers. Feel free to provide additional resources. (Comments automatically close one week from the post’s time stamp.)

    @afrakt

    • Type II errors (failing to reject a false null hypothesis; someone should ban triple negatives) are all too common in “evidence”-based medicine. “The study did not achieve P < 0.05, therefore the treatment does not work…” If the P value was 0.1, that is hardly proof of no real difference with treatment; the study may simply have been underpowered. The problem is made worse by the fact that insurers have a near-term direct interest in rejecting and not paying for new treatments even though they may very well be beneficial.
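
      To put a number on the power problem, here is a small Python simulation; the effect size and sample size are invented. The treatment genuinely works, yet most replications of a study this size fail to reach P < 0.05.

        # Minimal sketch of the Type II error / power problem.
        # All numbers are invented for illustration.
        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(1)

        n_per_group = 30     # hypothetical small trial
        true_effect = 0.4    # real effect, in standard-deviation units
        n_sims = 10_000

        rejections = 0
        for _ in range(n_sims):
            control = rng.normal(0.0, 1.0, n_per_group)
            treated = rng.normal(true_effect, 1.0, n_per_group)
            _, p = stats.ttest_ind(treated, control)
            if p < 0.05:
                rejections += 1

        power = rejections / n_sims
        print(f"power = {power:.2f}; Type II error rate = {1 - power:.2f}")
        # Power comes out around 0.33: the treatment works, yet roughly
        # two thirds of studies this size would be read as "no effect."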

    • This statement is not true: “The P value represents the proportion of the theoretical infinite number of samples—that is, 0.449 [44.9%]—that have a mean difference in birth weight equal to, or greater than, that observed in the trial above.”

      I really dislike the fact that the author even included it, and I applaud Austin for inserting the missing condition in his own reading.

      The author does go on to give the correct statement later in the paragraph, but you can just see how many readers will run with the incorrect one. You have to specify the condition: “Given that the null hypothesis is true, the P value is the proportion…”.
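
      The condition is not a technicality, and a quick simulation (numbers invented) shows why: the proportion of samples at least as extreme as the observed difference equals the P value only when it is computed assuming the null hypothesis.

        # Why the condition matters: the proportion of samples "at least as
        # extreme" is only the P value when computed assuming the null.
        # All numbers invented for illustration.
        import numpy as np

        rng = np.random.default_rng(2)
        n, sd, observed_diff = 100, 450.0, 48.2

        def prop_extreme(true_diff):
            # Proportion of simulated trials whose mean difference is at
            # least as large as the observed one, given a true difference.
            a = rng.normal(true_diff, sd, (20_000, n)).mean(axis=1)
            b = rng.normal(0.0, sd, (20_000, n)).mean(axis=1)
            return np.mean(np.abs(a - b) >= observed_diff)

        print(prop_extreme(0.0))    # about 0.45 -- this one is the P value
        print(prop_extreme(100.0))  # about 0.79 -- this one is not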

    • I thought I understood p-values until I read the quote from the BMJ. It left me totally confused.
      Wikipedia has a clear description:
      In statistical significance testing, the p-value is the probability of obtaining a test statistic result at least as extreme as the one that was actually observed, assuming that the null hypothesis is true.
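
      That definition translates almost word for word into code. A minimal sketch, with the t statistic and degrees of freedom invented:

        # "The probability of obtaining a test statistic result at least as
        # extreme as the one observed, assuming the null hypothesis is true."
        # (t statistic and degrees of freedom invented for illustration.)
        from scipy import stats

        t_obs = 0.76   # hypothetical observed t statistic
        df = 198       # hypothetical degrees of freedom

        # Two-sided: "at least as extreme" counts both tails of the
        # null distribution of the statistic.
        p_value = 2 * stats.t.sf(abs(t_obs), df)
        print(f"P = {p_value:.3f}")   # comes out near 0.45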