A peculiar prevalence of p values just below .05

Distributions of p were found to be similar across the different journals. Moreover, p values were much more common immediately below .05 than would be expected based on the number of p values occurring in other ranges…

The present study observed evidence of an overreliance on null-hypothesis significance testing (NHST) in psychological research. NHST may encourage researchers to focus chiefly on achieving a sufficiently low value of p. Consistent with that view, the p value distribution from three well-respected psychology journals was disturbed such that an unusually high number of p values occurred immediately below the threshold for statistical significance.


Twelve P-Value Misconceptions

Misconception #1: If P.05, the null hypothesis has only a 5% chance of being true.

Misconception #2: A nonsignificant difference (eg, P > .05) means there is no difference between groups.

Misconception #3:  A statistically significant finding is clinically important.

Misconception #4:  Studies with P values on opposite sides of .05 are conflicting.

Misconception #5:  Studies with the same P value provide the same evidence against the null hypothesis.

Misconception #6:  P = .05 means that we have observed data that would occur only 5% of the time under the null hypothesis.

Misconception #7:  P = .05 and P < .05 mean the same thing.

Misconception #8:  P values are properly written as inequalities (eg, “P < .02” when P = .015).

Misconception #9:  P = .05 means that if you reject the null hypothesis, the probability of a type I error is only 5%.

Misconception #10:  With a P = .05 threshold for significance, the chance of a type I error will be 5%.

Misconception #11:  You should use a one-sided P value when you don’t care about a result in one direction, or a difference in that direction is impossible.

Misconception #12:  A scientific conclusion or treatment policy should be based on whether or not the P value is significant.


The roots of the 5% level of statistical significance

An observation is judged significant, if it would rarely have been produced, in the absence of a real cause of the kind we are seeking. It is a common practice to judge a result significant, if it is of such a magnitude that it would have been produced by chance not more frequently than once in twenty trials. This is an arbitrary, but convenient, level of significance for the practical investigator, but it does not mean that he allows himself to be deceived once in every twenty experiments. The test of significance only tells him what to ignore, namely all experiments in which significant results are not obtained. He should only claim that a phenomenon is experimentally demonstrable when he knows how to design an experiment so that it will rarely fail to give a significant result. Consequently, isolated significant results which he does not know how to reproduce are left in suspense pending further investigation.

Fisher, R. A. “The Statistical Method in Psychical Research.” Proceedings of the Society for Psychical Research 39 (1929): 185-92.


Why Most Published Medical Research Findings Are False?

Most medical researchers blindly adhere to the popular dogma of p-values. According to this dogma, statistical significance is declared on the basis of a p-value alone (often a p-value below 0.05). To the practitioner of this religion, statistics refers solely to the investigation of such values. However, the probability that an association is true given a statistically significant finding depends not only on the estimated p-value but also on the prior probability of the association being real, the research bias (the combination of various design, data, analysis, and presentation factors that tend to produce research findings when they should not be produced), and the statistical power of the test. More specifically, it can be seen that the positive predictive value (PPV) of a test (i.e. the post-study probability that the association is true) equals*:

$PPV(\alpha, \beta, R, u)= \frac{(1-\beta)R + u \beta R}{R- \beta R + \alpha + u - u \alpha + u \beta R}$

where R is the ratio of the number of “true relationships” to “no relationships” among those tested in the field, α is the Type I error rate, β is the Type II error rate (and hence 1 − β is the “power” of the test), and u is the research bias. Hence, according to the equation above, and assuming for the moment negligible bias (u = 0), a research finding is more likely to be true than false iff (1 − β)R > α.
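The formula and the no-bias condition are easy to check numerically. The sketch below (function and variable names are my own, not from the original post) evaluates the PPV expression term by term:

```python
def ppv(alpha, beta, R, u=0.0):
    """Post-study probability that a claimed association is true.

    alpha: Type I error rate; beta: Type II error rate (power = 1 - beta);
    R: pre-study odds of a true relationship; u: research bias.
    """
    num = (1 - beta) * R + u * beta * R
    den = R - beta * R + alpha + u - u * alpha + u * beta * R
    return num / den

# No bias, 80% power, alpha = .05, one true relationship per four tested:
# PPV = 0.2 / 0.25 = 0.8, and indeed (1 - beta) * R = 0.2 > alpha = 0.05.
print(ppv(0.05, 0.2, 0.25))       # 0.8
# Adding bias u = 0.2 drags the same study down to about 0.47.
print(ppv(0.05, 0.2, 0.25, u=0.2))
```

Note that the same inputs that comfortably satisfy (1 − β)R > α can still yield a PPV near a coin flip once even modest bias enters the denominator.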

The graphs below highlight the relationship between these variables. As we can easily observe, the higher R and the lower the Type II error, the higher the PPV. The red surface corresponds to the zero-bias case, while the green and yellow surfaces correspond to u = 0.2 and u = 0.6 respectively. The light blue plane corresponds to PPV = 0.5, i.e. the cut-off positive predictive value. The multicoloured floor of the graph indicates the levels of β and R (for u = 0, 0.2, 0.6) for which research findings are more likely to be true than not.
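That PPV = 0.5 cut-off plane can also be located algebraically: setting the PPV expression above equal to 0.5 and solving for R gives R* = (α + u(1 − α)) / (1 − β + uβ). This derivation is mine, not from the original post, but it follows directly from the formula:

```python
def min_true_ratio(alpha, beta, u=0.0):
    """Smallest pre-study odds R at which PPV reaches 0.5.

    Obtained by setting the PPV formula equal to 0.5 and solving for R:
    R* = (alpha + u * (1 - alpha)) / (1 - beta + u * beta).
    """
    return (alpha + u * (1 - alpha)) / (1 - beta + u * beta)

# alpha = .05, 80% power, no bias: R* = 0.0625, i.e. roughly one true
# relationship per sixteen tested is enough for PPV > 0.5.
print(min_true_ratio(0.05, 0.2))
# Bias u = 0.2 raises the bar to about 0.286, one per three or four.
print(min_true_ratio(0.05, 0.2, u=0.2))
```

This makes the shape of the surfaces quantitative: bias shifts the intersection with the PPV = 0.5 plane toward much larger pre-study odds.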