This week, the statistics blog 538 reposted an article titled “Not Even Scientists Can Easily Explain P-Values” on Facebook. The author suggests that, even though many scientists are able to give the mathematical definition of a p-value, many are unable to explain it to nonscientists in a meaningful way. Commenters on the post were quick to suggest that, really, this isn’t a problem for scientists at all. For example, one writes:
The problem seems to be the demand that p-values be “intuitive” to lay people. They’re not going to be. We don’t go to graduate school for years to learn concepts that are intuitive to lay people. Our intuitions develop far beyond the point where they were at the beginning!
This commenter seems to be suggesting something that several others have echoed: in fact, scientists do know how to define and interpret p-values; the real issues are that (some) scientists aren’t able to explain difficult concepts to nonscientists, and have no obligation to give such an explanation.
If scientists use p-values in their work, do they (1) have an obligation to know their definition? (2) An obligation to know how to give a proper interpretation? (3) An obligation to give a proper explanation to the public? (4) An intuitive explanation? Although I have views on questions (3) and (4), I want to focus on (1) and (2) in this post.
I think we’d all agree that the answer to (1) is ‘yes’, and the 538 article suggests that most (or all) of the scientists asked could give the correct mathematical definition:
A p-value is the probability of obtaining a statistical summary of the data (e.g., the sample mean) that is at least as extreme as the one actually computed, assuming that the null hypothesis is true.
As another 538 post points out, this definition is about as “clear as mud”. Several issues need to be expanded upon, including what “at least as extreme” means, what a null hypothesis is, how we calculate probabilities under the assumption that the null hypothesis is true, and which statistical summaries of data are best in different contexts. All of these expansions are possible given some basic knowledge of probability theory and inferential statistics methodology. But these expansions don’t really give a satisfying answer to the question “what does a p-value tell me about my statistical and research hypotheses?”
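To make the definition concrete, here is a minimal sketch in Python using a made-up example that is not from the 538 article: a coin is flipped 100 times and lands heads 60 times, and the null hypothesis is that the coin is fair. The function names, the data, and the choice to measure “at least as extreme” by distance from the expected number of heads are all assumptions made for illustration; other definitions of extremeness are possible.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n flips when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_p_value(observed_heads, n_flips, null_p=0.5):
    """Exact two-sided p-value for a coin-flip experiment.

    "At least as extreme" is taken here to mean: a head count at least
    as far from the count expected under the null as the observed one.
    This is one common convention, assumed for illustration.
    """
    expected = n_flips * null_p
    observed_distance = abs(observed_heads - expected)
    return sum(
        binomial_pmf(k, n_flips, null_p)
        for k in range(n_flips + 1)
        if abs(k - expected) >= observed_distance
    )


# Hypothetical data: 60 heads in 100 flips, null hypothesis "the coin is fair".
p_value = two_sided_p_value(60, 100)
print(f"p-value: {p_value:.4f}")
```

Every probability in the sum is computed assuming the null hypothesis is true. The result therefore describes how surprising the data are if the coin is fair; it says nothing directly about how likely it is that the coin is fair.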
To answer this question, we need to know more probability theory and statistical methodology, but we also need to understand how to think critically and reason carefully. We need to interpret the mathematical definition above in a way that makes no unjustified logical leaps. There is much evidence that doing so is difficult. Misinterpretations of p-values are so common that the American Statistical Association (ASA) has released a statement on their misuse and Wikipedia has a page dedicated to misunderstandings of p-values. Several professional and academic organizations misinterpret p-values in official documents (538 points here, here, here, and here; some extras here, and here).
These links suggest that there are several common misinterpretations. Here are some common ones (note, these are all wrong!):
- A p-value gives the probability that the null hypothesis is true.
- A p-value is the probability that the data were produced by chance alone.
- A low p-value (a “statistically significant result”) means that the finding is practically important.
- A p-value provides information about replicability (e.g., that one minus the p-value is the reliability of the result).
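The third misinterpretation can be illustrated with a rough sketch. It assumes a simple two-sided z-test with a known standard deviation, and it assumes the observed sample mean sits exactly 0.005 standard deviations away from the null value, an effect most people would call negligible. The function name and all the numbers are hypothetical, chosen only to show how the p-value moves with sample size.

```python
from math import erfc, sqrt


def two_sided_z_test_p_value(effect_in_sd, n):
    """Two-sided p-value for a one-sample z-test.

    effect_in_sd: observed distance between the sample mean and the
        null value, measured in (known) standard deviations.
    n: sample size.
    """
    z = abs(effect_in_sd) * sqrt(n)
    # 2 * (1 - Phi(z)) written with erfc to stay accurate for large z.
    return erfc(z / sqrt(2))


tiny_effect = 0.005  # hypothetical, practically negligible effect
for n in (100, 10_000, 1_000_000):
    p = two_sided_z_test_p_value(tiny_effect, n)
    print(f"n = {n:>9,}: p-value = {p:.2e}")
```

The effect is identical in all three cases; only the sample size changes. With a million observations the p-value falls far below 0.05, which shows that “statistically significant” and “practically important” are different questions.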
Given that p-values are widely used in scientific studies (one researcher estimated that p-values have been used in at least three million scientific papers), the fact that academics, scientists, students, etc., have trouble interpreting them is worrisome. It also has real costs. For example, Stephen T. Ziliak and Deirdre N. McCloskey argue that misinterpretations of p-values lead to a heavy reliance on statistical testing, which in turn can produce very undesirable outcomes.
It seems clear that, contrary to what we may think, many scientists don’t know how to interpret p-values. How might we fix this issue? An important component here seems to be that most of our statistics education focuses on technical aspects, such as important mathematical results and learning how to program. Of course these topics are important, but what makes statistics such a beautiful area of study is that, at once, it is technical, highly theoretical, highly practical, and philosophical. In an age driven by STEM education, it is important that we not overemphasize the technical at the expense of the philosophical. As Heidegger would say, thinking exactly might be fashionable, but thinking rigorously is better.* Statistical analyses can be nuanced. Philosophy teaches essential critical thinking and logic skills that motivate us to ask the right questions and seek creative solutions. Philosophy, and especially the critical assessment of scientific methodology studied in a basic philosophy of science course, is a wonderful supplement to the standard statistics curriculum.
* “Exact thinking is never the most rigorous thinking, if rigor receives its essence otherwise from the mode of strenuousness with which knowledge always maintains the relation to what is essential in what is. Exact thinking ties itself down solely in calculation with what is and serves this exclusively.” –Martin Heidegger