There was a nice piece over at the Genetic Literacy Project I read just recently: Why so many scientific studies are flawed and poorly understood. (link). They gave a fairly intuitive example of false positives in research using coin flips. I like this because I used the specific example of flipping a coin 5 times in a row to demonstrate basic probability concepts in some of the stats classes I used to teach. Their example might make a nice extension:
"In Table 1 we present ten 61-toss sequences. The sequences were computer generated using a fair 50:50 coin. We have marked where there are runs of five or more heads one after the other. In all but three of the sequences, there is a run of at least five heads. Thus, a sequence of five heads has a probability of 0.55=0.03125 (i.e., less than 0.05) of occurring. Note that there are 57 opportunities in a sequence of 61 tosses for five consecutive heads to occur. We can conclude that although a sequence of five consecutive heads is relatively rare taken alone, it is not rare to see at least one sequence of five heads in 61 tosses of a coin."
In other words, a 5 head run in a sequence of 61 tosses (as evidence against a null hypothesis of p(head) = .5 i.e. a fair coin) is their analogy for a false positive in research. Particularly they relate this to nutrition research where it is popular to use large survey questionnaires that consist of a large number of questions:
"asking lots of questions and doing weak statistical testing is part of what is wrong with the self-reinforcing publish/grants business model. Just ask a lot of questions, get false-positives, and make a plausible story for the food causing a health effect with a p-value less than 0.05"
It is their 'hypothesis' that this approach in conjunction with a questionable practice referred to as 'HARKing' (hypothesizing after the results are known) is one reason we see so many conflicting headlines about what we should and should not eat or benefits or harms of certain foods and diets. There is some damage done in terms of peoples' trust in science as a result. They conclude:
"Curiously, editors and peer-reviewers of research articles have not recognized and ended this statistical malpractice, so it will fall to government funding agencies to cut off support for studies with flawed design, and to universities to stop rewarding the publication of bad research. We are not optimistic."
More on HARKing.....
A good article related to HARKing is a paper written by Norbert L. Kerr. By HARKing he specifically discusses it as the practice of proposing one hypothesis (or set of hypotheses) but later changing the research question *after* the data is examined. Then presenting the results *as if* the new hypothesis were the original. He does distinguish this from a more intentional exercise in scientific induction, inferring some relation or principle post hoc from a pattern of data. This is more like exploratory data analysis.
I discussed exploratory studies and issues related to multiple testing in a previous post: Econometrics, Multiple Testing, and Researcher Degrees of Freedom.
To borrow a quote from this post- "At the same time, we do not want demands of statistical purity to strait-jacket our science. The most valuable statistical analyses often arise only after an iterative process involving the data" (see, e.g., Tukey, 1980, and Box, 1997).
To say the least, careful consideration of tradeoffs should be made in the way research is conducted, and as the post discusses in more detail, the garden of forking paths involved.
I am not sure to what extent the credibility revolution has impacted nutrition studies, but the lessons apply here.
HARKing: Hypothesizing After the Results are Known
Norbert L. Kerr
Personality and Social Psychology Review
Vol 2, Issue 3, pp. 196 - 217
First Published August 1, 1998