
How a Cup of Tea Laid the Foundations for Modern Statistical Analysis

Fisher did not take Neyman and Pearson’s criticisms well. In response, he dismissed their methods as “childish” and “absurdly academic.” In particular, Fisher disagreed with the idea of deciding between two hypotheses, rather than measuring the “significance” of the available evidence, as he had proposed. Whereas a decision is final, a significance test gave only a provisional opinion, one that could be revised later. Even so, Fisher’s appeal to an open scientific mind was somewhat undercut by his insistence that researchers adopt a 5 percent threshold for a “significant” result, and his statement that he would “ignore entirely all results which fail to reach this level.”

The acrimony would give way to decades of ambiguity, as textbooks gradually blurred Fisher’s null hypothesis tests with Neyman and Pearson’s decision-based approach. A nuanced debate about how to interpret evidence, rich with discussion of statistical reasoning and experimental design, became a set of fixed rules for students to follow.

Conventional scientific research came to rest on simplistic true-or-false thresholds for decisions about hypotheses. In this world of rote-learned rules, experimental effects were either present or they were not. Medications either worked or they did not. It would not be until the 1980s that leading medical journals finally began to free themselves from these habits.

Ironically, much of the change traces back to an idea Neyman developed in the early 1930s. With economies struggling through the Great Depression, he noticed a growing demand for statistical insights into the lives of populations. Unfortunately, governments had limited resources to study these problems. Politicians wanted results in months, or even weeks, and there was not enough time or money for a comprehensive survey. As a result, statisticians had to rely on a small subset of the population. This was an opportunity to develop some new statistical ideas. Suppose we want to estimate a particular value, such as the proportion of the population that has children. If we sample 100 random adults and none of them are parents, what does this suggest about the country as a whole? We cannot definitively say that no one has a child, because if we sampled a different group of 100 adults, we might find some parents. We therefore need a way of measuring how confident we should be in our estimate. This is where Neyman’s innovation came in. He showed that we can calculate a “confidence interval” for a sample that tells us how often we should expect the true population value to lie within a certain range.
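To make the zero-parents example concrete, here is a minimal sketch of a 95 percent confidence interval for a proportion. It uses the Wilson score interval (one standard construction that behaves sensibly when the observed count is zero; the book does not specify which method Neyman would have used, so this is an illustration, not his exact calculation):

```python
import math

def wilson_interval(successes, n, z=1.96):
    """95% Wilson score confidence interval for a proportion (z = 1.96)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return max(0.0, centre - half), min(1.0, centre + half)

# 0 parents observed among 100 sampled adults
low, high = wilson_interval(0, 100)
print(f"95% CI for the proportion: ({low:.3f}, {high:.3f})")
```

Even though no parents appeared in the sample, the interval’s upper bound stays well above zero (roughly 0.04), capturing exactly the point in the text: a different sample of 100 adults might well contain some parents.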

Confidence intervals can be a slippery concept, since they require us to interpret tangible real-life data by imagining many other hypothetical samples being collected. Like those Type II errors, Neyman’s confidence intervals address an important question, but in a way that often perplexes students and researchers. Despite these conceptual obstacles, there is value in having a measurement that can capture the uncertainty in a study. It is often tempting, particularly in the media and in politics, to focus on a single average value. A single value may seem neater and more precise, but ultimately that precision is an illusion. In some of our public-facing epidemiological analyses, my colleagues and I have chosen to report only the confidence intervals, to keep attention from fixating on specific values.

Since the 1980s, medical journals have focused more on confidence intervals rather than standalone true-or-false statements. However, habits can be hard to break. The relationship between confidence intervals and p-values has not helped. Suppose our null hypothesis is that a treatment has zero effect. If our estimated 95 percent confidence interval for the effect does not contain zero, then the p-value will be less than 5 percent, and under Fisher’s approach we would reject the null hypothesis. As a result, medical papers are often less interested in the interval of uncertainty itself, and more interested in the values it does, or does not, contain. Medicine may be trying to move beyond Fisher, but the influence of his arbitrary 5 percent threshold remains.
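The duality described above can be sketched under a normal approximation: the 95 percent interval excludes zero exactly when the two-sided p-value falls below 5 percent. The function name and the effect/standard-error numbers here are illustrative assumptions, not figures from the book:

```python
import math

def normal_ci_and_p(effect, se, z=1.96):
    """95% confidence interval and two-sided p-value for H0: effect = 0,
    assuming the estimate is approximately normally distributed."""
    ci = (effect - z * se, effect + z * se)
    z_stat = abs(effect) / se
    # Two-sided p-value: chance of a result at least this extreme under H0
    p = 2 * (1 - 0.5 * (1 + math.erf(z_stat / math.sqrt(2))))
    return ci, p

ci, p = normal_ci_and_p(effect=2.1, se=1.0)   # interval excludes zero, p < 0.05
ci2, p2 = normal_ci_and_p(effect=1.5, se=1.0)  # interval contains zero, p > 0.05
print(ci, p)
print(ci2, p2)
```

Running both cases shows why journals slip back into threshold thinking: checking whether zero sits inside the interval is mathematically the same act as checking whether p crosses 5 percent.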

Adapted excerpt from Proof: The Uncertain Science of Certainty, by Adam Kucharski, published by Profile Books on March 20, 2025, in the United Kingdom.
