
# Statistics Literacy

How to understand and not be fooled by numbers

Numbers seem objective, but they can mislead. Here's how to think critically about statistics.

## Correlation vs Causation

The most important concept: correlation does not prove causation.

Correlation: two variables tend to rise or fall together.

Causation: a change in one variable directly produces a change in the other.

Mendelian randomization, which uses genetic variants as natural experiments, helps establish causation in observational studies[3].

### Confounding Variables

A confounder (a third variable that influences both of the variables being studied) creates the appearance of a relationship that doesn't exist.

Common confounders: age, sex, income and education, and overall health consciousness.
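The effect is easy to reproduce in a simulation. In this sketch (all numbers hypothetical), age drives both coffee intake and heart risk, so the two end up strongly correlated even though neither affects the other:

```python
import random

random.seed(3)

# Hypothetical scenario: age drives BOTH coffee intake and heart risk.
n = 5_000
age = [random.uniform(20, 80) for _ in range(n)]
coffee = [0.1 * a + random.gauss(0, 1) for a in age]  # older people drink more coffee
risk = [0.1 * a + random.gauss(0, 1) for a in age]    # risk rises with age, not coffee

def corr(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Strong correlation, even though coffee never enters the risk formula.
print(f"coffee vs risk correlation: {corr(coffee, risk):.2f}")
```

Stratifying by age, or adjusting for it in a regression, would make this spurious correlation vanish.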

## Understanding P-Values

Myth: A P-value of 0.05 means there's a 95% chance the finding is true.
Reality: A P-value is the probability of getting results at least as extreme as those observed IF the null hypothesis is true. A P-value of 0.05 means that, if there were truly no effect, a result at least this extreme would turn up about 5% of the time by chance alone. It does NOT mean there's a 95% chance the finding is true.
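You can see the definition at work by simulating experiments in which the null hypothesis is true by construction; about 5% of them come out "significant" anyway. A sketch (sample size and seed are arbitrary):

```python
import random
import math

random.seed(0)

def p_value_null(n=50):
    """Run one experiment where the null hypothesis is TRUE (true mean 0,
    known sd 1) and return the two-sided z-test p-value."""
    sample = [random.gauss(0, 1) for _ in range(n)]
    z = (sum(sample) / n) * math.sqrt(n)     # standard error of the mean is 1/sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value

trials = 10_000
false_positives = sum(p_value_null() < 0.05 for _ in range(trials))
print(f"'Significant' results with no real effect: {false_positives / trials:.1%}")
```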

P-values are ubiquitous in published researchβ€”and almost always significant[5].

### Statistical vs Practical Significance

A result can be statistically significant but practically meaningless: with a large enough sample, even a blood-pressure drug that lowers readings by half a point will clear the p < 0.05 threshold.

Always ask: How big is the effect?

Effect size predicts replication better than sample size[6].
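The gap between statistical and practical significance is easy to demonstrate with a z-test: hold a trivially small effect fixed and grow the sample (all numbers illustrative):

```python
import math

def z_test_p(mean_diff, sd, n):
    """Two-sided p-value for a one-sample z-test of mean_diff against 0."""
    z = mean_diff / (sd / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))

# The same trivial effect (0.02 standard deviations) at growing sample sizes:
for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}: p = {z_test_p(0.02, 1.0, n):.4f}")
```

The effect never changes; only the sample size does, yet the p-value goes from unremarkable to vanishingly small.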

## The Replication Crisis

Many published findings may be false positives[7].

Why findings don't replicate:

- Small, underpowered samples that turn up flukes
- Flexible analyses (p-hacking) that are tuned until something "works"
- Publication bias: journals prefer positive, novel results
- Plain chance: at p < 0.05, roughly 1 in 20 true-null tests looks significant

P-hacking inflates false positive rates[8].
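The inflation is simple to reproduce. Suppose each "study" below measures 20 unrelated outcomes, all with no real effect, and reports any outcome that clears p < 0.05 (numbers illustrative):

```python
import random
import math

random.seed(1)

def null_p(n=30):
    """Two-sided z-test p-value for one outcome with no real effect."""
    sample = [random.gauss(0, 1) for _ in range(n)]
    z = (sum(sample) / n) * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))

studies = 2_000
outcomes_per_study = 20
hacked = sum(
    any(null_p() < 0.05 for _ in range(outcomes_per_study))
    for _ in range(studies)
)
print(f"Studies reporting a 'significant' finding: {hacked / studies:.0%}")
print(f"Theoretical rate: 1 - 0.95**20 = {1 - 0.95 ** 20:.0%}")
```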

## How Language Obscures Results

Researchers use creative language to spin non-significant results[9].

Watch for spin language: "a trend toward significance," "marginally significant," "approaching significance," "nearly reached the threshold."

If it's not significant, it's not significant. There's no "almost."

## Understanding Risk

### Relative vs Absolute Risk

Headlines love relative risk because it sounds dramatic: "This exposure doubles your risk!"

But absolute risk tells the real story: if the risk rose from 1 in 10,000 to 2 in 10,000, that "doubling" amounts to one extra case per 10,000 people.

How risk is presented affects understanding[10].

Always ask for absolute numbers.
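The arithmetic behind such a headline, with hypothetical numbers:

```python
# Hypothetical risks: an exposure "doubles" a rare risk.
baseline_risk = 1 / 10_000
exposed_risk = 2 / 10_000

relative_increase = (exposed_risk - baseline_risk) / baseline_risk  # headline number
absolute_increase = exposed_risk - baseline_risk                    # real-world number
per_extra_case = 1 / absolute_increase  # people exposed per one additional case

print(f"Relative risk increase: +{relative_increase:.0%}")
print(f"Absolute risk increase: {absolute_increase:.4%}")
print(f"That is one extra case per {per_extra_case:,.0f} people exposed")
```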

### Base Rate Neglect

People often ignore how common something is[11].

Example: A disease affects 1 in 1,000 people, and a test for it is 99% accurate. If you test positive, how likely is it that you actually have the disease?

Answer: Only about 9%! Most positive tests are false positives because the disease is rare.

Base rate: how common something is in the population before any test result or other evidence is taken into account.
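The figure of roughly 9% follows from Bayes' theorem. A sketch, assuming a 1-in-1,000 base rate and 99% sensitivity and specificity:

```python
# Assumed numbers for illustration: a rare disease and a "99% accurate" test.
prevalence = 1 / 1_000  # 1 in 1,000 people has the disease
sensitivity = 0.99      # P(positive | disease)
specificity = 0.99      # P(negative | no disease)

true_positive = prevalence * sensitivity
false_positive = (1 - prevalence) * (1 - specificity)

# Bayes' theorem: probability of disease given a positive test.
ppv = true_positive / (true_positive + false_positive)
print(f"Chance a positive result is real: {ppv:.0%}")
```

The false positives from the 999 healthy people swamp the true positives from the 1 sick person.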

## Sample Size and Selection

### Small Samples Are Noisy

Small studies produce extreme results in both directions. The "best" and "worst" schools/hospitals/products are often just small ones with random variation.

When sample size is small, treat all findings skeptically.
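A quick simulation shows why the extremes are dominated by small units. Here every hospital has the same true complication rate (numbers hypothetical); only the patient counts differ:

```python
import random

random.seed(2)

TRUE_RATE = 0.10  # every hospital has the SAME true complication rate

def observed_rate(n_patients):
    """Observed complication rate in one hospital of the given size."""
    return sum(random.random() < TRUE_RATE for _ in range(n_patients)) / n_patients

small = [observed_rate(20) for _ in range(500)]     # 500 small hospitals
large = [observed_rate(2_000) for _ in range(500)]  # 500 large hospitals

print(f"Small hospitals: rates range {min(small):.1%} to {max(small):.1%}")
print(f"Large hospitals: rates range {min(large):.1%} to {max(large):.1%}")
```

The "best" and "worst" hospitals in this simulation are all small ones, purely by chance.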

### Selection Bias

How samples are chosen matters:

- Survivorship bias: you only see the cases that made it (successful companies, surviving patients)
- Self-selection: people who volunteer or respond differ from those who don't
- Convenience sampling: whoever was easiest to reach, not whoever is representative
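Self-selection alone can shift an average noticeably. In this sketch (hypothetical scenario), happier customers are more likely to answer a satisfaction survey:

```python
import random
from statistics import mean

random.seed(4)

# Hypothetical customer-satisfaction scores on a 0-10 scale.
population = [min(10.0, max(0.0, random.gauss(5, 2))) for _ in range(10_000)]

# Happier customers are more likely to return the survey.
respondents = [score for score in population if random.random() < score / 10]

print(f"True average satisfaction:      {mean(population):.2f}")
print(f"Average among survey responses: {mean(respondents):.2f}")
```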

## Critical Thinking About Statistics

Training in logical fallacies improves ability to detect misinformation[13].

### Questions to Ask

1. What's the sample? Who was included? Who was excluded?

2. What's the comparison? Compared to what? No comparison = no conclusion

3. Could there be confounders? What else might explain this?

4. How big is the effect? Statistical significance β‰  practical importance

5. Has it replicated? One study proves nothing

6. Who benefits? Industry-funded studies often favor funders

### Red Flags

- Tiny samples, or no sample size reported at all
- No comparison or control group
- Relative risks quoted without absolute numbers
- A single unreplicated study presented as definitive
- "Trend toward significance" and similar spin
- Funding from a party that benefits from the result

## Interaction Effects

Interactions between factors are often misunderstood[14].

Interaction: when the effect of one factor depends on the level of another factor.

When two factors interact, you can't simply add their individual effects.
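A toy calculation with hypothetical relative risks shows how far the additive guess can miss:

```python
# Hypothetical relative risks from a study with an interaction.
baseline = 1.0
risk_a_only = 5.0   # exposure A alone: 5x baseline
risk_b_only = 10.0  # exposure B alone: 10x baseline
risk_both = 50.0    # what might actually be observed with BOTH exposures

# Naive additive prediction: stack the two excess risks on the baseline.
additive_prediction = baseline + (risk_a_only - baseline) + (risk_b_only - baseline)

print(f"Additive prediction for A+B: {additive_prediction}x baseline")
print(f"Observed with interaction:   {risk_both}x baseline")
```

When the factors multiply rather than add, the naive sum badly understates the combined risk.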

## Visualizing Data

Good visualizations help understanding. Bad ones mislead.

Common tricks:

- Truncated y-axes that exaggerate small differences
- Cherry-picked time windows that hide the longer trend
- Dual axes scaled to manufacture a relationship
- Missing baselines, unlabeled units, or 3-D distortion

Always check the axes and scale of any graph before drawing conclusions.
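The truncated-axis trick can even be quantified. With made-up values, starting the axis at 94 instead of 0 turns a 2% difference into a bar three times as tall:

```python
a, b = 95.0, 97.0  # two nearly identical measurements

# Bars drawn from zero: heights are proportional to the values.
honest_ratio = b / a

# Axis truncated to start at 94: only the slivers above 94 are drawn.
axis_start = 94.0
truncated_ratio = (b - axis_start) / (a - axis_start)

print(f"Honest axis:    bar B looks {honest_ratio:.2f}x taller than bar A")
print(f"Truncated axis: bar B looks {truncated_ratio:.2f}x taller than bar A")
```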

## Summary: Statistical Self-Defense

1. Correlation β‰  causation β€” Look for confounders

2. P-values have limits β€” Significant doesn't mean important

3. Effect size matters β€” How big is the effect?

4. Replication required β€” One study proves nothing

5. Watch the spin β€” "Trending" means "failed"

6. Absolute > relative risk β€” Demand real numbers

7. Check base rates β€” Rare events have more false positives

8. Sample size matters β€” Small studies are noisy

9. Selection bias everywhere β€” How was the sample chosen?

10. Follow the money β€” Who funded this research?

---

## References

  3. Davey Smith G, Ebrahim S (2008). Mendelian Randomisation and Causal Inference in Observational Epidemiology. PLOS Medicine.
  5. Chavalarias D, Wallach JD, Li AH, Ioannidis JP (2018). P values in display items are ubiquitous and almost invariably significant: A survey of top science journals. PLOS ONE.
  6. Voracek M, Tran US, Formann AK (2024). Challenging the N-Heuristic: Effect size, not sample size, predicts the replicability of psychological science. PLOS ONE.
  7. van Zwet EW, Cator EA (2023). Are most published research findings false? Trends in statistical power, publication selection bias, and the false discovery rate in psychology (1975–2017). PLOS ONE.
  8. Costa-Font J, Bover-Bover A (2024). Impact of redefining statistical significance on P-hacking and false positive rates: An agent-based model. PLOS ONE.
  9. Barnett AG, Wren JD (2022). Analysis of 567,758 randomized controlled trials published over 30 years reveals trends in phrases used to discuss results that do not reach statistical significance. PLOS Biology.
  10. Okan Y, Garcia-Retamero R, Cokely ET, Maldonado A (2021). Comparing the impact of an icon array versus a bar graph on preference and understanding of risk information. PLOS ONE.
  11. Binder K, Krauss S, Bruckmaier G (2018). Visualizing the Bayesian 2-test case: The effect of tree diagrams on medical decision making. PLOS ONE.
  13. Halpern DF, Butler HA (2023). Learning about informal fallacies and the detection of fake news: An experimental intervention. PLOS ONE.
  14. VanderWeele TJ, Knol MJ (2021). Understanding interactions between risk factors, and assessing the utility of the additive and multiplicative models through simulations. PLOS ONE.