Sunday, 8 February 2015

Science and Statistics - An Unholy Alliance?

I came across this very interesting article the other day, written in 2010. It reinforces my own long-held unease about statistical analysis. My reaction against statistics started early, in school, when I was first presented with its somewhat bizarre pseudo-mathematical methodology and nomenclature. My early rejection of the subject was more visceral and emotive than factual and logical. It just hit a raw nerve with me somehow, and all these years later, reading this article by Tom Siegfried, I begin to see why. So let me begin by quoting a few passages from Siegfried's text:

"During the past century, though, a mutant form of math has deflected science’s heart from the modes of calculation that had long served so faithfully. Science was seduced by statistics, the math rooted in the same principles that guarantee profits for Las Vegas casinos. Supposedly, the proper use of statistics makes relying on scientific results a safe bet. But in practice, widespread misuse of statistical methods makes science more like a crapshoot."

So it's not statistics itself, but the misuse of this analytical toolbox, that is the problem. To this I would add over-reliance, which is especially evident in the field of climate science. Too often in the peer-reviewed climate science literature we find papers which base their conclusions almost entirely on the results of some new statistical analysis or re-analysis of existing data. In order to fully appreciate what they are saying and, more importantly, to question what they are saying, one needs to be an expert not primarily in climate science, but in statistical analysis.

"Statistical tests are supposed to guide scientists in judging whether an experimental result reflects some real effect or is merely a random fluke, but the standard methods mix mutually inconsistent philosophies and offer no meaningful basis for making such decisions. Even when performed correctly, statistical tests are widely misunderstood and frequently misinterpreted. As a result, countless conclusions in the scientific literature are erroneous, and tests of medical dangers or treatments are often contradictory and confusing."

This does not inspire confidence.
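To make the "random fluke" idea concrete, here is a minimal sketch in Python. It is my own illustration, not anything taken from Siegfried's article, and it rests on loud assumptions: two groups of equal size drawn from the same normal distribution with a known variance of 1, compared using a simple two-sided z-test. The function names and default parameters are hypothetical. Since there is no real effect by construction, every "significant" result it counts is a fluke.

```python
import math
import random


def two_sided_p_value(group_a, group_b):
    """Two-sided z-test p-value for a difference in means.

    Assumes both groups have the same size and a known variance of 1.
    """
    n = len(group_a)
    diff = sum(group_a) / n - sum(group_b) / n
    z = diff / math.sqrt(2.0 / n)  # standard error of the difference is sqrt(2/n)
    return math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))


def false_positive_rate(n_experiments=2000, n_samples=30, alpha=0.05, seed=1):
    """Fraction of no-effect experiments that still come out 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        # Both groups come from the same distribution: there is no real effect.
        a = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        if two_sided_p_value(a, b) < alpha:
            hits += 1
    return hits / n_experiments


if __name__ == "__main__":
    print(f"Share of 'significant' flukes: {false_positive_rate():.3f}")
```

Under these assumptions the share should hover around the chosen threshold of 0.05, which is simply what the threshold means: it says how often pure chance clears the bar, not how likely any individual published result is to be true.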

"Experts in the math of probability and statistics are well aware of these problems and have for decades expressed concern about them in major journals. Over the years, hundreds of published papers have warned that science’s love affair with statistics has spawned countless illegitimate findings. In fact, if you believe what you read in the scientific literature, you shouldn’t believe what you read in the scientific literature."

With statistical analysis increasingly pervasive in climate science, and backed up by increasingly complex computer models, the above statement applies tenfold to the results of the latest peer-reviewed research. Much of this research aims to point the finger at man as responsible for the majority of post-1950 global warming, and claims that we will continue to drive climate significantly into the future. Yet much of it is based upon statistical reanalysis of existing data.

"Nobody contends that all of science is wrong, or that it hasn’t compiled an impressive array of truths about the natural world. Still, any single scientific study alone is quite likely to be incorrect, thanks largely to the fact that the standard statistical system for drawing conclusions is, in essence, illogical. “A lot of scientists don’t understand statistics,” says Goodman. “And they don’t understand statistics because the statistics don’t make sense.”"

A perfect illustration: the recently released paper by Marotzke and Forster. The main impetus for the paper was to address the apparent mismatch between climate models and real world observations (in particular the 'pause') which sceptics use to question the validity of the AGW theory. The paper concludes:

"The differences between simulated and observed trends are dominated by random internal variability over the shorter timescale and by variations in the radiative forcings used to drive models over the longer timescale. For either trend length, spread in simulated climate feedback leaves no traceable imprint on GMST trends or, consequently, on the difference between simulations and observations. The claim that climate models systematically overestimate the response to radiative forcing from increasing greenhouse gas concentrations therefore seems to be unfounded."

So, climate models do not overestimate the response to GHG forcing, even though the CMIP5 model mean is increasingly diverging from actual recorded global mean surface temperatures (GMST) and even though almost all models clearly run 'too hot' when compared with actual GMSTs. Apparently, this impression is not borne out by statistically analysing the past temperature record and comparing it with the models? It's opaque to me and probably to a lot of other people besides. Nic Lewis thinks it is plain wrong, and says so at Climate Audit, laying out his reasons. He gave Marotzke and Forster the opportunity to reply to his concerns about their paper, but they did not respond before he published at Climate Audit. Instead, they have chosen to issue a rebuttal of Lewis's rebuttal at Climate Lab Book. I've no idea who will eventually be proved right or wrong in this kerfuffle, but I quote statistical expert Gordon Hughes (Edinburgh University), one of two people whom Nic Lewis asked to review his conclusions about Marotzke and Forster (2015):

"The statistical methods used in the paper are so bad as to merit use in a class on how not to do applied statistics.
All this paper demonstrates is that climate scientists should take some basic courses in statistics and Nature should get some competent referees."  

The wider point here is that we have yet another paper which relies almost exclusively upon statistical methodology to draw conclusions about the real world - another paper which may have to be withdrawn. Science - and climate science in particular - is suffering from the all too pervasive influence of statistics. There is a place for statistics in the analysis of real world data, and even I must (reluctantly) acknowledge this. However, science has, as Tom Siegfried points out, become "seduced" by the false promise of this "mutant" form of mathematics and is suffering from its misuse and its overuse.