Reproducibility

I don’t know why, but whenever something from the Center for Open Science (COS) pops up in my news feed, my fingers get twitchy and I start imagining writing a response (see also). It may be because @BrianNosek and I used to play volleyball together; he was a terrific athlete, and I am still imagining trying to block his awesome spikes (true story).

COS recently posted an article titled “One Preregistration to Rule Them All?” discussing the mechanics of preregistering a study when there are multiple sub-studies and interlinked collaborative studies. This led my colleague @jplotkin to comment, tongue-in-cheek, that it would make @BrianNosek Sauron and @OSFramework Mount Doom. While I am sure Josh meant it as a joke, I thought the “Rule Them All” and “Sauron” comments were an apt metaphor for the well-intentioned efforts of COS, efforts that I have a hard time buying into.

It is common to think that the main failure of communism, with its ideal of “From each according to his ability, to each according to his need,” was a failure of human nature: our greed and our laziness could not live up to those lofty ideals. But communism faces a much more critical obstacle than mere human foibles. Communism, capitalism, socialism, and the like are economic systems, each with the main goal of matching supply (from one’s ability) to demand (one’s need). So any such system has to solve the problem of allocating labor and resources to produce supplies that meet the demands of every member of society, a problem involving literally billions of variables. Free markets are basically a heuristic solution to this problem of optimal allocation. If we want a different approach, say rational planning (e.g., communism), then we need to solve the computational problem of optimal resource allocation posed by Leonid Kantorovich (the only Soviet Nobel laureate in economics). Of course, there are no tractable solutions for optimization over billions of variables.

I previously suggested that the main activity of science is the exchange of ideas. Each of us supplies ideas and data, and we consume the ideas and data created by others. We might think of each journal as a kind of market where we display our wares in the hope that someone might buy them. As in real markets, there is some regulation to prevent fraud, but for the most part supply and demand are regulated dynamically by the actors themselves. The current journal system has a lot of problems, but as a heuristic solution to the optimal allocation problem of getting “true” knowledge to those who need it, the system works pretty well, albeit with caveat emptor.

COS and its supporters focus on the errors and negatives of such a heuristic solution and want to create systems to distribute knowledge rationally. Like the early communists, I am sure they are motivated by an honest wish to improve society. Yet, in the end, whatever system they construct must not only eliminate the problems of the free market but also solve the optimal allocation problem. And solving this problem involves not only the intractable computational problem of optimization but also the subtler problem of accounting for variables from “unknown but possible worlds.” For example, before deciding what to cook for dinner, I often browse the aisles of my grocery store to see if anything inspires me. The result is quite different from what would happen if somebody delivered a bunch of groceries to my door. Similarly, flawed studies like the famous ‘arsenic-life’ paper or, say, the human cloning study actually triggered all kinds of interesting science.

As ominously presaged by the title of their own blog article, “Preregistration: A plan, not a prison,” creating a global plan without truly solving the allocation problem is likely to fall short of the heuristic free-market solution and to feel very much like a prison to the players put in suboptimal positions. And when we add all the failures of human foibles, it is worth remembering how the well-intentioned revolutionaries soon moved to the absolutism of the Bolsheviks and the even more unfortunate autocratic events that followed.

So, do we really want sanitized science delivered in a Harvest Box every month? For my part, I really enjoy my occasional Big Mac, and I’d rather take my chances with some fake p-values and irreproducible results.

[[Addendum]]

After I posted this, @BrianNosek kindly replied to my tweet and we had the following exchange:

[Image: @BrianNosek’s reply tweet]

So, I think it comes down to where each of us thinks the “right light-weight” line sits. If someone commits a bad act, society clearly incurs a cost. We can take various actions to suppress the probability of the bad act, lowering the expected cost. But every suppressive action has its own cost, and we have to consider the total cost. In general, simple punitive actions (e.g., censuring a researcher who commits fraud) are less costly to execute than controlling everybody’s actions in advance to prevent the act (think of taking off your shoes at the airport). This is Foucault’s dichotomy of the “punitive city” versus the “coercive institution.”

Which action minimizes total cost depends on the cost of the bad act and on the incremental suppression costs vis-à-vis the incremental reduction of the bad act. We could have situations like the left figure below or the right one. In fact, sometimes the lowest-cost option might be to do nothing (e.g., children trespassing on the grass). And, I admit, some acts should be prevented absolutely despite the enormous costs of suppression. The difficulty is that we don’t know what the cost functions look like for the issue of reproducibility in science. (Or, for that matter, we don’t seem to investigate the cost functions for most rules we adopt or decline to adopt.) However, once put in place, coercive institutions have an insidious tendency to spread virally and to entrench the economic interests that maintain the “political technology” of coercion. I think we are all pretty familiar with that.

[Figure: cost graphs, left and right panels]
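The cost trade-off above can be made concrete with a toy calculation. The sketch below assumes, purely for illustration, five discrete suppression levels with made-up probabilities of the bad act and made-up suppression costs (none of these numbers come from the exchange); total cost is the expected harm (probability times the cost of the bad act) plus the cost of suppression, and we pick the level that minimizes it.

```python
# Toy model of the punitive-vs-coercive trade-off. All numbers are
# hypothetical placeholders; the real cost curves are unknown.

# Assumed probability of the bad act at suppression levels 0 (do nothing) .. 4.
P_BAD_ACT = [0.20, 0.10, 0.05, 0.03, 0.02]
# Assumed cost of running each suppression level.
SUPPRESSION_COST = [0, 10, 20, 30, 40]


def total_cost(level: int, harm: float) -> float:
    """Expected harm from the bad act plus the cost of suppressing it."""
    return P_BAD_ACT[level] * harm + SUPPRESSION_COST[level]


def best_level(harm: float) -> int:
    """Suppression level with the lowest total cost for a given harm."""
    return min(range(len(P_BAD_ACT)), key=lambda level: total_cost(level, harm))


if __name__ == "__main__":
    for label, harm in [("trespassing on grass", 20),
                        ("moderate harm", 400),
                        ("catastrophic act", 10_000)]:
        print(f"{label}: best suppression level = {best_level(harm)}")
```

With these invented curves, the cheapest response to a minor harm is to do nothing (level 0), a moderate harm favors an intermediate level (2), and only a catastrophic harm justifies the heaviest suppression (4). Change the curves and the answer changes, which is why not knowing them matters.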

Recently, two somewhat different topics on the business of science came across my Twitter feed.

The first was yet another push for “reproducibility in science” by 72 authors whose latest prescription was to set p = 0.005 (under Neyman-Pearson hypothesis testing) as the new threshold for “significance.” This paper was picked up by the usual Nature et al. press and, of course, generated lots of thumb-time. Without irony, an accompanying Center for Open Science blog post suggests:

“the fact that this paper is authored by statisticians and scientists from a range of disciplines—[…]—indicates that the proposal now has broad support.” (my italics)

That is, the fact that 72 self-selected individuals out of hundreds of thousands of researchers signed on indicates that the proposal has broad support.

Well, regardless, reproducible science, like apple pie, has to be a good thing (maybe). The paper is careful to point out that the proposal is not about publication standards or policy but about standards of evidence in science. That is, it is about science.

So then, who in science needs this proscription on the use of the word “significance”? Suppose there is a risky and expensive experiment, or maybe even a new dissertation project. I am trying to imagine a student arguing (against advice to be cautious):

“But, but, the article said it was significant!”

In fact, even for very well executed, ground-breaking studies, typical journal clubs are exercises in skepticism, take-downs, and “what-about-isms” (probably to the detriment of the discussants). Do grant review committees take an applicant’s claims of “significant” preliminary results at face value? Do hiring committees?

Q1: To whom is regulating the use of the term “significant” valuable?

I want to bring up the second topic before trying to answer the above question. This involved another (often repeated) discussion of whether one should cite papers on preprint servers like the new bioRxiv. Without regurgitating the various pros and cons, much of the argument against citing a preprint came down to “giving validation” to something that might not deserve it. That is, it was again supposed to be about standards of evidence in science: science will benefit when we only acknowledge what has been okayed by three people, including the notorious Reviewer 2.

This was interesting, as I always thought the role of citation was to establish the “source of information” from which I was deriving my own thesis. I thought the worries about citations were about missing possibly relevant sources (and pissing off somebody) or citing sources that might be too ephemeral. We used to cite “pers. comm.,” which I think is okay as long as the “citee” doesn’t die. Preprints may have some of the second problem, but a preprint server is probably as durable as any online journal. So, why worry about citing non-peer-reviewed papers?

Q2: To whom is regulating what is cited valuable?

Any human activity, whether art or science, acquires an economic structure. Efficient operation of an economy requires exchange tokens; we would rather not cart around bushels of corn to exchange for milk, so we make up credit papers (i.e., money). In science, the economy is supposed to be organized to enable the exchange of ideas. But we also need efficiency, so rather than read the candidates’ papers, we look up their h-index. Time saved. But using money or tokens like citations requires establishing valuations (how much is that puppy in the window?). So one might think that scientists who only want to cite “peer-reviewed” work are attempting to create accurate valuations, despite the Dutch tulip prices created by drive-by citations (see Lior Pachter’s discussion of this wonderful term from Andrew Perrin). But what value is being assessed when citations serve as the valuation mechanism?
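The h-index is a good example of how much such a token compresses. It is the largest h such that h of a researcher’s papers each have at least h citations. The minimal sketch below computes it; the two citation records are invented for illustration and show very different bodies of work collapsing to the same value.

```python
from typing import Iterable


def h_index(citations: Iterable[int]) -> int:
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites < rank:
            break  # this paper (and all later ones) has too few citations
        h = rank
    return h


if __name__ == "__main__":
    # Hypothetical records: one landmark paper plus three modest ones,
    # versus four modest papers and no landmark.
    landmark = [1000, 4, 4, 4]
    steady = [4, 4, 4, 4]
    print(h_index(landmark), h_index(steady))
```

Both records get an h-index of 4; the thousand-citation paper leaves no trace in the token.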

This brings me back to the “significance” issue above. Who cares about this language use? As I mentioned, I have never seen anybody change a negative opinion because some authors stated that their results were “significant,” statistically or otherwise, at least not if they actually read the paper. If well-regulated use of the term significance really leads to reproducible results in print, it should save people time. Well, but the 72 authors of the p = 0.005 paper state that they are not talking about publication standards but about language descriptors (valuations), and they suggest adding other descriptors like “suggestive.”
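To make the proposed descriptors concrete, here is a small sketch of the mapping from a real-valued p-value to words. It assumes our reading of the proposal: p < 0.005 is “significant” and 0.005 ≤ p < 0.05 is “suggestive”; the “not significant” label for everything else is a hypothetical placeholder, not taken from the paper.

```python
def describe(p: float) -> str:
    """Map a p-value to a verbal label.

    Assumed thresholds: p < 0.005 -> 'significant',
    0.005 <= p < 0.05 -> 'suggestive'. The fallback label is hypothetical.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p < 0.005:
        return "significant"
    if p < 0.05:
        return "suggestive"
    return "not significant"  # hypothetical label for everything else


if __name__ == "__main__":
    for p in (0.0049, 0.0051, 0.049, 0.051):
        print(f"p = {p}: {describe(p)}")
```

Nearly identical values such as 0.0049 and 0.0051 land in different categories, while 0.0051 and 0.049, an order of magnitude apart, share one: the ordinal scale throws away most of what the real-valued one carried.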

But, hey, why even try to convert a metric scale (real-valued probabilities) into a vague ordinal scale? Because the journals, more specifically the non-expert editors proliferating in current commercial journals, care. Significance is the bouncer behind the velvet rope that they use to enshrine the (high-impact-factor) journal corpus. In fact, many journals explicitly ask reviewers about “significance.” And those polite rejection letters suggesting “more specialized journals” always mention “significance.” Yes, I know statistical significance is not the same thing as these uses of the word. Or is it?

No one can argue against the idea that science demands that everybody do their due diligence. But the specific concern has been focused on journal publications. Who made the printed words in peer-reviewed journals the matter of record? That is, when did putting scientific work into print become canonization instead of a form of communication? And who decided that citations of such printed works should be a valuation token? I don’t know who decided all this, but I do know who benefits (the answer to Q1 and Q2 above): the journals.

For the journals, there is a clear self-interest in establishing themselves as “all that is fit to print” and the “matter of record.” That is, nothing would make the journals happier than being the gatekeepers of Truth. But should this also be true for science and scientists? If we look at papers from 10, 20, or 50 years ago, what percentage of them hold up? Should Newton have been prevented from publishing the Principia because he didn’t take the curvature of spacetime into account? Science is the never-ending search for, and refinement of, understandings of nature, which we pursue through the exchange of ideas. We communicate these ideas to each other through the printed medium because it adds precision and distributional efficiency. We established the tradition of peer review because it helps increase (in some undefinable manner) the quality of the communications. Into this economy came commercial journals and journal empires, which, mirroring the rise of the financial sector in the real economy, established a derivative market in which the number of publications, citations of publications, and impact of publications are made to replace the actual value of science itself. The more they can convince us that “validated citations” are important and that only “significant” results get published in significant journals, the more they solidify their position and substitute publication for science. They would rather see you bite into Credit Default Swap papers of “Altmetrics” than into an actual apple.

We are all human, and to some extent efficiency and expediency make us all admire a CV with a long list of Nature, Science, and their baby-critter journals. But we need to remember that the interests of these journals are not the same as the interests of science. Journals literally bank on our asking them to validate us.

Don’t let the journals dictate our values. Don’t let them win. Resist.

Addendum: A more serious question is what percentage of published papers should be reproducible. If you think 100%, you have never thought about the problem of optimization over a rugged landscape. And Nature is rugged indeed.
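The rugged-landscape point can be illustrated with a toy greedy search. The sketch below uses an invented one-dimensional landscape (hypothetical values, four local peaks) and starts a hill climber from every position. Each climber follows the locally “correct” rule of always stepping uphill, yet most of them finish on a local peak rather than the global one. It is a metaphor under stated assumptions, not a model of publishing.

```python
from typing import List, Sequence


def hill_climb(landscape: Sequence[float], start: int) -> int:
    """Greedy ascent: move to the higher neighbour until neither improves."""
    pos = start
    while True:
        neighbours = [i for i in (pos - 1, pos + 1) if 0 <= i < len(landscape)]
        if not neighbours:
            return pos  # single-point landscape
        best = max(neighbours, key=lambda i: landscape[i])
        if landscape[best] <= landscape[pos]:
            return pos  # local peak: no uphill step left
        pos = best


def end_points(landscape: Sequence[float]) -> List[int]:
    """Where a climber ends up from every possible starting position."""
    return [hill_climb(landscape, s) for s in range(len(landscape))]


def share_reaching_global(landscape: Sequence[float]) -> float:
    """Fraction of starting positions whose climb ends at the global peak."""
    global_peak = max(range(len(landscape)), key=lambda i: landscape[i])
    ends = end_points(landscape)
    return sum(e == global_peak for e in ends) / len(landscape)


if __name__ == "__main__":
    # Hypothetical rugged landscape with local peaks at indices 1, 3, 5, 8.
    rugged = [1, 3, 2, 5, 4, 9, 6, 2, 7, 3]
    print(end_points(rugged))
    print(share_reaching_global(rugged))
```

Only 3 of the 10 starting points reach the global peak at index 5; the rest stop at perfectly “valid” local peaks.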

TheoryB

Theoretical Matters in Biology, Social Matters in Research


Sauron and Center for Open Science

Don’t let the journals win
