Scientists lift the lid on reproducibility (2016) (nature.com)
60 points by hubraumhugo on Oct 3, 2022 | 29 comments



The elephant in the room is that if you add up all the reasons why a scientific paper could be "wrong", the median paper is wrong. This sounds crazy, cynical, or both, but it is the natural conclusion of the many ways a paper can be wrong:

* a clinical trial had (insufficient sample size|no controls|other methodological error) and is excluded from meta-analysis for this reason

* engineering analyses that turn out to be wrong (e.g. an early paper that postulated you could use a lightly modified jumbo jet to seed the stratosphere with sulfur, or early analyses of light-water reactor (LWR) accidents)

* numerous theoretical physics papers that are a shot in the dark (e.g. papers about superstring theory if the Universe is not supersymmetric, all the competing models if it is)

* math-heavy papers with 50 pages of calculations and a sign reversed on page 23

* CS papers based on software that was wrong (I looked at the source code and fixed the bug)

* one I wrote (nobody ever told a group of physicists how to tell whether a power law is a good fit to a probability distribution or how to estimate the slope; see the sketch below this list)
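
On that last point: the comment doesn't say which method was (or should have been) used, but the standard remedy popularized by Clauset, Shalizi & Newman is to estimate the exponent by maximum likelihood rather than by fitting a line to a log-log histogram, and to sanity-check the fit with a Kolmogorov-Smirnov distance. A minimal sketch, assuming continuous data and a known x_min; the function name and the synthetic-data example are mine:

    import numpy as np

    def fit_power_law(x, x_min):
        """MLE exponent for a continuous power law p(x) ~ x^(-alpha), x >= x_min."""
        tail = np.asarray(x, dtype=float)
        tail = tail[tail >= x_min]
        n = len(tail)
        alpha = 1.0 + n / np.sum(np.log(tail / x_min))
        # Rough goodness-of-fit: Kolmogorov-Smirnov distance between the
        # empirical CDF of the tail and the fitted power-law CDF.
        tail_sorted = np.sort(tail)
        empirical = np.arange(1, n + 1) / n
        model = 1.0 - (tail_sorted / x_min) ** (1.0 - alpha)
        ks = np.max(np.abs(empirical - model))
        return alpha, ks

    # Synthetic check: classical Pareto samples with true exponent alpha = 2.5
    rng = np.random.default_rng(0)
    samples = rng.pareto(1.5, size=10_000) + 1.0
    print(fit_power_law(samples, x_min=1.0))  # alpha should come out near 2.5

The mistake the comment presumably alludes to is fitting a straight line to a log-log histogram by least squares, which biases the slope estimate and provides no real goodness-of-fit check.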

When I have been out of the academic loop and spent $30 to buy a copy of an academic paper that we thought would help with a commercial project, it has often been a poor investment because the content wasn't useful.

Of course some scientific papers are highly valuable, but I'd say one reason I never found my voice writing scientific papers was that I was aiming for a level of quality that was unrealistic and not representative of what I was surrounded with.


More evidence:

My brother has a PhD in biochemistry and quit a job at a lab when he pointed out to the head scientist, a woman who had built her reputation on a series of related papers and work, a serious flaw in the experiments that could invalidate the result. Not only did she repeatedly ignore him without acknowledging the flaw as important (note: if my brother was right, the core result was invalidated!), but she continued to churn out papers on the same topic.

This comment is not anti-science, mind you. It's just to show that when egos and reputations are involved, all bets are off. Nobody wants to be told they built a house of cards. I'm confident the conclusions from these papers are either irrelevant in the grand scheme of things (so it doesn't matter if they are wrong), or would eventually get rejected when other scientists take a look at them.


> This comment is not anti-science, mind you.

I find it funny that you feel the need to say this, but I also 100% get it. In my experience, there's no one more skeptical of scientific publications than people actively doing research, but it's such a delicate balance to voice that skepticism without upsetting the generically pro-science laymen. It's like a modern-day shibboleth, come to think of it.


This behavior makes me extremely suspicious of anything produced by someone who displays it.

Back when I was in academia, I had a similar experience. Someone gave a talk and I immediately had doubts about the method used. It was known in a related area that methods like the one presented tended to appear more accurate than they actually were. So I spoke with the researcher privately after the talk. I didn't expect this researcher to know about the problem, and I think I said that it was possible everything was okay but that some extra checks needed to be done. They seemed interested at the time, so I sent them an email with some papers on the problem. I never got a response, and their paper was published without even mentioning what I said. The paper now has over 100 citations, and that research group has built on the method I had a problem with.


> I was aiming for a level of quality that was unrealistic and not representative of what I was surrounded with.

In my experience the people who focus on high-quality scientific output are some of the smartest and most successful people in the field. The challenge is that you have to be productive as well. This is why we don't have error-free science, just as we don't have bug-free software. (And it should be said: unless you produce bug-free software [or whatever it is you do professionally], there is probably someone who could cherry-pick your errors and present the same case about your work. It's important to approach life with humility.)


The uglier truth is that the median paper doesn't matter and probably won't be cited or read, except by other papers that don't matter either.


There's a big qualitative difference between inaccurate calculations from flipped signs or software bugs and fundamentally wrong analysis.

Also, why aren't the former kinds of issues caught before the paper is published?


Peer review is not effective.

A peer reviewer is not going to go through complex calculations line by line or find bugs in the software. The person who does find that is someone following in the footsteps of the original researcher, trying to build on the result.

Software bugs are common in commercial software too, but if they are put in front of customers, most get fixed.

What boggles my mind are the poor standards in biomedical work. Math-heavy papers are often written by one or two people and a computer, but clinical studies involve a lot of people, paperwork and, you'd think, a formalized process.

I remember one paper where they tested COVID-19 patients and could not detect Vitamin C in their blood but didn't test anyone who didn't have COVID-19. Vitamin C levels vary greatly in different populations, and the test is tricky and requires careful sample handling... It could be they were unable to detect Vitamin C at all, but the only way they could disprove that would be to use proper controls.

Most clinical trials are rejected when it comes time for a meta-analysis. I think journals shouldn't accept studies that wouldn't be accepted for meta-analysis, and work that isn't up to that standard shouldn't be funded.


Why not have every (bio/medical) paper then include the cost of a separate red team to recreate the results? Is there a fundamental reason why this wouldn't solve the reproducibility problem by handling it internally as part of the process itself?


I am unable to reproduce work in CS most of the time. In part because technology moves so fast. You can't earn a PhD or tenure by maintaining open source software, so after a few years most implementations are rotten. Research software rots faster than other software because it's built by people who aren't professional engineers. Code quality doesn't earn PhDs or tenure, either. We also use expensive new technologies with changing interfaces. So research software almost always rots.

Also in part because papers are “dressed up.” That’s a polite way of saying that the description of what they’ve built has been stretched. Authors need to make reviewers excited. They do this with ideas, not code. The purpose of their code is to show that the ideas could be practical, not that they are. So research software is almost always incomplete.


What do you suggest? PhD students answering support calls?


All PhDs should be required to spend one year building production software at a company, and should be subjected to weekly code reviews.

Mostly kidding.


More shocking to me is that more than half have failed to reproduce their own experiments. Something is seriously rotten there. It shows that researchers' note-taking and methodology recording are seriously flawed.

Science and published knowledge as of today look mighty fragile, given they're built on empirical data from these experiments.


It's utterly insane that we have scientific fields where even their own practitioners don't believe the papers published in them. That's not sustainable long term, and science will only lose credibility as more and more of it eventually gets exposed as nonsense.


The incentives are all out of whack and there are too many people going into the academic world for the wrong reasons. Competition for funding and resources kills good science.


>“At the current time there is no consensus on what reproducibility is or should be.”

This seems like an underrated part of it. The meaning of the 70% figure is a bit unclear to me, since the article doesn't say how many replication attempts most researchers have made. If most have done 10 replication studies and weren't able to replicate a component of just one of them, they would still count towards the 70%.

Given the quality of research in the social sciences, I would suspect the issue is much more significant than that, but it's nonetheless a distorting way to report the stats, IMO.
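
To make that concrete: the survey counts researchers, not experiments, so the headline number depends heavily on how many replications each respondent attempted. A hypothetical back-of-the-envelope (the independence assumption and the per-attempt failure rates are mine; the 10 attempts come from the comment above):

    # Fraction of researchers who hit at least one failure in n independent
    # replication attempts, each failing with probability p: 1 - (1 - p)^n.
    def at_least_one_failure(p, n=10):
        return 1 - (1 - p) ** n

    for p in (0.05, 0.11, 0.30):
        print(f"per-attempt failure rate {p:.0%} -> {at_least_one_failure(p):.0%} of researchers")
    # With 10 attempts each, a per-attempt failure rate of only ~11% already
    # puts roughly 70% of researchers in the "failed to reproduce" column.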


There is a big piece of missing context in these studies -- the significance of the results that could not be reproduced. The fact that mammalian genes are longer than the mRNA they encode (on average by a factor of 5 or so, and sometimes by a factor of 100 or more) was a paradigm shift, and we know that it is true, not because the original experiments have been reproduced, but because in the past 40 years, the observation has been confirmed millions of times (20,000 times per mammalian genome). Likewise, we just keep getting more and more data supporting common ancestry of genes (and organisms).

It is important to remember that the "median" paper is cited fewer than 5 times, which means that there may be a large fraction of results that cannot be reproduced but have also had little effect on science. There have also been highly cited papers that could not be reproduced but whose results were correct (breakthrough experiments can be hard to reproduce), and there are highly cited papers that were simply wrong. But it is hard to measure these phenomena in an unbiased way.

The exact reproducibility of a paper is certainly important, but science is much more about building on results than reproducing experiments.


An even more interesting question is:

Has the rate of reproducibility changed at all over the decades?

Are we just simply more aware of the problems today than in the past?

When you had 10 papers published out of which 7 were BS, you could weed those out by reading each.

When you have 1K papers and 700 are BS ... you are not going to be able to do so.


It's ridiculous, because paper formats restrict scientists from sharing the content that makes the work reproducible. "Publish or perish," the model that pushes scientists toward unrealistically high publishing output in order to gain funding, forces them to push theoretically intriguing work that is not reproducible... And ultimately the public suffers. (Most foundational R&D is publicly funded.) Fun fact: my friend is a PhD chemist and says that if you try to reproduce any reaction from a bunch of Chinese authors, it's almost guaranteed to fail. China's publish-or-perish model is far worse than the US's or the EU's. It's just sad.


Super huge problem, especially when you think about the larger-scale implications of that inefficiency (having to re-invent the wheel experimentally over and over again when science is SUPPOSED to be architectural and reproducible). This is one of the biggest reasons why we started working on https://www.scifind.net/ to build an open-access database of this kind of empirical tacit knowledge, like how StackOverflow tackled the granularities of tech-based problem solving. An interesting paper I like to point to is this one (https://www.facetsjournal.com/doi/10.1139/facets-2019-0007) that talks about all the "dark knowledge" in the space: in this case, knowledge, whether institutional or word-of-mouth, that constantly gets lost in the ether.

Instead of constantly prioritizing the erudite publishing system (whose goal is conceptual in nature), I think a bit of openness from a utilitarian perspective can seriously help scientists with troubleshooting and reproducibility. The proof of the pudding is in the eating.


I wonder what the differences are across the various subdomains. I'm sure it would be wildly different.


I'm unable to quickly locate it, but my understanding is that the likelihood of reproducibility grows as a field moves along the spectrum from soft to hard sciences.


I don't like the word "crisis". Can we work on improving reproducibility? Absolutely. But approaching it through the lens of "crisis" leads to completely different paths.


Something new here?

Some previous discussion from 3 years ago:

https://news.ycombinator.com/item?id=21963509


Note: The publication is from 2016 (I didn't have enough space for this in the title)


"Please use the original title, unless it is misleading or linkbait; don't editorialize."

https://news.ycombinator.com/newsguidelines.html

(Submitted title was "70% of researchers have failed to reproduce another scientist's experiments". We've changed it now.)


As is, the title might also be read as "70% of researchers make no attempt to reproduce others' work." It would likely be clearer if it said, "Survey: 70% of Research Results Not Independently Reproducible (2016)".


Good input. Any thoughts on how to rephrase it more objectively? Edit: A mod did it for me, thanks.


Edit works, thanks!



