Reproducible Data is Hard. Use Robots

Perhaps the National Institutes of Health could build and fund automated, experimental pipelines to check pre-clinical results or other widely read studies. That seems a worthy investment to me.

This is from Codon, my weekly newsletter. Subscribe for free.

Last week was my last as a Data Journalist at Spectrum, an editorially-independent news website, funded by the Simons Foundation, that covers autism research.

On Monday, I started as Head of Media for New Science, a 501(c)(3) nonprofit that is building 21st-century institutions of science. You can learn more about what we’re working on at

In this new role, I’m looking for essays on science: how it could be better, how it fails, and why some people and places seem to be exceptionally good at it. If you have an idea but lack the time to turn thoughts into words, please send me an email. Let’s work together to bring your ideas to life.

And thanks for reading.

— Niko

A Job for Robots

Cell papers have always intimidated me. They feel serious, glossy, unquestionable. Perhaps that’s partly because an article in Cell routinely runs to over 40,000 words and contains up to seven figures. Supplemental materials can span 80 pages or more.

If scientific papers were written in a linear, logical fashion (like the Delbrück, Avery and McCarty papers of the 1940s, at the dawn of molecular biology), perhaps Cell papers would be decipherable. I could sit down, head in my hands, and spend time reading the words.

At some point, though, my attention shifts; dirty laundry sits in the corner of my apartment, the cats need food, and warm weather beckons.

When faced with such a data deluge, who has the time and attention to read — and truly understand — such lengthy papers? More importantly, who has the time and patience to reproduce the results printed in Cell or Nature or Science? There are no incentives for replicating experiments (confirmatory results are rarely printed), and most PhD students would surely prefer to spend their time on original experiments.

“The Reproducibility Project: Cancer Biology — a crowdsourced project — aimed to replicate cancer biology studies published between 2010 and 2012,” according to a report from New Science that I helped edit. Out of 23 ‘high-impact studies,’ fewer than half could be reproduced. “The project originally flagged 53 papers for replication experiments, but ‘vague protocols and uncooperative authors’ meant that just 23 could even be tested.”

Reproducibility is especially important for synthetic biology — a field riddled with noisy experiments at the stochastic boundaries of the living and non-living — but remains a profoundly difficult problem to fund, reward, and incentivize. Rather than trying to force a cultural shift, whereby researchers are asked to spend years replicating experiments, such experiments could be carried out by robots.

A recent bioRxiv preprint shows this is possible.

For a DARPA-funded program called Synergistic Discovery and Design (SD2), automated protocols were used to recreate genetic circuits first reported in a 2017 study in Nature Communications by Gander et al.

For this preprint, each genetic circuit was re-designed using software developed by Smart Information Flow Technologies, or SIFT. Those designs were sent to “a robotic laboratory operated by the Strateos company” in Menlo Park, California. The generated data were then “automatically uploaded to a repository hosted at the Texas Advanced Computing Center (TACC), for analysis by scientists at locations throughout the US,” according to the preprint.

In the original study, there were 24 strains. Each strain was studied across three replicates. And for each replicate, there were between 1,000 and 3,300 flow cytometry measurements.

On the robots, the datasets grew by “more than 2 orders of magnitude” compared with the original experiments. Each replicate, before gating, had about 30,000 flow cytometry measurements.
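To put those numbers side by side, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above; the variable and function names are my own, and it assumes the per-replicate counts can be compared directly. The number of robotic replicates is not given here, so the sketch covers the per-replicate gain only.

```python
import math

# Figures quoted in the paragraphs above.
STRAINS = 24
REPLICATES_PER_STRAIN = 3
ORIGINAL_EVENTS_RANGE = (1_000, 3_300)  # measurements per replicate, original study
ROBOT_EVENTS_PER_REPLICATE = 30_000     # "about 30,000", before gating


def original_total(events_per_replicate: int) -> int:
    """Total measurements in the original study at a given per-replicate count."""
    return STRAINS * REPLICATES_PER_STRAIN * events_per_replicate


def per_replicate_gain(original_events: int) -> float:
    """How many times more measurements one robotic replicate holds."""
    return ROBOT_EVENTS_PER_REPLICATE / original_events


low, high = ORIGINAL_EVENTS_RANGE
print(f"Original study total: {original_total(low):,} to {original_total(high):,}")
for events in (low, high):
    gain = per_replicate_gain(events)
    print(f"{events:,} per replicate: {gain:.1f}x, "
          f"or {math.log10(gain):.2f} orders of magnitude")
```

By this arithmetic, a single robotic replicate holds roughly 9 to 30 times more measurements than an original one, which is about one to one and a half orders of magnitude. The rest of the “more than 2 orders of magnitude” growth presumably comes from running more replicates or conditions, though that is my inference rather than a figure stated here.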

The robotic results largely agreed with those of Gander et al. Logic gates that did not produce reproducible results were redesigned and improved so that, in a single go, results were “fact-checked” and refined.

Perhaps the National Institutes of Health could build and fund automated, experimental pipelines to check pre-clinical results or other widely read studies. That seems a worthy investment to me.

Read more at bioRxiv.

Other Papers

(↑ = recommended article, * = open access, † = review article)

Basic Research

*A scaling law in CRISPR repertoire sizes arises from the avoidance of autoimmunity. Chen H, Mayer A & Balasubramanian V. Current Biology. Link

*Genome-wide protein–DNA interaction site mapping in bacteria using a double-stranded DNA-specific cytosine deaminase. Gallagher LA…Mougous JD. Nature Microbiology. Link

*CRISPRi chemical genetics and comparative genomics identify genes mediating drug potency in Mycobacterium tuberculosis. Li S…Rock JM. Nature Microbiology. Link


Engineering nonphotosynthetic carbon fixation for production of bioplastics by methanogenic archaea. Thevasundaram K…Chang MCY. PNAS. Link

↑*De novo biosynthesis of rubusoside and rebaudiosides in engineered yeasts. Xu Y…Liu L. Nature Communications. Link

Photocontrol of Itaconic Acid Synthesis in Escherichia coli. Li Y…Ren Y. ACS Synthetic Biology. Link

*Bioconversion of CO to formate by artificially designed carbon monoxide:formate oxidoreductase in hyperthermophilic archaea. Lim JK…Kim YH. Communications Biology. Link

*Cell-free prototyping enables implementation of optimized reverse β-oxidation pathways in heterotrophic and autotrophic bacteria. Vögeli B…Jewett MC. Nature Communications. Link

Computational Tools & Models

*Identification of genome edited cells using CRISPRnano. Nguyen T…Rossi A. Nucleic Acids Research. Link

↑Anti-CRISPR prediction using deep learning reveals an inhibitor of Cas13b nucleases. Wandera KG…Beisel CL. Molecular Cell. Link

↑*Computationally designed hyperactive Cas9 enzymes. Vos PD…Rackham O. Nature Communications. Link

CRISPR & Genetic Control

*Precise CRISPR-Cas–mediated gene repair with minimal off-target and unintended on-target mutations in human hematopoietic stem cells. Tran NT…Chu VT. Science Advances. Link

*Utilizing RNA origami scaffolds in Saccharomyces cerevisiae for dCas9-mediated transcriptional control. Pothoulakis G, Nguyen MTA & Andersen ES. Nucleic Acids Research. Link

*CRISPR-mediated protein-tagging signal amplification systems for efficient transcriptional activation and repression in Saccharomyces cerevisiae. Zhai H…Hou J. Nucleic Acids Research. Link

DNA Synthesis

*Evaluation of 3′-phosphate as a transient protecting group for controlled enzymatic synthesis of DNA and XNA oligonucleotides. Flamme M…Hollenstein M. Communications Chemistry. Link

*Enzymatic Synthesis of Chemical Nuclease Triplex-Forming Oligonucleotides with Gene-Silencing Applications. McGorman B…Kellett A. Nucleic Acids Research. Link

Genome Editing

↑*A universal system for streamlined genome integrations with CRISPR-associated transposases. Wang M…Wang K. bioRxiv (preprint). Link

*Modular (de)construction of complex bacterial phenotypes by CRISPR/nCas9-assisted, multiplex cytidine base-editing. Volke DC…Nikel PI. Nature Communications. Link

*Noncanonical amino acid mutagenesis in response to recoding signal-enhanced quadruplet codons. Chen Y…Guo J. Nucleic Acids Research. Link

*Genome editing using preassembled CRISPR-Cas9 ribonucleoprotein complexes in Fusarium graminearum. Lee N…Son H. PLOS One. Link

Medicine & Diagnostics

Transplantation of a human liver following 3 days of ex situ normothermic preservation. Clavien P…Tibbitt MW. Nature Biotechnology. Link

*Simplified Cas13-based assays for the fast identification of SARS-CoV-2 and its variants. Arizti-Sanz J…Myhrvold C. Nature Biomedical Engineering. Link

Assessment of AAV9 distribution and transduction in rats after administration through intrastriatal, intracisterna magna and lumbar intrathecal routes. Chandran J…Meno-Tetang GML. Gene Therapy. Link

Forced activation of dystrophin transcription by CRISPR/dCas9 reduced arrhythmia susceptibility via restoring membrane Nav1.5 distribution. Zhang R…Pan Z. Gene Therapy. Link

*Efficacy and Safety of a Recombinant Plant-Based Adjuvanted Covid-19 Vaccine. Hager KJ…Ward BJ. The New England Journal of Medicine. Link


*A single promoter-TALE system for tissue-specific and tuneable expression of multiple genes in rice. Danila F…Langdale JA. Plant Biotechnology Journal. Link

*Redox-engineering enhances maize thermotolerance and grain yield in the field. Sprague SA…Park S. Plant Biotechnology Journal. Link

Protein & Molecular Engineering

Leveraging intrinsic flexibility to engineer enhanced enzyme catalytic activity. Karamitros CS…Georgiou G. PNAS. Link

Tools & Technology

↑Scalable biological signal recording in mammalian cells using Cas12a base editors. Kempton HR…Qi LS. Nature Chemical Biology. Link

Detection of cell–cell interactions via photocatalytic cell tagging. Oslund RC…Fadeyi OO. Nature Chemical Biology. Link

*BacPROTACs mediate targeted protein degradation in bacteria. Morreale FE…Clausen T. Cell. Link

*Open-source personal pipetting robots with live-cell incubation and microscopy compatibility. Dettinger P…Schroeder T. Nature Communications. Link


↑*ColabFold: making protein folding accessible to all. Mirdita M…Steinegger M. Nature Methods. Link

*Simple and effective serum-free medium for sustained expansion of bovine satellite cells for cell cultured meat. Stout AJ…Kaplan DL. Communications Biology. Link

†*Biodegradation of polyethylene and polystyrene: From microbial deterioration to enzyme discovery. Zhang Y…Guo Z. Biotechnology Advances. Link

More soon,

— Niko // @NikoMcCarty