Journalists Miss on AlphaFold News

When DeepMind announced its public release of 200 million predicted protein structures, I decided to study how various news outlets covered the announcement.

This is from Codon, my weekly newsletter.

A small cadre of private equity firms (like Alden Global Capital) and media conglomerates (like Hearst Communications) own hundreds of news outlets. Hearst alone owns HGTV Magazine, the San Francisco Chronicle, and dozens of local news channels, newspapers, and radio stations. A venture equity firm, called North Equity, owns Popular Science and Futurism.

As news outlets consolidate, journalists suffer and science sections get axed. Many science journalists write and publish about five news pieces every day, an inhuman feat. Science journalists are overworked and underpaid (salaries start ~$35k). Often, they have no choice but to copy-paste quotes from a press release to meet deadlines.

So what happens? Often, big companies swoop in, gobble up little newspapers and magazines, and change their publishing models to emphasize reads and clicks above everything else. Headlines are king, and damn their accuracy! Journalists, sadly, are stuck in the middle. (For more on this, see Who Owns What from the Columbia Journalism Review.)

The goal for big conglomerates, usually, is not to write news that advances the discussion or offers novel insights. It’s to publish articles that get clicked a lot, and to repeat that as often as possible.

So when DeepMind announced its public release of 200 million predicted protein structures, I decided to study how various news outlets covered the announcement.

But first, a timeline. In July of last year, DeepMind reported, in a Nature paper, that AlphaFold’s accuracy was “competitive with experimental structures in a majority of cases,” based on results from the 14th Critical Assessment of protein Structure Prediction (CASP14) competition, which is held every two years.

On July 22 of that same year, DeepMind and EMBL jointly announced, in a press release, that they would publicly release “the most complete and accurate database yet of predicted protein structure models for the human proteome.”

On July 28, CEO Demis Hassabis said, at a press briefing, that DeepMind would publicly release 200 million protein structure predictions (database here). These predictions are based on sequences stored in the UniProt database. Most protein pages on UniProt will come pre-loaded with a predicted structure.

AlphaFold’s goal is to accelerate science, and I’m all for it. Their predicted structures are already helping scientists search for new drugs and develop vaccines. AlphaFold is, in my eyes, the most useful demonstration of AI’s capabilities yet — and not only for science.

But there is nuance to DeepMind’s latest release. For one, scientists have only solved about 190,000 protein structures, or less than 0.1% of the new trove. We are still a long way from validating all 200 million predicted protein structures. Any news article that insinuates AlphaFold has solved 200 million structures is wrong.
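To check that percentage, here is a quick back-of-the-envelope sketch. It uses only the two rounded figures quoted above, and it assumes (for the sake of the comparison) that every experimentally solved structure has a counterpart in the predicted set, which is a simplification. The variable names are my own.

```python
# Rounded figures quoted in the text; both are approximate.
solved_structures = 190_000          # experimentally solved protein structures
predicted_structures = 200_000_000   # predictions in the new AlphaFold release

# Assumption: each solved structure maps to exactly one prediction in the trove.
share = solved_structures / predicted_structures

print(f"Solved structures as a share of the trove: {share:.3%}")
print(f"Predictions per solved structure: {predicted_structures / solved_structures:,.0f}")
```

In other words, for every structure that has been solved in a lab, there are roughly a thousand predictions with no experimental counterpart to check against.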

Secondly, the AlphaFold release is based on UniProt data. That means structures can only be predicted for those organisms that already have protein sequence data — not “nearly every known species,” as an early article from Nature reported.

In this newsletter, I’m ranking how various news outlets covered the AlphaFold announcement. By reading many different perspectives and angles on a single story, I hope to improve my own writing. Here we go.

Nature — 3/10 (Link)

One day after posting their story, Nature issued a correction. Their initial article erroneously stated that “AlphaFold had determined protein structures from nearly every known species.”

Even after the correction, the article is misleading. The subhead says that “DeepMind’s AlphaFold tool has determined the structures of around 200 million proteins,” even though an estimated 65 percent of the 214 million predictions are not “highly accurate” and only 45 percent are “considered to be accurate enough for many applications.”

A later tweet that claimed, “AlphaFold has determined the structures of proteins from almost every known organism on Earth,” was swiftly shot down by Reid Olsen. But Olsen’s tweet, too, is misleading: It’s not every known organism; it’s organisms with UniProt data.

New Scientist — 9/10 (Link)

Misleading headline (“DeepMind's protein-folding AI cracks biology's biggest problem”), but the article includes many original insights, including quotes from Keith Willison at Imperial College London and Tomek Wlodarski at University College London that explain deficiencies in the AlphaFold database. Among them: “AlphaFold isn’t able to take any arbitrary string of amino acids and model exactly how they fold. Instead, it is only able to use parts of proteins and their structures that have been experimentally determined to predict how a new protein will fold.”

And “developing a model of how proteins fold – not just predicting their final structure – is a problem that DeepMind is yet to tackle.”

Oddly, the New Scientist reporter and a reporter at The Guardian, who covered the same story, both interviewed the same source: Matt Higgins at the University of Oxford.

NBC News — 7/10 (Link)

Good coverage, with plenty of nuance.

Instead of making blanket claims, the journalist (Denise Chow) relies solely on what was said by people at DeepMind. This is a tried-and-true journalistic strategy. If you get something wrong, you can swiftly point fingers at the person who told you the false information, rather than take the blame yourself.

The subhead is accurate, too (italics mine): “DeepMind, an AI firm owned by Google’s parent company, Alphabet, said its program can now predict the structure of nearly every protein known to science.” The article also makes clear that the catalog encompasses data “from the sequenced genomes of almost every organism on the planet.”

MIT Technology Review — 8/10 (Link)

Really good coverage, with added insights not available from the press release. The journalist, Melissa Heikkilä, reported the announcement accurately, but also explained how the predictions may “not be as accurate for rarer proteins with less available evolutionary information,” because AlphaFold was trained on a very specific set of data that is deficient in structure-altering mutations.

Science — 6/10 (Link)

Brief, accurate, but lacking in original insights. Most of the reported information — apart from a brief mention that AlphaFold took “roughly 10 to 20 seconds to make each protein prediction” — seems to have been reported by at least three other news outlets.

The Verge — 4/10 (Link)

Misleading headline (“DeepMind found the structure of nearly every protein known to science”) and devoid of original reporting. Every quote is taken from a press release or rehashed from the AlphaFold blog.

Note to Self: Add unique insights to the things that I write. Go beyond the news, the hype, the commentary. Contribute something useful to the discussion.

Thanks for reading.

— Niko McCarty

Twitter: @NikoMcCarty