Jesse Bloom, PhD, a computational biologist at the Fred Hutchinson Cancer Research Center in Seattle, posted about his recovery of “deleted deep sequencing data” on the bioRxiv preprint server on Tuesday. The paper hasn’t yet been peer-reviewed or published in a journal.
Bloom said the data “sheds more light on the early Wuhan SARS-CoV-2 epidemic,” though the scientific significance of his report remains unclear. On Wednesday, scientists voiced both favorable and unfavorable opinions of the report and offered views on what the data could mean for understanding the initial outbreak in China.
“I recognize this is a hot-button topic,” Bloom told the Post. “It’s not a highly traditional scientific study, but at least it has some new data and new information.”
Bloom recovered the deleted files from Google Cloud; they had originally been stored in the National Library of Medicine’s Sequence Read Archive. He then reconstructed partial sequences of 13 early epidemic viruses. Based on an analysis of the sequences, Bloom said the Huanan Seafood Market sequences that are the focus of the joint World Health Organization-China report on the origin of the outbreak don’t fully represent the viruses that were in Wuhan at the beginning of the epidemic.
Instead, Bloom wrote, the initial sequences likely had three key mutations that are similar to coronavirus relatives in bats. He doesn’t think the recovered data explains the origins of the coronavirus, but he believes that the virus was circulating in Wuhan before December.
“This study provides no evidence either way,” he told the newspaper. “But it does indicate that we probably have not exhausted all relevant data.”
The National Institutes of Health confirmed that the raw data had been deleted from the database, the Post reported. The information was included in a preprint paper that Chinese scientists posted in March 2020 and later published in the journal Small in June.
In a statement on Wednesday, the NIH said that a researcher who originally published the data asked for the information to be removed from the NIH database so it could be included in a different database. The NIH also said it is standard practice to remove data if requested.
“These SARS-CoV-2 sequences were submitted for posting in [the Sequence Read Archive] in March 2020 and subsequently requested to be withdrawn by the submitting investigator in June 2020,” the NIH wrote in a statement.
“The requestor indicated the sequence information had been updated, was being submitted to another database, and wanted the data removed from SRA to avoid version control issues,” the NIH wrote.
Other submissions have been removed from databases since the beginning of the pandemic. The NIH reported that the National Library of Medicine had identified eight instances in which researchers had withdrawn submissions to the library. Those include the data retrieved by Bloom and “the rest from submitters predominantly in the U.S.,” the newspaper reported.
Scientists who have been studying the origins of the coronavirus shared differing opinions on Wednesday, both online and with numerous news outlets, including The New York Times and Science. Some said the data was “nothing new” and that it could still be found in scientific literature. Others said the early data needs to be better preserved and shared.
Either way, Bloom’s paper appears to be adding fuel to the discussion about the origins of the pandemic. Last month, President Joe Biden ordered intelligence agencies to conduct a 90-day review and provide a report. In an interview with Yahoo! News this week, Avril Haines, the director of national intelligence, said it’s possible that the answer will never be known.
“The best thing I can do is present the facts as we know them and to present the analysis that we’ve done in as unbiased a way possible,” she said.