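According to a C4ISRNET report dated January 16, 2020, the U.S. Intelligence Advanced Research Projects Activity (IARPA) has officially launched the Molecular Information Storage (MIST) program. The program aims to use synthetic DNA to store exabyte-scale data (one exabyte is one million terabytes), to develop new devices that can both write data to and read data from synthetic DNA media, and to shrink exabyte-scale data storage systems to tabletop size while sharply reducing operation and maintenance costs. The goal is commercial viability within three to five years. To store data on synthetic DNA, digital files must first be converted from binary code to the DNA format. The DNA is then synthesized in short segments, each identified by a kind of barcode that marks its position relative to the entire sequence, so that the data can be read or copied.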
Where to store all this intelligence data? How about DNA?
Data is a massive problem for the intelligence community. From the satellite images produced by the National Reconnaissance Office to the bulk communications data swept up by the National Security Agency, the intelligence community is collecting more information than ever before. But where to store it?
Data centers occupy massive warehouses and draw megawatts of power. The resource-intensive nature of these facilities makes them difficult to scale, and ultimately unprepared for a torrent of data.
Now, the Intelligence Advanced Research Projects Activity—the organization charged with tackling some of the intelligence community’s most difficult problems—thinks it has a solution: synthetic DNA. On Jan. 15, IARPA officially launched the Molecular Information Storage (MIST) program, an effort to use synthetic DNA to store exabytes (one million terabytes) of data.
IARPA has awarded multi-phase contracts to two teams pursuing a solution: up to $23 million for the Molecular Encoding Consortium and up to $25 million for Georgia Tech Research Institute.
“The MIST program is a data storage moonshot to develop technologies that allow us to shrink an exabyte-scale data warehouse down to a tabletop form factor, with equally large reductions in operation and maintenance costs,” said IARPA Program Manager David Markowitz. “This would be a transformative capability for big data stakeholders in government and industry.”
If successful, MIST will result in new devices capable of both writing data to and reading data from synthetic DNA media at scale. The goal is to make this technology commercially viable within three to five years.
To store data on synthetic DNA, digital files must first be converted from binary code to the format used for DNA: sequences of A, C, T and G. The DNA is then synthesized in short segments, each tagged with a sort of barcode that marks where the segment sits in relation to the entire sequence, allowing readers to recall or copy only the necessary data.
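As a rough illustration of the encoding idea described here, the sketch below maps every two bits to one base and prefixes each short segment with an index that plays the role of the barcode. The bit-to-base mapping, the segment sizes and all function names are assumptions made for illustration; the article does not describe the MIST teams' actual encoding, and practical schemes would also need error correction and constraints on the sequences produced.

```python
# Toy sketch only: mapping, sizes and names are illustrative assumptions,
# not the scheme used by any MIST performer.
BASES = "ACGT"        # assumed mapping: 00->A, 01->C, 10->G, 11->T
INDEX_BYTES = 2       # hypothetical barcode width (segment number)
PAYLOAD_BYTES = 8     # hypothetical payload size per segment


def bytes_to_bases(data: bytes) -> str:
    """Convert bytes to a base string, two bits per base, high bits first."""
    return "".join(BASES[(b >> s) & 0b11] for b in data for s in (6, 4, 2, 0))


def bases_to_bytes(seq: str) -> bytes:
    """Inverse of bytes_to_bases; expects a length that is a multiple of 4."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        value = 0
        for ch in seq[i:i + 4]:
            value = (value << 2) | BASES.index(ch)
        out.append(value)
    return bytes(out)


def encode_segments(data: bytes) -> list[str]:
    """Split data into short segments, each prefixed with its index."""
    segments = []
    for n, off in enumerate(range(0, len(data), PAYLOAD_BYTES)):
        chunk = data[off:off + PAYLOAD_BYTES]
        segments.append(bytes_to_bases(n.to_bytes(INDEX_BYTES, "big") + chunk))
    return segments


def decode_segments(segments: list[str]) -> bytes:
    """Reassemble the file from segments supplied in any order."""
    indexed = {}
    for seg in segments:
        raw = bases_to_bytes(seg)
        indexed[int.from_bytes(raw[:INDEX_BYTES], "big")] = raw[INDEX_BYTES:]
    return b"".join(indexed[i] for i in sorted(indexed))


def read_segment(segments: list[str], index: int) -> bytes:
    """Recall only the segment whose barcode matches the requested index."""
    prefix = bytes_to_bases(index.to_bytes(INDEX_BYTES, "big"))
    for seg in segments:
        if seg.startswith(prefix):
            return bases_to_bytes(seg[len(prefix):])
    raise KeyError(index)
```

Because every segment carries its own index, a pool of segments can be read back in any order, and `read_segment` shows how a reader could pull out one piece without decoding the rest.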
“With digital data growing at an exponential rate, there is increasing interest and excitement about using nature’s storage medium, DNA, to store digital data,” said Emily Leproust, CEO and co-founder of Twist Bioscience, one of the companies working with the Georgia Tech Research Institute. “With the government’s commitment to fund this exciting new area of storage, we believe that as part of this consortium of specialists, we can truly revolutionize the DNA synthesis process, and reduce the cost of synthesis for DNA data storage by many orders of magnitude.”
According to Twist Bioscience, the goal of their effort will be to create a device capable of writing enough synthetic DNA per day to reduce data storage costs to as low as $1 per gigabyte.
“Fifty years ago, DNA data storage was considered science fiction – today, it is science with a path toward broad implementation,” Leproust said. “We expect in the next three to five years, with the proper amount of government and industry investment, it will become a reality for long-term storage.”
The Molecular Encoding Consortium is led by the Broad Institute of MIT and Harvard and includes DNA Script and Professor Donhee Ham’s research group at Harvard University.
Meanwhile, the Georgia Tech Research Institute is teaming with Twist Bioscience Corp., the University of Washington, Microsoft and Roswell Biotechnologies for its effort. These new systems will be tested independently by Los Alamos National Laboratory, Sandia National Laboratories and the U.S. Army Research Laboratory.