Can we store our data in DNA?

A zettabyte is equivalent to a trillion gigabytes. That’s a huge number, but some forecasts suggest that humanity will produce eighty zettabytes of digital data this year. It all adds up: PowerPoint presentations and selfies; videos captured by cameras; electronic medical records; data collected from smart devices or telescopes and particle accelerators; backups and copies of backups. Where should it all go, how much of it should be stored, and for how long? These are the questions that plague the computer scientists who manage the world’s data stores. For them, the cloud is not just an abstraction but a real system that needs to be built, funded, and maintained.

Data storage experts talk about data in terms of temperature. At one end is “hot” data, like Wikipedia or your bank balance, which needs to be available on your screen almost instantly. “Warm” data, like your old photos, can be retrieved in seconds. At the other end is “cold” data, which can take minutes or even days to access. Most data is cold, and much of it can probably be safely deleted. However, some may prove vital later on, in a criminal case, for example, and that potential value means it should be preserved indefinitely.

One of the most common cold storage media is magnetic tape. It was invented in the 1920s and has been improved continually, with capacity doubling every few years. Quantum, a leading maker of archival technology, offers tape libraries that resemble jukeboxes the size of shipping containers. Inside these libraries, a small robot retrieves data by locating tapes housed in VHS-style cassettes and loading them into drives to be read. “There are thousands of Quantum robots in the cloud moving your data around,” Eric Bassier, who has worked at Quantum for more than 16 years, told me.

Tape use is growing every year, in part because of the needs of data warehouses like Google. But storing the data humanity produces each year on today’s tapes would fill thirty thousand shipping containers. Tapes and drives also deteriorate over time. A representative of Tape Ark, an Australian company that helps recover data from damaged tapes, described to me the process of salvaging data on lunar dust samples brought back from the Apollo missions. He also showed me a video of an old tape disintegrating as it moved around inside a drive. “Those little black dots you see on the left side of the screen are Word documents and Excel spreadsheets that have fallen off the tape because it’s become so brittle,” he said.

Magnetic tape may seem like an old technology. However, some researchers looking for a replacement have begun to look to an even older medium. Billions of years ago, evolution chose DNA as its information storage medium. Translating the ones and zeros of a computer into the bases of genetic material (A, C, T, and G) may offer several advantages. First, DNA molecules can theoretically store up to a billion gigabytes per cubic millimeter, a density that would let a shipping container’s worth of tapes fit into the volume of a few sesame seeds. Second, properly prepared strands of DNA can be stored reliably for thousands of years: the oldest surviving DNA sample is two million years old and still readable. And finally, DNA will not become obsolete. Given its central role in the life sciences, and in the functioning of our own bodies, we will likely always have the tools to read what we have written.
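The density claim can be checked with back-of-the-envelope arithmetic. The sketch below uses only figures quoted in this article (eighty zettabytes a year, thirty thousand shipping containers of tape, a billion gigabytes per cubic millimeter); the variable names are illustrative, and the result is a theoretical ceiling that ignores any overhead a real storage system would add.

```python
# Rough arithmetic using only the figures quoted in the article.
# The DNA density is a theoretical maximum, so the volumes are lower bounds.

GB_PER_ZB = 10**12                    # a zettabyte is a trillion gigabytes
annual_data_gb = 80 * GB_PER_ZB       # forecast: eighty zettabytes per year
containers_per_year = 30_000          # shipping containers of tape for one year's data
dna_gb_per_mm3 = 10**9                # a billion gigabytes per cubic millimeter

# How much data one container of tape holds, and the DNA volume for the same data.
gb_per_container = annual_data_gb / containers_per_year
mm3_per_container = gb_per_container / dna_gb_per_mm3

# DNA volume for a full year of humanity's data.
annual_mm3 = annual_data_gb / dna_gb_per_mm3
annual_cm3 = annual_mm3 / 1000        # 1 cubic centimeter = 1,000 cubic millimeters

print(f"One container of tape holds about {gb_per_container:.2e} GB")
print(f"The same data in DNA needs about {mm3_per_container:.2f} cubic millimeters")
print(f"A year of data in DNA needs about {annual_cm3:.0f} cubic centimeters")
```

On these numbers, one container’s worth of tape shrinks to a few cubic millimeters of DNA, and a full year of data to roughly eighty cubic centimeters.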

Soviet physicist Mikhail Samoilovich Neiman proposed using DNA to store data in 1964, about a decade after the double helix was first described by James Watson, Francis Crick, and Rosalind Franklin. However, creating a real DNA-based storage system proved challenging. First, scientists had to figure out how to mathematically encode 0s and 1s as DNA bases. (There are many possible schemes.) Then they had to manufacture strings of those bases on demand. Then they had to securely store, retrieve, and read those strings, and finally convert them back into bits. The first demonstration of the technology came in 1988, when artist Joe Davis created a stick figure called Microvenus. Davis used an encoding scheme to convert a five-by-seven-pixel image into a sequence of 18 bases. With the help of a Harvard lab, he inserted the DNA into E. coli bacteria, which could store and reproduce the message. The researchers were able to read it back two years later. In 2007, another group achieved similar success by encoding “E=mc^2 1905!” into a bacterial genome.
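To make the encoding step concrete, here is a minimal sketch of one of the simplest possible schemes: two bits per base. The mapping (00 to A, 01 to C, 10 to G, 11 to T) and the function names are assumptions chosen for illustration. This is not the scheme Davis or the 2007 group used, and practical systems add error correction and avoid troublesome base patterns.

```python
# A hypothetical two-bits-per-base mapping, chosen only for illustration.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of bases, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


# Sample text borrowed from the article; stored here as plain ASCII bytes.
message = "E=mc^2 1905!".encode("ascii")
strand = encode(message)

print(strand)                          # the message as a sequence of bases
print(decode(strand).decode("ascii"))  # the round trip recovers the text
```

Each byte becomes four bases under this mapping, so the twelve-character message turns into a strand of forty-eight bases. Synthesizing, storing, and sequencing that strand are the hard parts that the code leaves out.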

In 2010, biologist Craig Venter, who played a key role in sequencing the human genome

Source: newyorker.com
