This month, the Evil Engineer advises a spy in search of an inconspicuous form of data storage.
Dear Evil Engineer,
I’m a double agent operating on behalf of [redacted]. After 15 years operating in a hostile state’s defence technology facility, I have finally been taken into enough confidence to seize a quantity of sensitive data, which I’ve copied to a 32GB thumb drive to return to my home country of [redacted]. However, I’ve been informed that counter-intelligence has been alerted to my activities and I’m not sure how best to bring this data back over the border – surely all my electronics will be searched.
I’ve heard that data can be stored in other forms, including in DNA. Could I store these defence secrets in my own DNA and thus carry them across the border undetected?
Yours,
A spy
Dear villain,
Thank you for your revealing letter. It is not for me to alert any authorities as to your activities – I’m just an evil engineer, helping those in need – so let’s jump straight to answering your question. You could indeed use DNA to carry 32GB of data over the border, although not in your own body.
DNA data storage is an exciting emerging technology which aims to use the exceptional natural data storage capability of DNA. It involves encoding binary data (0s and 1s) into sequences of nucleotide bases (A, C, G, T), then synthesising DNA with that sequence. The DNA is stored until the data is needed, at which point it is sequenced and decoded back into binary that a computer can read.
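For the curious, the encode-and-decode round trip can be sketched in a few lines of Python. This is only an illustration: the two-bits-per-base table and the function names `encode` and `decode` are my own assumptions, not any published scheme, and there is no error correction or indexing, which a real system would need.

```python
# Illustrative mapping only (an assumption): two bits per base, no error correction.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of bases, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(sequence: str) -> bytes:
    """Turn a string of bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

The string returned by `encode` stands in for the sequence that would be sent off for synthesis, and `decode` stands in for what happens after sequencing.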
DNA data storage has been experimented with for many decades (notably, in 1988, the artist Joe Davis inserted a small piece of synthetic DNA containing a simple visual representation of the female genitalia into live E. coli cells – each organism contained many copies, which could be sequenced and decoded to reproduce the image). It really took off in the 2010s with a pair of landmark papers.
One, by Harvard academics, described encoding a 50,000-word book and other media in DNA, simply mapping bits one-to-one with bases. The other, by scientists at the European Bioinformatics Institute, demonstrated it was possible to store, retrieve and reproduce data from DNA with at least 99.99 per cent accuracy, thanks to an innovative error-correction scheme. These papers demonstrated it was indeed feasible to store substantial amounts of data in DNA and read it back.
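A one-bit-per-base mapping leaves a spare choice, since there are four bases but only two bit values. The sketch below shows one way that freedom might be used: each bit has two candidate bases, and the encoder picks whichever avoids repeating the previous base, because long runs of the same base are harder to synthesise and sequence reliably. The particular pairing (A or C for 0, G or T for 1) and the names `encode_bits` and `decode_bases` are assumptions for illustration, not details taken from the paper.

```python
# Hypothetical one-bit-per-base scheme: each bit has two candidate bases.
CANDIDATES = {"0": "AC", "1": "GT"}
BASE_TO_BIT = {"A": "0", "C": "0", "G": "1", "T": "1"}


def encode_bits(bits: str) -> str:
    """Encode a bit string, never emitting the same base twice in a row."""
    strand = []
    for bit in bits:
        first, second = CANDIDATES[bit]
        # Use the first candidate unless it would repeat the previous base.
        strand.append(second if strand and strand[-1] == first else first)
    return "".join(strand)


def decode_bases(strand: str) -> str:
    """Recover the bit string; both candidates for a bit decode identically."""
    return "".join(BASE_TO_BIT[base] for base in strand)


print(encode_bits("0000"))                # ACAC
print(decode_bases(encode_bits("0011")))  # 0011
```

The price of this convenience is density: one bit per base stores half as much as a two-bits-per-base table would.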
Since then, larger and larger items have been stored in DNA, including the entirety of the text of the English-language Wikipedia and an episode of Netflix’s ‘Biohackers’. Other advances include performing data-processing operations directly on DNA with chemical processes, automating the storage-and-retrieval process, and protecting data from errors, for example by interspersing ‘synchronisation’ nucleotides to help with reconstruction.
This mode of data storage is attractive because it is unbelievably space-efficient – theoretically, it is possible to store an exabyte in the volume of a grain of sand. DNA is also good at keeping information for extremely long periods of time – thousands or even millions of years under the right conditions (in 2021, researchers successfully sequenced DNA from a 1.2-million-year-old mammoth). Although its practical use is currently limited by high costs and slow read and write times, progress is being made on these fronts and it is realistic to expect that we could see DNA-based and hybrid storage being used in the not-too-distant future for data that does not need to be accessed frequently.
So, that’s the promising state of the technology. Of course, you are interested in the possibility of storing information in your own body. The good news is that it is possible to store data in living organisms. In 2021, Columbia University researchers published a paper describing how they used CRISPR to store information in the active genes of E. coli – they encoded 72 bits in DNA to spell out ‘Hello world!’. Even after mixing them with normal soil microbes, they were still able to sequence the DNA and extract the message.
The bad news is that storing data in living organisms is far, far more complex than storing data in synthetic DNA in a frozen PCR tube. So far, it is very limited, with only short sequences being stored and no answer yet to the problem of maintaining stability over many generations, during which mutations rapidly degrade the information.
Now, it is possible to use a human genome such as your own as a template for storing data. If one bit can be encoded per DNA base, a genome with six billion base pairs could store a theoretical maximum of six gigabits (0.75GB) of data – well short of your 32GB, and in practice less still, as a considerable fraction must be set aside for indexing and error correction. But although it is possible to use a human genome, there is no particular reason to choose it over that of any other organism. Unfortunately, you cannot insert a human genome-like thing loaded with military intelligence into your own body and expect it to function normally. DNA storage only works as well as it does when the DNA is kept under cool, dry conditions – i.e. utterly unlifelike conditions.
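The arithmetic behind that capacity figure is worth a quick check against your thumb drive. The sketch below uses the numbers from this letter (six billion base pairs, one bit per base, 32GB of data) and assumes decimal gigabytes of 10^9 bytes. It ignores the overhead for indexing and error correction, so the real requirement would be higher.

```python
GENOME_BASES = 6_000_000_000  # base pairs, figure used in the text
BITS_PER_BASE = 1             # encoding density assumed in the text
PAYLOAD_GB = 32               # size of the thumb drive
BYTES_PER_GB = 10**9          # assumption: decimal gigabytes

# Capacity of one genome-sized template, in gigabytes.
genome_capacity_gb = GENOME_BASES * BITS_PER_BASE / 8 / BYTES_PER_GB

# Bases needed to hold the whole payload at this density.
bases_needed = PAYLOAD_GB * BYTES_PER_GB * 8 // BITS_PER_BASE

# How many genome-sized templates that corresponds to.
genomes_needed = bases_needed / GENOME_BASES

print(genome_capacity_gb)        # 0.75
print(bases_needed)              # 256000000000
print(round(genomes_needed, 2))  # 42.67
```

In other words, at one bit per base your data would fill roughly 43 genome-sized templates before any overhead is added.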
If it is possible to stow away a tube of DNA in your luggage and keep it cool for the length of the journey, that would not be an unfeasible way to approach your dilemma. If it is impossible to keep it cool, dry and sterile, you would be best off considering the more conventional approaches – encrypted communication channels, steganography, or simply hiding a microSD card inside your own body.
Yours,
The Evil Engineer