Siguna Mueller, Farhad Jafari and Don Roth
Storing information in DNA is emerging as the next-generation approach to archiving vast amounts of data, and various sophisticated approaches to DNA data storage have been proposed. Herein we present a multistep algorithm designed to detect and/or correct errors introduced at any stage of the DNA storage process, including those arising during generation of the message DNA, and propose refinements designed to ensure the authenticity and correctness of each individual encoded DNA block. In addition, the algorithm allows authentic decoding without a reference sequence or knowledge of the message's meaning. The algorithm is based on principles underlying provably secure cryptographic systems. Importantly, our new algorithm compares favorably with current ones in terms of ease of implementation and message expansion. In cases where reads are error-free, our algorithm should be faster than current alignment techniques. The algorithm generates a certificate confirming that the obtained data are exactly the same as the original, without requiring knowledge of the original data. Our algorithm has applications to DNA steganography, sequence alignment, fast identification of correct reads in next-generation sequencing, and message security.