Istituto di Informatica e Telematica     
Manzini G., Rastero M. A Simple and Fast DNA Compression Algorithm. In: Software: Practice and Experience, vol. 34 pp. 1397-1411. John Wiley & Sons, Ltd, 2004.
In this paper we consider the problem of DNA compression. It is well known that one of the main features of DNA sequences is that they contain substrings which are duplicated except for a few random mutations. For this reason, most DNA compressors work by searching for and encoding approximate repeats. We depart from this strategy by searching for and encoding only exact repeats. However, we use an encoding designed to take advantage of the possible presence of approximate repeats. Our approach leads to an algorithm which is an order of magnitude faster than any other algorithm and achieves a compression ratio very close to that of the best DNA compressors. Another important feature of our algorithm is its small space occupancy, which makes it possible to compress sequences hundreds of megabytes long, well beyond the range of any previous DNA compressor.
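The general idea of encoding only exact repeats can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the k-mer dictionary, the greedy parse, and the parameters `k` and `min_len` are all assumptions chosen for illustration, and the sketch does not attempt the encoding tuned for approximate repeats that the abstract mentions.

```python
def encode_exact_repeats(seq, k=4, min_len=4):
    """Greedy parse of seq into literal bases and exact-repeat references.

    Hypothetical illustration only: a repeat is reported as
    ("repeat", distance, length), anything else as ("literal", base).
    """
    min_len = max(min_len, k)   # a dictionary hit always matches at least k bases
    last_seen = {}              # k-mer -> most recent start position indexed so far
    tokens = []
    indexed = 0                 # next start position whose k-mer is not yet indexed
    i, n = 0, len(seq)
    while i < n:
        # Index every k-mer that starts strictly before the current position.
        while indexed < i:
            if indexed + k <= n:
                last_seen[seq[indexed:indexed + k]] = indexed
            indexed += 1
        src = last_seen.get(seq[i:i + k])
        length = 0
        if src is not None:
            # Extend the exact match as far as it goes (overlap is allowed).
            while i + length < n and seq[src + length] == seq[i + length]:
                length += 1
        if length >= min_len:
            tokens.append(("repeat", i - src, length))
            i += length
        else:
            tokens.append(("literal", seq[i]))
            i += 1
    return tokens


def decode_tokens(tokens):
    """Rebuild the sequence from the tokens produced above."""
    out = []
    for tok in tokens:
        if tok[0] == "literal":
            out.append(tok[1])
        else:
            _, dist, length = tok
            start = len(out) - dist
            # Copy base by base so that overlapping repeats decode correctly.
            for j in range(length):
                out.append(out[start + j])
    return "".join(out)
```

A round trip such as `decode_tokens(encode_exact_repeats("ACGTACGTTTGACCA" * 2))` should return the input unchanged. A real compressor would follow this parse with an entropy coder for the literals and repeat descriptors, which the sketch leaves out.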
DOI: 10.1002/spe.619
Subject: DNA sequences; data compression; space economical algorithms; approximate repeats encoding; I.1.2 Algorithms



For further information, contact: Librarian http://puma.isti.cnr.it
