Muino J. M., Kuruoglu E. E., Arndt P. F. Evidence of a cancer type-specific distribution for consecutive somatic mutation distances. In: Computational Biology and Chemistry, vol. 53 (Part A), pp. 79-83. Special issue: Complexity in Genomes. Yannis Almirantis, Peter Arndt, Wentian Li, Astero Provata (eds.). Elsevier, 2014.
Specific molecular mechanisms may affect the pattern of mutation in particular regions, thereby leaving a footprint or signature of their activity in the DNA. The common approach to identifying these signatures is to study the frequency of substitutions. However, such an analysis ignores spatial information, which is important for the statistics of mutation occurrence. In this work, we propose that the distribution of distances between consecutive mutations along the DNA molecule can provide information about the types of somatic mutational processes. In particular, we have found that specific cancer types show a power law in interoccurrence distances, instead of the exponential distribution expected under the Poisson assumption commonly made in the literature. Cancer genomes exhibiting power-law interoccurrence distances were enriched in cancer types where the main mutational process is reported to be the activity of the APOBEC protein family, which produces a particular pattern of mutations called kataegis. Therefore, the observation of a power law in interoccurrence distances could be used to identify cancer genomes with kataegis.
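
As an illustration of the comparison described in the abstract, the following is a minimal sketch, not the authors' procedure. It computes the distances between consecutive mutations and compares the log-likelihood of an exponential fit (the Poisson expectation) with that of a continuous power-law fit. The function names `consecutive_distances` and `compare_fits` are hypothetical, and the choices of a continuous approximation for base-pair distances, the smallest observed distance as the lower cutoff `xmin`, and an exponential shifted to start at `xmin` are assumptions made only for this example.

```python
import math


def consecutive_distances(positions):
    """Distances between neighbouring mutations on one chromosome.

    Positions are sorted first; zero distances (duplicate positions)
    are dropped because the fits below take logarithms.
    """
    ordered = sorted(positions)
    return [b - a for a, b in zip(ordered, ordered[1:]) if b > a]


def compare_fits(distances):
    """Maximum-likelihood fits of two candidate models to the distances.

    Assumption: both models are defined on x >= xmin, where xmin is the
    smallest observed distance, so that their likelihoods are comparable.
    """
    n = len(distances)
    if n < 2:
        raise ValueError("need at least two distances")
    xmin = min(distances)
    sum_log = sum(math.log(d / xmin) for d in distances)
    excess = sum(d - xmin for d in distances)
    if sum_log == 0 or excess == 0:
        raise ValueError("all distances are identical")

    # Shifted exponential: p(x) = rate * exp(-rate * (x - xmin))
    rate = n / excess
    loglik_exp = n * math.log(rate) - rate * excess

    # Continuous power law: p(x) = (alpha - 1) / xmin * (x / xmin) ** -alpha
    alpha = 1 + n / sum_log
    loglik_pl = n * math.log((alpha - 1) / xmin) - alpha * sum_log

    return {
        "rate": rate,
        "alpha": alpha,
        "loglik_exponential": loglik_exp,
        "loglik_power_law": loglik_pl,
        "prefers_power_law": loglik_pl > loglik_exp,
    }
```

A higher power-law log-likelihood is only a rough indication. A careful analysis would also choose `xmin` from the data, use a discrete model for base-pair distances, and apply a formal goodness-of-fit or likelihood-ratio test before calling a genome power-law distributed.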
