Istituto di Scienza e Tecnologie dell'Informazione
 Lancia G., Pinotti M. C., Rizzi R. Haplotyping populations by pure parsinomy: complexity, exact and approximation algorithms. In: Informs Journal on Computing, vol. 16 (4) pp. 348 - 359. Special Issue on Computational Molecular Biology/Bioinformatics. INFORMS Institute for Operations Research and the Management Sciences (INFORMS), Linthicum, Maryland, USA, 2004.
 Abstract(English) In this paper we address the Pure Parsimony Haplotyping problem: Given a set of genotypes G, find a minimum cardinality set of haplotypes which explains G. A genotype g is an n-dimensional vector in {0, 1, 2} and a haplotype h is an n-dimensional binary vector. A set of haplotypes H is said to explain G if for every g 2 G there are h1, h2 2 H such that h1 h2 = g. The position-wise sum h1 h2 indicates the genotype which has a 2 in the positions where h1 and h2 disagree, and the same value as h1 and h2 where they agree. In this paper we show the APX-hardness of the problem even when the number of "2" symbols is at most 3 for every g 2 G. We then give two Integer Linear Programming formulations. From the first one (appeared also in [9]) we derive a 2k−1-approximation algorithm when the number of symbols "2" is at most k for every g 2 G. This formulation has an exponential number of variables and constraints. The second formulation is new, and has a polynomial number of variables and constraints. Finally, we give approximation algorithms not based on Linear Programming, such as a trivial p|G|-approximation algorithm for the general case, and an effective probabilistic algorithm with a performance guarantee of 2k+1blog |G|c (1 + dln |G|e) when the number of symbols "2" is at most k for every g 2 G. The expected running time of the algorithm turns out to be almost linear in the input size. 1 Introduction A Single Nucleotide Polymorphism (SNP) is a site of the human genome (i.e., the posi- tion of a specific nucleotide), whose content shows a statistically significant variability within a population. A position is considered a SNP if for the minority of the pop- ulation (as long as it consists of at least 5% of the population) a certain nucleotide URL: http://joc.journal.informs.org/content/16/4.toc DOI: 10.1287/ijoc.1040.0085 Subject Distributed Systems F.2 Analysis of Algorithms and Problem Complexity

Open access Restricted Private

 Per ulteriori informazioni, contattare: Librarian http://puma.isti.cnr.it