Istituto di Scienza e Tecnologie dell'Informazione     
Pensa R. G., Boulicaut J. Constrained co-clustering of gene expression data. In: SDM 2008 - The 2008 SIAM International Conference on Data Mining (Atlanta, GA, 24-26 April 2008). Proceedings, pp. 25-36. SIAM, 2008.
In many applications, the expert interpretation of co-clustering is easier than for mono-dimensional clustering. Co-clustering aims at computing a bi-partition, that is, a collection of co-clusters: each co-cluster is a group of objects associated with a group of attributes, and these associations can support interpretations. Many constrained clustering algorithms have been proposed to exploit domain knowledge and to improve partition relevancy in the mono-dimensional case (e.g., using the so-called must-link and cannot-link constraints). Here, we consider constrained co-clustering not only for extended must-link and cannot-link constraints (i.e., both objects and attributes can be involved), but also for interval constraints that enforce properties of co-clusters when considering ordered domains. We propose an iterative co-clustering algorithm which exploits user-defined constraints while minimizing the sum-squared residues, i.e., an objective function introduced for gene expression data clustering by Cho et al. (2004). We illustrate the added value of our approach in two applications on gene expression data.
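As a rough illustration of the quantities the abstract mentions, the sketch below computes a sum-squared residue for a given bi-partition and checks must-link and cannot-link constraints on one dimension. It is not the authors' algorithm: it assumes the simpler residue variant in which each entry is compared with the mean of its co-cluster, it omits interval constraints and the iterative optimization, and the names `sum_squared_residue`, `satisfies_constraints`, and the toy `expression` matrix are hypothetical.

```python
from collections import defaultdict


def sum_squared_residue(matrix, row_labels, col_labels):
    """Sum of squared deviations of each entry from its co-cluster mean.

    Assumption: the residue of an entry is (value - mean of its co-cluster).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            key = (row_labels[i], col_labels[j])
            sums[key] += value
            counts[key] += 1
    means = {key: sums[key] / counts[key] for key in sums}
    return sum(
        (value - means[(row_labels[i], col_labels[j])]) ** 2
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
    )


def satisfies_constraints(labels, must_link, cannot_link):
    """True if every must-link pair shares a cluster and no cannot-link pair does."""
    return (all(labels[a] == labels[b] for a, b in must_link)
            and all(labels[a] != labels[b] for a, b in cannot_link))


# Toy data (hypothetical): rows are genes, columns are experimental conditions.
expression = [
    [1.0, 1.0, 5.0],
    [1.0, 1.0, 5.0],
    [9.0, 9.0, 2.0],
]
condition_labels = [0, 0, 1]
must_link = [(0, 1)]      # genes 0 and 1 must share a cluster
cannot_link = [(0, 2)]    # genes 0 and 2 must be separated

good_genes = [0, 0, 1]
bad_genes = [0, 1, 1]

good_residue = sum_squared_residue(expression, good_genes, condition_labels)
bad_residue = sum_squared_residue(expression, bad_genes, condition_labels)
good_ok = satisfies_constraints(good_genes, must_link, cannot_link)
bad_ok = satisfies_constraints(bad_genes, must_link, cannot_link)
```

On this toy matrix the partition that respects the constraints also has the lower residue; in general the two can conflict, which is why a constrained algorithm has to trade them off explicitly.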
URL: http://www.siam.org/proceedings/datamining/2008/dm08.php
Subject: co-clustering; gene expression; data analysis; H.2.8 Database Applications



