Istituto di Scienza e Tecnologie dell'Informazione     
Lucchese C., Orlando S., Perego R., Silvestri C. Mining frequent closed itemsets from distributed repositories. Talia Domenico, Bilas Angelos, Dikaiakos Marios D (eds.). USA: Springer, 2007.
In this paper we address the problem of mining frequent closed itemsets in a highly distributed setting such as a Grid. The extraction of frequent (closed) itemsets is an important problem in Data Mining, and is a very expensive phase needed to extract from a transactional database a reduced set of meaningful association rules, typically used for Market Basket Analysis. We consider an environment in which a transactional dataset is horizontally partitioned and stored at different sites. We assume that, due to the huge size of the datasets and to privacy concerns, dataset partitions cannot be moved to a centralized site in order to materialize the whole dataset and perform the mining task there. It thus becomes mandatory to perform separate mining at each site and then merge the local results to derive global knowledge. This paper shows how frequent closed itemsets, mined independently at each site, can be merged in order to derive globally frequent closed itemsets. Unfortunately, such merging might produce a superset of all the frequent closed itemsets, while the associated supports could be smaller than the exact ones, because some globally frequent closed itemsets might not be locally frequent in some partitions. In order to avoid the expensive post-processing phase needed to compute exact global results, we employ a method to approximate the supports of closed itemsets. This approximation is needed only for those globally frequent (closed) itemsets that are locally infrequent on some dataset partitions and are therefore not returned at all by the corresponding sites.
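The merging step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the names `local_support` and `merge_sites`, the data layout, and the global threshold are assumptions made for illustration. The candidates are simply the union of the local results, and a site that did not return any closed superset of an itemset contributes 0, so the reported support is only a lower bound in that case (the paper approximates such missing supports instead).

```python
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]
# Hypothetical layout: locally frequent closed itemset -> local support count.
LocalResult = Dict[Itemset, int]


def local_support(itemset: Itemset, site: LocalResult) -> int:
    """Support of `itemset` at one site, recovered from its closed itemsets.

    The support of an itemset equals the support of its closure, which is the
    largest support among the closed itemsets that contain it. If the site
    returned no closed superset, the itemset is locally infrequent there and
    we fall back to 0, which is only a lower bound.
    """
    return max((s for c, s in site.items() if itemset <= c), default=0)


def merge_sites(
    sites: List[LocalResult], min_global_support: int
) -> Dict[Itemset, Tuple[int, bool]]:
    """Return {itemset: (support_lower_bound, known_exact)} for the candidates."""
    # Assumption: candidates are just the union of all local results.
    candidates = set().union(*(site.keys() for site in sites))
    merged = {}
    for itemset in candidates:
        per_site = [local_support(itemset, site) for site in sites]
        total = sum(per_site)
        if total >= min_global_support:
            # The sum is known to be exact only if every site reported a value.
            merged[itemset] = (total, all(s > 0 for s in per_site))
    return merged


site_a = {frozenset({"a", "b"}): 4, frozenset({"a"}): 6}
site_b = {frozenset({"a"}): 5}
result = merge_sites([site_a, site_b], min_global_support=4)
```

In this toy input, `{a}` is returned by both sites, so its global support is the exact sum of the local ones, while `{a, b}` is missing from the second site and is flagged as a lower bound only.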
URL: http://www.springer.com/computer/communications/book/978-0-387-37830-5
Subject: Closed frequent itemsets
H.2.8 Database Applications

For further information, contact: Librarian http://puma.isti.cnr.it
