PUMA
Istituto di Scienza e Tecnologie dell'Informazione     
Lucchese C., Orlando S., Perego R. Mining frequent closed itemsets out-of-core. In: SIAM International Conference on Data Mining (Bethesda, Maryland, 20-22 April 2006). Proceedings, pp. 417 - 427. SIAM, 2006.
 
 
Abstract
(English)
Extracting frequent itemsets is an important task in many data mining applications. When the data are very large, the mining task must be performed by an external memory algorithm, but only a few such algorithms have been proposed so far. Since the result set of all the frequent itemsets is also likely to be undesirably large, condensed representations, such as closed itemsets, have recently gained a lot of attention. In this paper we discuss the limitations of the partitioning techniques adopted by external memory algorithms for extracting all the frequent itemsets, when applied to closed itemset mining. The main issue is that the closedness of an itemset cannot be evaluated using only the local knowledge available in a single partition of the input dataset. A further step is thus needed to correctly merge the partial results. We introduce the first algorithm for mining closed itemsets out of core. The algorithm exploits a divide-et-impera (divide-and-conquer) approach, where the input dataset is split into smaller partitions that can not only be loaded into main memory but also mined entirely within it. Moreover, we devise a simple technique, based on a new theoretical result, that allows us to reduce the problem of merging partial solutions to an external memory sorting problem.
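The point about local knowledge can be illustrated with a small brute-force sketch. It is not the algorithm from the paper: it assumes a simple horizontal split of the transactions, and the names closure, closed_itemsets, partition_1 and partition_2 are hypothetical, chosen only for this illustration. An itemset is taken to be closed when it equals the intersection of all transactions that contain it.

```python
from itertools import combinations


def closure(itemset, transactions):
    """Intersect all transactions containing `itemset` (None if there are none)."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        return None
    return frozenset.intersection(*covering)


def closed_itemsets(transactions, min_support=1):
    """Brute-force enumeration of closed itemsets; only suitable for toy inputs."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            support = sum(1 for t in transactions if candidate <= t)
            if support >= min_support and closure(candidate, transactions) == candidate:
                result[candidate] = support
    return result


# Toy dataset, split by transactions into two partitions (an assumption
# made for illustration, not necessarily the paper's partitioning scheme).
dataset = [frozenset("abc"), frozenset("abd")]
partition_1, partition_2 = dataset[:1], dataset[1:]

global_closed = closed_itemsets(dataset)
local_closed = set(closed_itemsets(partition_1)) | set(closed_itemsets(partition_2))

# Itemsets that are closed in the whole dataset but in neither partition.
missed = set(global_closed) - local_closed
print(sorted("".join(sorted(s)) for s in missed))
```

In this toy case the itemset {a, b} is closed in the full dataset, yet within each partition its closure is the whole local transaction, so taking the union of the locally closed itemsets loses it. This is the kind of gap that a merge step has to repair.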
URL: http://www.siam.org/meetings/sdm06/proceedings.htm
Subject: Frequent itemsets mining
Out of core algorithms
H.2.8 Database Applications. Data mining



 


For further information, contact: Librarian http://puma.isti.cnr.it
