Istituto di Scienza e Tecnologie dell'Informazione     
Perego R., Orlando S., Lucchese C. Mining Frequent Closed Itemsets without Duplicates Generation. The document has been submitted to Conference: ACM KDD 2004, Technical report, 2004.
Closed itemsets are semantically equivalent to frequent itemsets but are orders of magnitude fewer, thus allowing the knowledge extracted from a transactional database to be represented very concisely. Unfortunately, no algorithm has been yet devised which allows to mine closed patterns directly. All existing algorithms may in fact generate the same closed itemset multiple times, and need to maintain the closed itemsets mined so far in the main memory in order to check if the current element has been already discovered or not. In this paper we propose a general technique for promptly detecting and discarding duplicate closed itemsets without the need of keeping in the main memory the whole set of closed patterns. Our approach can be exploited with substantial performance benefits by any depth first algorithms for mining frequent closed itemsets. As a case of study, we implemented our technique within a new mining algorithm which adopts a vertical bitmap representation of the dataset. The experimental evaluation demonstrates that our algorithm outperforms other state of the art algorithms such as CHARM, CLOSET+ and FPCLOSE.
Subject Frequent closed itemsets, association rules
H.2.8 Database Management, Database applications, Data Mining

Icona documento 1) Download Document PDF

Icona documento Open access Icona documento Restricted Icona documento Private


Per ulteriori informazioni, contattare: Librarian http://puma.isti.cnr.it

Valid HTML 4.0 Transitional