Istituto di Scienza e Tecnologie dell'Informazione     
Lucchese C., Perego R., Silvestri F., Orlando S. WebDocs: a real-life huge transactional dataset. In: ICDM Workshop on Frequent Itemset Mining Implementations (Brighton, UK, 1 November 2004). Proceedings, pp. 2 - 2. CEUR-WS.org, 2004.
This short note describes the main characteristics of WebDocs, a huge real-life transactional dataset we made publicly available to the Data Mining community through the FIMI repository. We built WebDocs from a spidered collection of HTML web documents. The whole collection contains about 1.7 million documents, mostly written in English, with a total size of about 5 GB.
URL: http://ftp.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-126/
Subject: Frequent itemsets mining datasets
H.2.8 Database Applications
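The abstract does not detail how the HTML collection was turned into transactions. A common reading, assumed here and not confirmed by this record, is that each document becomes one transaction whose items are its distinct terms, written in the FIMI repository's plain-text layout: one transaction per line, integer item IDs separated by spaces. The minimal sketch below illustrates that idea. The tag-stripping regex, the stop-word handling, and the function names `tokenize`, `to_transactions` and `to_fimi` are hypothetical illustrations, not the authors' preprocessing pipeline.

```python
import re
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Hypothetical preprocessing: drop anything that looks like an HTML tag,
# lowercase the text, and keep runs of ASCII letters as terms.
# The actual WebDocs preprocessing may differ.
TAG_RE = re.compile(r"<[^>]*>")
WORD_RE = re.compile(r"[a-z]+")


def tokenize(html: str) -> List[str]:
    """Return the terms of one HTML document, in order of appearance."""
    text = TAG_RE.sub(" ", html).lower()
    return WORD_RE.findall(text)


def to_transactions(
    docs: Iterable[str], stop_words: FrozenSet[str] = frozenset()
) -> Tuple[List[List[int]], Dict[str, int]]:
    """Map each document to the sorted set of integer IDs of its distinct terms.

    IDs start at 1 and are assigned in order of first occurrence across the
    whole collection (an assumption made for this sketch).
    """
    vocab: Dict[str, int] = {}
    transactions: List[List[int]] = []
    for doc in docs:
        items = set()  # a transaction holds each item at most once
        for term in tokenize(doc):
            if term in stop_words:
                continue
            if term not in vocab:
                vocab[term] = len(vocab) + 1
            items.add(vocab[term])
        transactions.append(sorted(items))
    return transactions, vocab


def to_fimi(transactions: List[List[int]]) -> str:
    """Serialize transactions as one line each, items separated by spaces."""
    return "\n".join(" ".join(map(str, t)) for t in transactions) + "\n"
```

For example, the documents `"<p>Data mining</p>"`, `"<b>mining</b> the web"` and `"the data"` with `"the"` as a stop word yield the transactions `[[1, 2], [2, 3], [1]]` and the FIMI text `"1 2\n2 3\n1\n"`. Keeping only distinct terms per document is what makes the result a transactional dataset rather than a bag-of-words corpus: frequent itemset mining then finds sets of terms that co-occur in many documents.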

For further information, contact: Librarian http://puma.isti.cnr.it
