PUMA
Istituto di Scienza e Tecnologie dell'Informazione     
Avancini H., Rauber A., Sebastiani F. Organizing Digital Libraries by Automated Text Categorization. 050 3152892, Technical report, 2002.
 
 
Abstract (English)
Text Categorization (TC) is the discipline concerned with the construction of automatic text classifiers, i.e. programs capable of assigning to a document one or more among a set of predefined categories based on the content of the document. Building these classifiers is itself done automatically, by means of a general inductive process that learns the characteristics of the categories from a set of preclassified documents. In this paper we discuss a class of applications, automatic indexing with controlled vocabularies, that is of direct concern to organizing digital libraries. We exemplify this class of applications by discussing an ongoing project aimed at classifying scientific papers about computer science with respect to the ACM Classification Scheme.
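The inductive process described in the abstract can be illustrated with a minimal sketch. It is not the method used in the report: it assumes a simple centroid-based (Rocchio-style) learner, a whitespace tokenizer, a cosine-similarity threshold of 0.4, and a handful of invented training documents. The function names (`train`, `classify`) and the toy texts are hypothetical; only the category codes H.3.7 and I.2.6 are taken from this record.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (assumption: no stemming or stop-word removal)."""
    return text.lower().split()


def train(labeled_docs):
    """Learn one term-frequency profile per category from preclassified documents."""
    profiles = defaultdict(Counter)
    for text, categories in labeled_docs:
        counts = Counter(tokenize(text))
        for category in categories:
            profiles[category].update(counts)
    return dict(profiles)


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(profiles, text, threshold=0.4):
    """Assign every category whose profile is similar enough (zero, one, or several)."""
    doc = Counter(tokenize(text))
    return sorted(c for c, p in profiles.items() if cosine(doc, p) >= threshold)


# Hypothetical preclassified documents; a document may carry more than one category.
TRAINING = [
    ("digital library metadata archive retrieval", ["H.3.7"]),
    ("library collection archive preservation", ["H.3.7"]),
    ("inductive learning classifier training examples", ["I.2.6"]),
    ("learning algorithm induction training", ["I.2.6"]),
    ("learning classifier for digital library archive", ["H.3.7", "I.2.6"]),
]

profiles = train(TRAINING)
print(classify(profiles, "archive of a digital library"))
print(classify(profiles, "training a learning classifier"))
print(classify(profiles, "learning classifier for a digital library archive"))
```

Because each category is scored independently against the threshold, a document can receive one or more categories, which matches the multi-label setting of indexing with a controlled vocabulary. A realistic system would replace the raw term counts with weighted features and tune the threshold per category on held-out data.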
Subject
Text classification
Text categorization
Hierarchical classification
Clustering
Self-organizing maps
Shrinkage
H.3.1 Content Analysis and Indexing
H.3.7 Digital Libraries
I.2.6 Learning: Induction



 


For further information, contact the librarian: http://puma.isti.cnr.it
