PUMA
Istituto di Scienza e Tecnologie dell'Informazione     
Galavotti L., Sebastiani F., Simi M. Experiments on the use of feature selection and negative evidence in automated text categorization. Technical report, 2000.
 
 
Abstract
(English)
In this work we tackle two different problems of text categorization (TC), namely feature selection and classifier induction. Feature selection refers to the activity of selecting, from the set of r distinct features (i.e. words) occurring in the collection, the subset of r' ≪ r features that are most useful for compactly representing the meaning of the documents. We propose a novel feature selection technique, based on a simplified variant of the χ² statistic. Classifier induction refers instead to the problem of automatically building a text classifier by learning from a set of documents pre-classified under the categories of interest. We propose a novel variant of the well-known k-NN method, based on the exploitation of negative evidence. We report the results of systematic experimentation with these two methods on the standard REUTERS-21578 benchmark.
Subject
Machine learning
Text categorisation
Text classification
H.3.3 Information filtering
H.3.3 Performance evaluation (efficiency and effectiveness)
I.2.3 Induction



 


For further information, contact: Librarian http://puma.isti.cnr.it
