PUMA
Istituto di Scienza e Tecnologie dell'Informazione     
Sebastiani F. Machine learning and automatic text classification: what's next? In: The "Methods" Conference of the Association of Survey Computing (Winchester, UK, September 6-7, 2013).
 
 
Abstract
(English)
Research in text classification (a.k.a. verbatim coding) mostly focuses on the design of software systems for classifying large amounts of uncoded data. Some of these systems involve a training phase, in which a text classifier "learns" to code verbatims from manually coded examples. Scant attention has been given to designing software that supports what often comes next: further human editing and correction of the data to reduce classification errors. In this presentation I discuss recent research aimed at optimizing the amount of human inspection effort needed to bring the classification error down to a desired level. Because, in many applications, false positives and false negatives weigh differently on what one perceives "error" to be, this task calls for an approach based on utility theory.
Subject
Survey coding
Text classification
Utility theory
I.2.6 Learning
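
As a rough illustration of the utility-based idea in the abstract, the following Python sketch ranks automatically coded items for human inspection by the expected reduction in error cost that checking each one would bring. It is not the method presented in the talk. The names `Prediction`, `expected_gain`, and `rank_for_inspection`, the 0.5 decision threshold, the cost values, and the example probabilities are all assumptions made for illustration, and the classifier's probabilities are assumed to be reasonably well calibrated.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    doc_id: str
    prob_positive: float  # assumed: classifier's estimated probability that the code applies


def expected_gain(pred, cost_fp, cost_fn, threshold=0.5):
    """Expected error cost removed by having a human check this item (illustrative)."""
    if pred.prob_positive >= threshold:
        # The classifier assigned the code; inspection helps only if this is a false positive.
        return (1.0 - pred.prob_positive) * cost_fp
    # The classifier rejected the code; inspection helps only if this is a false negative.
    return pred.prob_positive * cost_fn


def rank_for_inspection(preds, cost_fp, cost_fn, threshold=0.5):
    """Order items so that the most valuable ones to inspect come first."""
    return sorted(
        preds,
        key=lambda p: expected_gain(p, cost_fp, cost_fn, threshold),
        reverse=True,
    )


# Hypothetical classifier outputs for four verbatims.
preds = [
    Prediction("a", 0.95),
    Prediction("b", 0.55),
    Prediction("c", 0.40),
    Prediction("d", 0.05),
]

# Assumed costs: a false negative counts three times as much as a false positive.
ranking = rank_for_inspection(preds, cost_fp=1.0, cost_fn=3.0)
ranked_ids = [p.doc_id for p in ranking]
print(ranked_ids)
```

With equal costs, item "b" would be inspected before item "c" because the classifier is slightly less sure about it. Once false negatives are weighted more heavily, "c" moves to the top, which is the kind of asymmetry the abstract points to.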



 

