PUMA
Istituto di Scienza e Tecnologie dell'Informazione     
Sassi M., Pardelli G., Biagioni S., Carlesi C., Goggi S. A Digital Archive of Research Papers in Computer Science. In: LREC'10 - Seventh International Conference on Language Resources and Evaluation (Valletta, Malta, 17-23 May 2010). Proceedings, pp. 1245 - 1248. Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odjik, Stelios Piperidis, Mike Rosner, Daniel Tapias (eds.). European Language Resources Association ELRA, 2010.
 
 
Abstract
(English)
This paper presents the results of a terminological work conducted by the authors on a Digital Archives Net of the Italian National Research Council (CNR) in the field of Computer Science. In particular, the research tends to analyse the use of certain terms in Computer Science in order to verify their change over the time with the aim of retrieving from the net the very essence of documentation. Its main source is a reference corpus made up of 13,500 documents which collects the scientific productions of three CNR research Institutes. They are ISTI (Institute of Information Science and Technologies), IIT (Institute of Informatics and Telematics) and ILC (Institute of Computational Linguistics), all of them born from the "Centro Studi sulle Calcolatrici Elettroniche (CSCE)" and now belonging to the CNR Department of Information & Communication Technologies and Cultural Identity. This study is divided in three sections: 1) an introductory one dedicated to the data extracted from the scientific documentation: the data have in common the use of some terms proper of the Computer Science lexicon although these term belong to different branches (Linguistics, Informatics and Telematics); 2) the second section is devoted to the description of the contents managed by the PUMA (Publication Management System) system; 3) the third part contains a statistical representation of terms extracted from archive: some comparison tables between the occurrences of the most used terms in the scientific documentation produced by the three Institutes will be created and diagrams with percentages about the most frequently used terms will be displayed too. Lastly, indexes and concordances will allow to reflect on the use of certain terms in this field and give possible keys for having access to the extraction of knowledge in the digital era.
URL: http://www.lrec-conf.org/proceedings/lrec2010/index.html
Subject Digital libraries
Document Classification
Text categorisation
Text mining
I.2.7 Natural Language Processing. Text analysis


Icona documento 1) Download Document PDF


Icona documento Open access Icona documento Restricted Icona documento Private

 


Per ulteriori informazioni, contattare: Librarian http://puma.isti.cnr.it

Valid HTML 4.0 Transitional