PUMA
Istituto di Scienza e Tecnologie dell'Informazione     
Nanni M., Silvestri F., Giannotti F., Pedreschi D. The Web Object Store: an infrastructure for mining semantics from web resources and their usage. The document will be submitted to Conference:, Technical report, 2005.
 
 
Abstract
(English)
The development of methods for effective and efficient access to the information contained in large masses of digital documents is a long-standing objective of computer science research, and its importance is underscored by the growing availability of large information repositories. With the advent of the web, content delivery methods evolved into the services offered by search engines, categorization and topic search services, related-pages services, etc. The main innovation needed was a shift from content-only analysis to the combined analysis of the contents and hyperlink structure of web documents, as witnessed by the PageRank metric for document relevance. However, as the web explosion continues, the limitations of the current generation of web access services are becoming clearer, for example in the poor quality and freshness of their results. The overall vision presented in this paper is the development of a new generation of services for enhanced content delivery (web search, document classification, question answering, etc.), tailored to a large-scale community of web users and based on knowledge extraction methods that enrich raw data with automatically extracted semantic information. We refer to this category of services as Usage-enhanced Web-Access services (UWA), emphasizing that they are based on a combination of web usage, web content and web structure mining. Usage data are those that the community of web users decides to share, on a privacy-preserving basis, in a participatory style. UWA applications are complex for several reasons: they deal with enormous volumes of data, with continuously incoming streams of data, and with different abstractions of the data, and they apply computationally expensive data mining algorithms to those data.
The infrastructure needed to support the development of UWA applications is called, in our project, the Web Object Store (WOS): a web data management system specialized in dealing with web content, structure and usage data. The WOS is designed to provide persistence, compression and efficient access methods for data structures representing basic web objects (Web documents, URIs, Citations, and HTTP requests), and to support the development of sophisticated applications that need complex data structures and advanced analysis methods.
Subject: Data Mining, Web Usage Data
H.2.8 Database Applications
68U01 General





 


For further information, contact: Librarian http://puma.isti.cnr.it
