SIXTH FRAMEWORK PROGRAMME

INFORMATION SOCIETY TECHNOLOGIES

 

 

 

DRIVER

 

IST-FP6-034047

 

“Digital Repository Infrastructure Vision for European Research”

 

 

 

 

 

 

 

Architectural Specification

 

Deliverable code: DRIVER-03-D2.0-9.5

 

 

 

 

 

 

 

 

 

 

 

Project

Title: DRIVER, Digital Repository Infrastructure Vision for European Research
Start date: 1st June 2006
Call/Instrument: FP6-IST-2.5.6.3, Research Network Testbeds
Contract Number: IST-034037

 

 

Document

Deliverable number: DRIVER-03-D2.0-9.5
Deliverable title: Architectural Specification Report
Contractual Date of Delivery: 1st October 2006
Actual Date of Delivery: Nov 15, 2006
Editor(s): CNR – ISTI
Author(s): Donatella Castelli, Paolo Manghi, Pasquale Pagano, Leonardo Candela, Natalia Manola, Vassilis Stoumpos, Friedrich Summann, Marek Imialek, Jaroslaw Wypychowski
Reviewer(s):
Participant(s): CNR – ISTI
Workpackage: WP3
Workpackage title: DRIVER Design and Technical Coordination
Workpackage leader: CNR – ISTI
Workpackage participants: CNR, UoA, SURF, CNRS-CCSD, UKOLN, and UN-W
Distribution: Public
Nature: Report
Version/Revision: 4.6
Draft/Final: Final
File name: DRIVER Architectural Specification V4.2.doc

This document contains a description of the DRIVER project findings, work and products. Certain parts of it might be subject to partner Intellectual Property Right (IPR) rules; prior to using its content, please contact the consortium head for approval.

In case you believe that this document harms in any way IPR held by you as a person or as a representative of an entity, please do notify us immediately.

The authors of this document have taken every available measure to ensure that its content is accurate, consistent and lawful. However, neither the project consortium as a whole nor the individual partners that implicitly or explicitly participated in the creation and publication of this document accept any responsibility for consequences that might arise from using its content.

This publication has been produced with the assistance of the European Union. The content of this publication is the sole responsibility of the DRIVER consortium and can in no way be taken to reflect the views of the European Union.

DRIVER is a project funded by the European Union

Disclaimer
Table of Contents
Table of Figures
Summary
1     Introduction
1.1      Purpose of this document
1.2      Content of this document
1.3      Document Outline
2     Architecture Overview
2.1      Preliminary assumptions
2.1.1      Service-Oriented Architecture
2.1.2      Exploitation of existing assets
2.2      DRIVER Information Space
2.2.1      DRIVER Object Model
2.2.1.1       C-Object summaries
2.2.1.2       DRIVER Metadata Format
2.2.2      DRIVER Information Space Consumers: users and applications
2.3      DRIVER Information Space for OAI-Item C-Objects
2.3.1      Object Model Implementation
2.3.1.1       Repository Service Harvesting and Aggregation
2.3.1.2       Physical Design of DRIVER Objects
2.3.1.3       Aggregation process: cleaning and enriching
2.3.1.4       Summary and scenario
2.3.2      DRIVER Objects for OAI-Items: Index Services and OAI-Publisher Services
2.4      DRIVER Services
2.5      DRIVER Resources
2.5.1      Resource Management and Resource Profiles
2.5.2      Resource Model
2.5.2.1       Service Resources
2.5.2.2       Data Structure Resources
2.5.3      Resource Profiles
2.5.3.1       Service Resource profiles
2.5.3.2       Data Structure Resource Profiles
3     The DRIVER Architecture
3.1      Data Provider Services and Result Set Services
3.1.1      ResultSet Resources: creation and access
3.2      System Management
3.2.1      Information Service and Manager Service: Resource Management
3.2.2      Information Service
3.2.2.1       Service architecture
3.2.2.2       Interactions
3.2.2.3       Detailed design
3.2.3      Manager Service
3.2.3.1       Service architecture
3.2.3.2       Interactions
3.2.3.3       Detailed design
3.2.4      Authentication & Authorization Service
3.2.4.1       Service Architecture
3.2.4.2       Interactions
3.2.4.3       Detailed design
3.3      Collective Layer
3.3.1      Repository Service
3.3.2      Aggregator Service
3.3.2.1       Service architecture
3.3.2.2       Interactions
3.3.2.3       Detailed design
3.4      Information Space Management
3.4.1      Metadata Storage (MDStore) Service
3.4.1.1       Service architecture
3.4.1.2       Interactions
3.4.1.3       Detailed design
3.4.2      Index Service
3.4.2.1       Service Architecture
3.4.2.2       Interactions
3.4.2.3       Detailed design
3.4.3      Collection Service
3.4.3.1       Service architecture
3.4.3.2       Interactions
3.4.3.3       Detailed design
3.4.4      Search Service
3.4.4.1       Service Architecture
3.4.4.2       Interactions
3.4.4.3       Detailed design
3.5      User Management
3.5.1      Users-Communities Service
3.5.1.1       Service architecture
3.5.1.2       Interactions
3.5.1.3       Detailed design
3.5.2      Profiling Service
3.5.2.1       Interactions
3.5.2.2       Detailed design
3.5.3      Recommendation Service
3.5.3.1       Service architecture
3.5.3.2       Interactions
3.5.3.3       Detailed design
3.6      Presentation Layer
3.6.1      User Interface Service
3.6.1.1       Service architecture
3.6.1.2       Interactions
3.6.1.3       Detailed design
3.6.2      OAI-Publisher Service
3.6.2.1       Service architecture
3.6.2.2       Interactions
3.6.2.3       Detailed design
4     Conclusion
Acronyms
References

 

Figure 1 – DRIVER Object Model
Figure 2 DRIVER Object Model: OAI-Items interpretation, DMF and summaries
Figure 3 – DRIVER Object Model Implementation
Figure 4. The DRIVER framework
Figure 5 – Example of EPR
Figure 6 The DRIVER Resource Model: Service Resources
Figure 7. The DRIVER Resource Model: Data Structure Resources
Figure 8 – Pull-Mode ResultSet
Figure 9 – Pull-Mode ResultSet for MDStore Services
Figure 10 – Push-Mode ResultSet
Figure 11. Generic Service interactions
Figure 12. System Management Services interaction
Figure 13 – Notification dispatching mechanisms
Figure 14. Information Service – Detailed specification
Figure 15. Manager Service Orchestration protocol
Figure 16 - Resource AA state machine
Figure 17 - Resource security profile data model
Figure 18 - XACML policy data model
Figure 19 - XACML request activity
Figure 20 - AAS architecture
Figure 21 - Authentication sequence
Figure 22 - Authorization sequence
Figure 23 - Context ID chain sequence
Figure 24 – Aggregator Service
Figure 25 – MDStore Service
Figure 26 – Index Service
Figure 27. The DRIVER Collection Service
Figure 28. Overall Search Service Architecture
Figure 29 - Detailed Architecture of the SS-QueryPreProcess component
Figure 30 - Example of a query tree
Figure 31. - Detailed Architecture of the SS-Execute component
Figure 32. The DRIVER Users-Communities Service
Figure 33. The DRIVER Profiling Service
Figure 34 - The DRIVER Recommendation Service
Figure 35. The DRIVER User Interface Service
Figure 36 – OAI-Publisher Service

 

Summary

The aim of this deliverable is to create a global vision of the DRIVER system. The document focuses on the description of, and the communication among, the different components designed by the technological WP partners, and aims at driving and coordinating system development.

1     Introduction

1.1      Purpose of this document

The DRIVER system aims at offering the European Repository Infrastructure functionalities proposed in the TA document and described in the D3.1 - Functional Specification document [1]. Given the complexity of the system, its development process has to rely on robust methodologies, grounded in the definition of this Software Architectural Specification document.

The target audience of this document includes software designers responsible for developing DRIVER services as well as anyone interested in understanding the high-level software structure of the system.

1.2      Content of this document

In DRIVER, software design and development processes are inspired by the Unified Process guidelines. Specifically, the Architectural Specification describes the system architecture as a compound of components interacting through interfaces. Each component is in turn described as a set of smaller components and interfaces, which together describe the internal architecture of each service from a functional and logical point of view. The Unified Modeling Language (UML) is used to formalize the various elements of the specification and their relationships.

In particular, this document defines:

  • concepts of general interest to the whole system implementation:
    • Service-Oriented Architecture: technology and protocols;
    • Information Space and Object Model;
  • System Resources:
    • Service Resources, designed as software Services (meta-services) exposing functionality as Web Services, and Data Structure Resources, designed as data sources local to a specific Service;
    • system architecture organized into sets of Services, each set delivering the functionality described by one functional area identified in [1];
  • System management:
    • Users and communities;
    • Resource profiles and Resource management.  

The system architecture (see Section 3) is presented according to the structure identified by the functional areas defined in [1]. In particular, for each functional area, a UML architectural view of the services involved is provided. For each service, the corresponding architectural view describes:

  1. generic architectural specification: guidelines for the realization of the Service components, in terms of API methods, method descriptions, and protocols; it includes:

(i) a description of the components providing interfaces to access the service functionality;

(ii) a description of the service interaction with the rest of the system, i.e. those components used by the other services to interact with the service black box;

  2. detailed service design: directions taken by DRIVER partners for the implementation of the Service Architectural Specification; it includes:

(i) a description of the internal structure of the service components, emphasizing details on their level of distribution and replication;

(ii) technical assumptions, such as the development platform or implementation-specific solutions.

1.3      Document Outline

This document is structured as follows. Section 2 gives an overview of the organization of DRIVER Services and presents the resource model on which DRIVER’s architecture relies. Section 3 presents an integrated and comprehensive view of the DRIVER system architecture. Finally, Section 4 concludes this deliverable.

2     Architecture Overview

The main goal of the DRIVER project is to build a European Open Service Repository Infrastructure. The infrastructure must be scalable and flexible with respect to the number and type of participating repositories, capable of offering different views to different communities of users, extensible with new services, and dynamically configurable in order to maximize Quality-of-Service.

As mentioned above, the outline of this document is inspired by the Unified Process guidelines, according to which the architecture of a complex software system is given in terms of system components interacting with each other through interfaces. In particular, the drafting of this specification builds on the set of technical and functional preliminary assumptions pointed out first by the TA and then by the Functional Specification document [1]. In what follows we present a high-level view of the architecture by:

  1. formalizing such assumptions;
  2. defining the DRIVER Information Space: the System supports an Information Space, which hides from end-users the heterogeneity of the aggregated Repository Service Resources;
  3. identifying the Services to be designed and implemented in DRIVER;
  4. identifying the software design of Resources and the design of Resource Management in terms of Resource profiles.

2.1      Preliminary assumptions

The following aspects, which surfaced while drafting the TA and the Functional Specification document [1], should be taken into account when defining the Architectural Specification:

  • the System is based on a Service Oriented Architecture (SOA) implemented by Web Service Technology;
  • the architecture uses industry standard interfaces, where available, for communication among the software components. This supports the greatest possible interoperability with third party software and openness to future services;
  • the system must reuse existing assets[1] as much as possible.

Next, we show how the DRIVER technological and functional assumptions affected the definition of the software architecture.

2.1.1      Service-Oriented Architecture

The Service-Oriented Architecture (SOA) approach is a way of building distributed systems that deliver application functionality as services to either end-user applications or other services. In particular, Web Services (WS) technology [3] offers a natural, interoperable means to expose software functionality as a service when building an SOA. It provides a distributed computing approach for integrating heterogeneous applications over the Internet, making use of open technologies such as XML[2], SOAP[3], REST, UDDI[4], and WSDL[5].

The DRIVER system will be designed relying on Web Service technology. This decision strongly affects our technological solution. The main consequence is that the DRIVER nodes forming the infrastructure must be equipped with a service container capable of hosting and supporting Web services. However, in order to facilitate the reuse of partners’ expertise and software, the project does not impose on the WPs a specific Web Service hosting environment implementation, but only the use of WSDL and SOAP/REST as the communication standards.

2.1.2      Exploitation of existing assets

Technological partners contribute to the DRIVER system with their expertise and products. Specifically:

  • CNR brings into the project experience in the field of SOA design and implementation, gained in the OpenDLib and DILIGENT projects;
  • UoA offers experience in the field of user profiling, recommendation mechanisms, and user interfaces;
  • ICM delivers authorization and authentication mechanisms to rule safe service communication and authenticated user access;
  • UniBi is in charge of the development of Data Management and Data Provision Services, having strong experience in the field of harvesting and indexing in the context of the Bielefeld Academic Search Engine (BASE)[6]. UniBi activity will be supported in experience and practice by SURF, which will specifically contribute the SAHARA harvester-aggregator tool, developed under the DARE project.

2.2      DRIVER Information Space

DRIVER System aims at constructing a European Repository Infrastructure. In practical terms, the System allows existing Repositories, matching DRIVER Guidelines and Requirements, to deliver their content to larger communities of end-users through personalized services and interfaces.

Such content is heterogeneous, both in the nature, i.e. semantics, and in the structure it conforms to, i.e. the Repository Object Model of reference. For example, Repositories may contain the outcome of scientific research in any field, raw data, software, satellite pictures, tutorials, multimedia, and may conform to various Object Models, including OAI-Items (descriptive models), DIDL (complex objects), etc.

The DRIVER Repository Infrastructure aims at collecting such heterogeneous content and aggregating it to form a uniform Information Space, which delivers the original data sources through the same interpretation, i.e. Object Model. The model describes DRIVER Objects as representations of, and “place holders” for, objects collected from different Repositories, providing a way to uniformly describe their properties as “collected objects”.

In practical terms, DRIVER Aggregator Services are in charge of collecting objects from different data sources, called C-Objects in the following, and deriving the corresponding DRIVER Objects. Accordingly, Aggregator Services must embed the interpretation of a C-Object within the DRIVER Information Space; that is, they must be tailored to collect content from Repositories conforming to specific Object Models.

The first DRIVER Public release includes the definition of the DRIVER Object Model, but focuses on OAI-Item Repositories only, by providing a specific Aggregator Service. More specifically, the first DRIVER Compliant Repositories must publish their content through the OAI-PMH protocol and contain toll-free, reachable textual files. The DRIVER System provides the Services to aggregate, i.e. harvest, clean and enrich, OAI-PMH metadata records from several Repositories, and supports end-users with uniform search interfaces to the heterogeneous content. In particular, for each Repository, DRIVER aims at aggregating the OAI-Items it contains, hence all metadata records relative to the OAI-Items therein.
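The harvest, clean, and enrich steps mentioned above can be sketched as follows. This is only an illustrative sketch, not the DRIVER Aggregator Service implementation: the function names, the cleaning rule (whitespace trimming), and the enrichment rule (attaching the repository name under a hypothetical `provenance` key) are assumptions. The `ListRecords` verb and the `oai_dc` metadata prefix come from the OAI-PMH protocol itself; the sample response is parsed from a string rather than fetched over the network.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# A tiny hand-written ListRecords response, used instead of a live repository.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>  A sample title </dc:title>
          <dc:creator>Doe, J.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for a repository."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def parse_records(xml_text):
    """Harvest step: return (identifier, {dc field: [values]}) per record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(f"{{{OAI_NS}}}record"):
        ident = rec.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        fields = {}
        for el in rec.iter():
            if el.tag.startswith(f"{{{DC_NS}}}") and el.text:
                name = el.tag.split("}", 1)[1]
                fields.setdefault(name, []).append(el.text)
        records.append((ident, fields))
    return records


def clean(fields):
    """Cleaning step (assumed rule): trim whitespace and drop empty values."""
    return {k: [v.strip() for v in vs if v.strip()] for k, vs in fields.items()}


def enrich(fields, repository_name):
    """Enriching step (assumed rule): record which repository was harvested."""
    enriched = dict(fields)
    enriched["provenance"] = [repository_name]
    return enriched


identifier, raw = parse_records(SAMPLE)[0]
record = enrich(clean(raw), "Example Repository")
print(list_records_url("http://example.org/oai"))
print(identifier, record)
```

Keeping the three steps as separate functions mirrors the way the text names them, so that each could be replaced independently.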

In the following, the DRIVER Object Model is presented, together with its specialization to the OAI-Item Repository Model. Moreover, the technological solution adopted to support this Model for the OAI-Aggregator Service is illustrated.

2.2.1      DRIVER Object Model

The DRIVER Object Model was specifically devised to support a semantically and structurally uniform Information Space populated by “objects collected from external heterogeneous Repositories”. According to the model, any object hosted by a digital Repository, namely a C-Object, can be collected and imported into the DRIVER Information Space, to be encapsulated into a corresponding DRIVER Object. DRIVER Objects are capable of uniformly describing collected C-Objects, independently of their original Object Model, in terms of summaries of their content, structure, description, and provenance. C-Objects can be OAI-Items, Fedora Objects, DSpace Objects, objects in proprietary Repository technologies and, in general, any item whose content might be of interest to a user searching for subsequent access. Aggregator Services will be responsible for collecting/harvesting C-Objects for a specific Object Model and creating the corresponding DRIVER Objects.

Figure 1 – DRIVER Object Model

Figure 1 illustrates a DRIVER Object, which contains:

  • a unique identifier for the Object, which identifies the Object within the Information Space;
  • information relative to the C-Object provenance history: interesting if the C-Object reached its current Repository by a sequence of transfers from its original Repository, i.e. the Repository where it was first created;
  • date of collection of the C-Object, i.e. the date on which the C-Object was collected from its Repository; it coincides with the DRIVER Object creation date;
  • C-Object summaries, which are syntheses of the object description, structure, and content: these depend on the nature of the C-Object, thus on the Object Model adopted in the original Repository;
  • a DRIVER Metadata Format Record (DMF): a metadata record specific to all DRIVER Objects that provides a uniform description of the encapsulated C-Objects; the values within DMF fields are derived, i.e. extracted, from the C-Object summaries and provenance history.

DMF records provide a uniform description for heterogeneous C-Objects, while C-Object summaries allow for direct access to the C-Object and its components at the original location of collection. C-Object summaries are obtained at collection time through specific Aggregator Service technology, tailored to the Object Model of the C-Object.[7]
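The parts of a DRIVER Object listed above can be pictured as a simple record type. The sketch below is an illustration only: the class, its attribute names, the `derive_dmf` mapping, and the `dpf:repository` and `content_uri` keys are hypothetical and are not taken from the DRIVER specification. It only shows the relationship stated in the text, namely that the DMF record is derived from the summaries and the provenance history.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List
import uuid


@dataclass
class DriverObject:
    """One DRIVER Object, encapsulating a single collected C-Object."""
    object_id: str                 # unique within the Information Space
    provenance_history: List[str]  # repositories, original one first
    collection_date: date          # also the DRIVER Object creation date
    summaries: Dict[str, str]      # syntheses of description, structure, content
    dmf: Dict[str, str] = field(default_factory=dict)  # derived DMF record


def derive_dmf(summaries: Dict[str, str], provenance_history: List[str]) -> Dict[str, str]:
    """Derive DMF values from summaries and provenance (hypothetical mapping)."""
    dmf = {k: v for k, v in summaries.items() if k.startswith(("dc:", "dr:"))}
    if provenance_history:
        # "dpf:repository" is a made-up field name used only for illustration;
        # the last entry is the repository the C-Object was collected from.
        dmf["dpf:repository"] = provenance_history[-1]
    return dmf


def encapsulate(summaries, provenance_history, collected_on) -> DriverObject:
    """Create the DRIVER Object for a C-Object collected on a given date."""
    return DriverObject(
        object_id=str(uuid.uuid4()),
        provenance_history=list(provenance_history),
        collection_date=collected_on,
        summaries=dict(summaries),
        dmf=derive_dmf(summaries, provenance_history),
    )


obj = encapsulate(
    {"dc:title": "A sample thesis", "dr:CobjModel": "OAI",
     "content_uri": "http://example.org/1.pdf"},
    ["Original Repository", "Current Repository"],
    date(2006, 11, 15),
)
print(obj.dmf)
```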

C-Object access is to be provided by specific Services, which are out of the scope of the DRIVER test-bed. As previously mentioned, DRIVER Public test-bed will provide an Aggregator Service for OAI-Items; other Aggregator Services, specific to more complex Object Models, are left out of the current project aims. In the next section, summaries for OAI-Item C-Objects will be specified, together with the relative Aggregator Service technology.

2.2.1.1     C-Object summaries

In order to (i) limit the storage space required to maintain DRIVER Objects and (ii) confine the synchronization issues that may derive from replicating C-Objects, DRIVER Objects do not contain a materialization of the C-Objects but rather summaries of their content and structure. Summaries are succinct descriptions of the original C-Objects, tailored to allow the definition of Services for C-Object access and reuse. Structure is given according to an Object Model such as DIDL, generic enough to represent the structural primitives of any Object Model, while content will be referred to from its original location (e.g. through URIs, DOIs), and possibly summarized when beneficial, for example as paper abstracts for presentation or text keywords for search.

Ideally, end users and/or external applications search over a uniform Information Space for DRIVER Objects relative to C-Objects satisfying some properties, specified in terms of DMF fields; once the C-Objects are identified, the requesters may access the original C-Objects through their summaries.
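The two-step access pattern described above, first searching on DMF fields and then following the summaries back to the original C-Objects, can be sketched as below. The in-memory list, the dictionary layout, and the `content_uri` key are assumptions made for brevity; in DRIVER this role is played by dedicated Services rather than by local functions.

```python
from typing import Dict, List

# Each DRIVER Object is modelled as a plain dict (an assumption of this sketch).
INFORMATION_SPACE: List[dict] = [
    {"dmf": {"dc:language": "en", "dr:CobjTypology": "text"},
     "summaries": {"content_uri": "http://repo-a.example.org/item/1"}},
    {"dmf": {"dc:language": "fr", "dr:CobjTypology": "text"},
     "summaries": {"content_uri": "http://repo-b.example.org/item/7"}},
    {"dmf": {"dc:language": "en", "dr:CobjTypology": "picture"},
     "summaries": {"content_uri": "http://repo-a.example.org/item/9"}},
]


def search(objects: List[dict], criteria: Dict[str, str]) -> List[dict]:
    """Step 1: select DRIVER Objects whose DMF record matches every criterion."""
    return [o for o in objects
            if all(o["dmf"].get(k) == v for k, v in criteria.items())]


def original_locations(objects: List[dict]) -> List[str]:
    """Step 2: follow the summaries back to where the C-Objects actually live."""
    return [o["summaries"]["content_uri"] for o in objects]


hits = search(INFORMATION_SPACE, {"dc:language": "en", "dr:CobjTypology": "text"})
print(original_locations(hits))
```

The content itself is never copied into the Information Space; only its location is kept, which is the storage and synchronization argument made in the preceding paragraphs.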

In this sense, the DRIVER test-bed lays the foundation for C-Object management and leaves its practical solution to future System extensions.

2.2.1.2     DRIVER Metadata Format

DMF records are obtained by combining two sets of fields:

  • DRIVER Descriptive Metadata Format (DDF): DDF fields summarise the description, structure, and content of the C-Object; such a summary depends on the Object Model to which the C-Object conforms and is extracted from the C-Object at collection time;
  • DRIVER Provenance Metadata Format (DPF): DPF fields describe the Repository from which the C-Object was collected.

DRIVER Descriptive Metadata Format

DDF fields aim at providing a simple, structured, and uniform description of a C-Object in terms of its content, structure, and description. For each of these aspects, specific metadata fields are envisaged:

·         C-Object Content

o        dr:CobjContentSynthesis: the field contains a summary of the content, for example keywords in the case of textual content, histograms in the case of pictures, or combinations of these elements in the case of complex objects, etc;

o        dr:CobjTypology: the field specifies the type of media embedded into the C-Object; possible values are taken from a controlled dictionary, e.g. picture, text, video, audio, software, compound, or others;

o        dr:CobjCategory: the fields specifies the specific category of the media typology; examples are thesis, article, book for text, or drawing, photo, gif for picture, etc.; possible values are taken from a controlled dictionary;

·         C-Object Structure

o        dr:CobjIdentifier: the field contains the unique identifier of the C-Object, if it exists, in the original Repository; examples are DOI, OAI-Item identifiers, etc.

o        dr:CobjModel: the field specifies the type of harvested object, e.g. OAI, ORE, Fedora, or others; possible values are taken from a controlled dictionary;

·         C-Object description

o        In DRIVER, the C-Object is described by a DC Application profile:

    • dc:title: title of the resource represented by the object;
    • dc:creator: author of the resource represented by the object;
    • dc:contributor
    • dc:publisher: the publisher, if it exists, of the C-Object;
    • dc:subject: a category characterizing the physical or digital Resource represented by the C-Object; values are taken from a controlled dictionary, such as DEWEY classification;
    • dc:identifier: reference URI to part of the digital resource, if any, relative to the harvested record;
    • dc:language: the language characterizing the resource represented by the object;
    • dc:dateAccepted, in DC refined: date of acceptance of the resource represented by the object; the standard to be adopted for dates is under decision;
    • dc:source (not to be indexed)
    • dc:relation (not to be indexed)

o        dr:CobjMDFormats: the field specifies the Metadata formats used to describe the C-Object, if any is available; values are taken from a controlled dictionary, and examples are Dublin Core, MARC, others;

o        dr:CobjDescriptionSynthesis: the field contains the synthesis of the C-Object descriptions, that is, the keywords contained in the available metadata records used to describe the C-Object.

DRIVER Provenance Metadata Format

The provenance of content has a crucial role in DRIVER and should be explicitly part of the accessible information. Indeed, DRIVER users and applications are also interested in accessing and querying DRIVER objects according to the original harvesting environments. To this aim, DPF fields specify properties of the C-Object original environment.

  • dr:repository: the name of the Repository Service Resource from which the C-Object was collected;
  • dr:repositoryLink: a link to an informative or search web page relative to the Repository;
  • dr:country: the name of the country of the institution responsible for the repository; values are taken from a controlled dictionary;
  • dr:institution: the name of the institution responsible for the harvested metadata record.

DRIVER Metadata Format: record headers

DRIVER Metadata records have headers specifying important information, such as:

  • The C-Object provenance history: in some cases the C-Objects were not originally created in the Repository from which they are collected in DRIVER, but were in turn collected from further Repositories; such C-Objects may carry information about their provenance history; DRIVER Objects do the same, by preserving such information, when available, and enriching it with the information relative to the current Repository under collection; specifically, DRIVER adheres to the OAI-PMH best practice of oai:about tags for provenance;
  • dr:objectIdentifier: the field contains the unique DRIVER Object Identifier, obtained by combining the local identifier of the Object with the unique identifier of the Repository from which it was harvested; in this respect, DRIVER’s approach is inspired by the OAI-PMH best practice of OAI-identifiers;
  • dr:dateOfCollection: the field contains the date on which the C-Object was collected from its original Repository.
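
The composition of a DMF record from header, DDF, and DPF fields can be sketched as follows. This is only a minimal illustration: the class name DmfRecord, its attribute names, and the choice of plain dictionaries are assumptions made for the example and are not part of the DRIVER specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DmfRecord:
    """Hypothetical in-memory view of a DMF record: header + DDF + DPF."""
    object_identifier: str                      # dr:objectIdentifier (header)
    date_of_collection: str                     # dr:dateOfCollection (header)
    ddf: Dict[str, List[str]] = field(default_factory=dict)  # descriptive fields
    dpf: Dict[str, str] = field(default_factory=dict)        # provenance fields

    def fields(self) -> Dict[str, List[str]]:
        """Flatten header, DDF and DPF fields into one field -> values view."""
        flat = {
            "dr:objectIdentifier": [self.object_identifier],
            "dr:dateOfCollection": [self.date_of_collection],
        }
        flat.update(self.ddf)
        # DPF fields are single-valued; wrap them so all values are lists.
        flat.update({name: [value] for name, value in self.dpf.items()})
        return flat
```

A flattened view of this kind is what a fielded search over DMF records would operate on.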

2.2.2      DRIVER Information Space Consumers: users and applications

DRIVER Information Space provides a uniform description for heterogeneous C-Objects. End users and applications can therefore run fielded advanced searches over DMF records, in order to identify the C-Objects of interest. Section 3.4.4 describes in detail the query language and its semantics.
Moreover, applications can extract DRIVER Object descriptions as if they were OAI-Items, by means of DRIVER OAI-Publisher Services. For example, a harvesting client may connect to a DRIVER OAI-Publisher Service to harvest all C-Object metadata records collected in the last month; similarly, a user may search for the C-Objects whose MARC metadata records have been harvested in the last month. To this aim, the harvesting date of each individual metadata record is extracted at C-Object collection time and “attached” to the record itself, as enrichment. The DRIVER Information Space is exported as an OAI-PMH compliant Repository, which by default exports:
1.      one OAI-Set for each harvested Repository: the OAI-Set is identified by the unique name of the Repository and is relative to all OAI-Items collected from the Repository;
2.      a number of different metadata formats:

o        unqualified Dublin Core format: it must be delivered for DRIVER to be OAI-PMH compliant;

o        DMF format: DRIVER Objects become OAI-Items themselves, exposing their native DMF format to applications;

o        Any of the metadata formats harvested so far: such a view allows applications to harvest any of the metadata formats present in DRIVER, regardless of the Repository from which they were harvested; for example, OAI-PMH harvesting according to MARC means that all MARC metadata records harvested so far into DRIVER are exported to the requesting application;

o        Any of the metadata formats harvested so far, enriched with DPF fields: the same as above, where records are enriched with provenance information; this allows requesting applications to locally distinguish each metadata record’s provenance. For example

Note that applications can harvest records with respect to a given Repository, i.e. DRIVER OAI-Set, and metadata format; this way applications access records which have been cleaned and improved in quality with respect to the ones in the original Repository.
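
As an illustration of harvesting by Repository (OAI-Set) and metadata format, the following sketch builds an OAI-PMH ListRecords request. The verb and the metadataPrefix, set, and from arguments are defined by OAI-PMH; the base URL and the set name used in the usage note are purely hypothetical, since the address of a DRIVER OAI-Publisher Service is not given here.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, oai_set=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL.

    base_url is a hypothetical OAI-Publisher endpoint; oai_set selects one
    harvested Repository, from_date restricts to recently collected records.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if oai_set is not None:
        params["set"] = oai_set
    if from_date is not None:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)
```

For instance, list_records_url("http://driver.example.org/oai", "oai_dc", oai_set="repoA", from_date="2006-10-01") would request the unqualified Dublin Core records of the (assumed) set repoA collected since the given date.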

2.3      DRIVER Information Space for OAI-Item C-Objects

The current design of the DRIVER Object Model concentrates on OAI-PMH compliant Repositories, hence provides Aggregator Services capable of collecting (harvesting, in the context of the OAI-PMH protocol) C-Objects in the form of OAI-Items. Specifically, according to the OAI-PMH Object Model, a Repository is a container of items, where an item, i.e. an entity describing a real world resource, is characterized by a unique identifier and a number of metadata records, all describing the same resource, but according to different metadata formats; the Dublin Core metadata format is mandatory for all items. Moreover, each metadata record can be identified, within the Repository, by the OAI item identifier and a specific metadata format.

In this scenario, one DRIVER object represents the harvesting of one OAI-item, i.e. the existence of an object states that one specific OAI-item from a certain Repository with certain metadata records has been harvested into DRIVER. Currently, the DRIVER test-bed is restricted to items whose resource is available in digital textual form.

Figure 2 illustrates the DRIVER Object Model applied to OAI-Item C-Objects. Specifically, OAI-Item summaries, collected from the Repository, are the following:

  • OAI-Item Content: the abstract of the textual file, if available, retrieved from the original file in the Repository,[8] and the references to the textual files;
  • OAI-Item Description: the metadata records made available through the OAI-Item, describing the digital resource; the Dublin Core record is mandatory and always present;
  • OAI-Item Structure: structure is modelled as a one level tree, whose children are the reference to the files, its summary, and its available descriptions.

 

Figure 2 DRIVER Object Model: OAI-Items interpretation, DMF and summaries

The DMF fields are filled in by extracting information from the relative OAI-Item summaries. Note that in the current implementation, the fields dr:CobjTypology and dr:CobjModel will contain fixed values, specifically “textual” and “OAI”.

In the next Section, we present in detail how the DRIVER Information Space for OAI-Item C-Objects is constructed. As we shall see, the challenge is how to harvest full OAI-Items through the OAI-PMH protocol, which instead allows for the harvesting of one metadata format view at a time.

In the last section we shall present how such Information Space interfaces with the Services. Again, this process depends on the way DRIVER Objects are supported, and varies depending on the C-Object nature, i.e. on the original Object Model.

2.3.1      Object Model Implementation

Harvesting metadata records from a Repository Service has to be an asynchronous process, which cannot take into account OAI-Items as content units, but interacts with metadata format views of OAI-items. For example, Aggregator Services may harvest the Dublin Core view of the OAI-Items from a Repository one day and the MARC view a month later. In the process, records of different metadata formats must be kept locally separated, in order to provide the native view wished by users and applications, but logically glued in the Information Space because they are harvested from the same Repository and, when harvested from the same OAI-Item, also belong to the same DRIVER object. In the following we show how DRIVER Information Space is designed, in order to satisfy these physical constraints and support DRIVER Object Model.

2.3.1.1     Repository Service Harvesting and Aggregation

D3.1 [1] provides a definition of DRIVER Information Space, according to which the Information Space is physically constituted of a set of storage units, namely MDStore Resources. An MDStore Resource hosts metadata records conforming to one specific metadata format, regularly harvested by an Aggregation Service from one specific Repository Service.

The harvesting process is handled by Aggregator Services, which interpret the Harvesting Instance Resources relative to Repository Services. A Harvesting Instance is a tuple (Repository, Master MDStore, MDStores, mapping, stats), which represents the harvesting activity relative to a Repository (see Section 2.5.2.2). In particular:

  1. A Repository Service registering to DRIVER also specifies the list of Metadata Formats it wishes to publish, i.e. a subset of the list returned by the OAI-PMH ListMetadataFormats verb; such a list must include the Dublin Core format, as required by the OAI standard;
  2. The System creates a Harvesting Instance relative to the Repository harvesting; to this aim, the System creates one MDStore Resource for each metadata format exposed by the Repository; the MDStore created for the Dublin Core metadata format is elected as Master MDStore Resource of the Harvesting Instance;
  3. The Harvesting Instance is assigned to an Aggregator Service, its configuration completed by the Aggregator manager by specifying the rules in mapping and its harvesting schedule, and the harvesting started. Harvesting always starts from the Master MDStore with Dublin Core metadata records, and can continue, if possible, with the other MDStores.
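
The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the class names, the MDStore identifier scheme, and the format labels (oai_dc, marcxml, mods) are hypothetical; only the tuple structure (Repository, Master MDStore, MDStores, mapping, stats) and the election of the Dublin Core MDStore as Master come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List

DC = "oai_dc"  # label assumed here for the Dublin Core format


@dataclass
class MDStore:
    store_id: str
    repository_id: str
    metadata_format: str


@dataclass
class HarvestingInstance:
    repository_id: str
    master: MDStore                 # MDStore hosting the Dublin Core records
    others: List[MDStore]           # one MDStore per remaining format
    mapping: Dict[str, str] = field(default_factory=dict)  # set by the Aggregator manager
    stats: Dict[str, int] = field(default_factory=dict)

    def harvest_order(self) -> List[MDStore]:
        """Harvesting always starts from the Master MDStore."""
        return [self.master] + self.others


def create_harvesting_instance(repository_id: str, formats: List[str]) -> HarvestingInstance:
    """Create one MDStore per published format and elect the DC one as Master."""
    if DC not in formats:
        raise ValueError("Dublin Core must be among the published formats")
    stores = {f: MDStore(f"{repository_id}/{f}", repository_id, f) for f in formats}
    others = [store for fmt, store in stores.items() if fmt != DC]
    return HarvestingInstance(repository_id, stores[DC], others)
```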

2.3.1.2     Physical Design of DRIVER Objects

The metadata records relative to an OAI-Item are harvested and stored, depending on the metadata format, into different MDStore Resources, which are all associated to the same Repository. In DRIVER, such Repository-centric harvesting processes, together with the OAI protocol constraints, are exploited to implicitly define DRIVER objects. DRIVER Objects are defined by an identifier, by the DMF record and by the item it is related to:

  • Object identifier and item: Informally, the idea is that one metadata record, harvested from an OAI-Item, implicitly defines the DRIVER Object it belongs to. Such object has a dr:objectIdentifier obtained by combining the dr:itemIdentifier of the record and the DRIVER Repository Service identifier. Indeed:
    • the item identifier is unique within the Repository and the Repository identifier is unique within DRIVER Information Space;
    • the item identifier is assigned to all metadata records relative to the same item, hence any other metadata record harvested from the same Repository and relative to the same item, respects the same principle and is implicitly added to the same DRIVER object.

Object definition is fired at harvesting time, by enriching the harvested records with Harvesting Information fields regarding the item identifier and the Repository Service identifier.

  • Object DDF record: the OAI protocol establishes that all OAI-Items in a Repository must have a DC metadata record. In other words, there cannot be a DRIVER object, relative to an OAI-Item, that does not include the relative DC metadata record. DRIVER exploits this strong property, and relies on DC metadata records to represent DMF records. DC records, to be contained into a dedicated Master MDStore, are harvested and enriched with the harvesting date, but also with extra DDF fields.
  • Object DPF record: the DPF fields of an object can be obtained from the profile of any of the MDStores employed to host the object’s item metadata records. Indeed, the profiles contain, among others, fields describing the associated Repository Service, such as the following Provenance fields:
    • dr:repository: the name of the Repository Service Resource from which the metadata record was harvested;
    • dr:repositoryLink: a link to an informative or search web page relative to the repository;
    • dr:country: the name of the country of the institution responsible for the repository;
    • dr:institution: the name of the institution responsible for the harvested metadata record.

Users and applications interested in DPF fields of a metadata record or of an object, can therefore infer them from the profile of the MDStore on which they are operating.

Note that, in order to reach the metadata records relative to a DRIVER object (the OAI item for the outside world), the System extracts from the object identifier the Repository Service identifier and the item identifier. From the former, it finds the Harvesting Instance involving the Repository Service and determines the MDStores containing the records; from the latter, it searches into such MDStores to find the relative records.
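
The lookup just described can be sketched as follows. The text does not specify how the Repository Service identifier and the item identifier are combined, so the "::" separator is an assumption of this example, as are the function names and the nested-dictionary stand-in for Harvesting Instances and MDStores.

```python
SEP = "::"  # assumed separator; the actual combination rule is not specified


def make_object_identifier(repository_id, item_identifier):
    """Combine Repository identifier and item identifier into dr:objectIdentifier."""
    if SEP in repository_id:
        raise ValueError("repository identifier must not contain the separator")
    return repository_id + SEP + item_identifier


def split_object_identifier(object_identifier):
    """Recover (Repository identifier, item identifier) from dr:objectIdentifier."""
    repository_id, item_identifier = object_identifier.split(SEP, 1)
    return repository_id, item_identifier


def find_records(object_identifier, mdstores_by_repository):
    """Return {format: record} for one object.

    mdstores_by_repository maps a Repository identifier to its MDStores,
    each modelled here as {item identifier: record}, keyed by format.
    """
    repository_id, item_identifier = split_object_identifier(object_identifier)
    stores = mdstores_by_repository.get(repository_id, {})
    return {fmt: records[item_identifier]
            for fmt, records in stores.items()
            if item_identifier in records}
```

Splitting on the first occurrence of the separator keeps item identifiers that themselves contain colons (as OAI identifiers typically do) intact.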

2.3.1.3     Aggregation process: cleaning and enriching

MDStore Resources contain the “aggregation” of the metadata records of the relative Repository. The aggregation process “cleans” and “enriches” harvested records before storing them into an MDStore Resource.

The “cleaning” phase adjusts the harvested records and preserves their original metadata format; the result of this process is that interested applications can, by connecting to DRIVER, harvest the original content of Repository Services, filtered and improved in quality.

The “enriching” phase attaches extra DRIVER fields to the harvested records, in order to make them become part of an object and, in the case of DC records, define the object DDF record. Any “cleaned” metadata record is enriched by the responsible Aggregator Service with the following Harvesting Information fields, kept in the metadata record header:

  • dr:recordIdentifier: the DRIVER unique identifier of the harvested metadata record, obtained as a combination of the dr:itemIdentifier and the MDStore identifier;
  • dr:itemIdentifier: the unique identifier of the OAI-item relative to the harvested metadata record;
  • dr:dateOfCollection: the date at which the metadata record was harvested.
  • dr:objectIdentifier: the DRIVER object unique identifier, obtained as a combination of the dr:itemIdentifier and the Repository Service Resource identifier provided at Repository registration time.

Such fields assign a unique DRIVER identifier to the metadata record, and implicitly assign the record to the relative object.
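
A minimal sketch of the enriching step is given below. The function name, the dictionary layout of the enriched record, and the "::" separator used to combine identifiers are assumptions made for illustration; the four header fields are those listed above.

```python
def enrich(record, item_identifier, mdstore_id, repository_id, date_of_collection):
    """Attach the Harvesting Information fields to a cleaned metadata record.

    The record body is left untouched; the DRIVER fields go into a header.
    """
    header = {
        # record identifier: item identifier combined with the hosting MDStore
        "dr:recordIdentifier": f"{mdstore_id}::{item_identifier}",
        "dr:itemIdentifier": item_identifier,
        "dr:dateOfCollection": date_of_collection,
        # object identifier: item identifier combined with the Repository
        "dr:objectIdentifier": f"{repository_id}::{item_identifier}",
    }
    return {"header": header, "metadata": record}
```

Two records of the same item harvested into different MDStores thus receive distinct record identifiers but the same object identifier, which is what implicitly groups them into one DRIVER object.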

Aggregation Services reserve special treatment for Master MDStores, i.e. the storage units dedicated to the Dublin Core format, which are used in the physical layer to represent DRIVER objects. “Cleaned” Dublin Core records are further enriched with the following DDF fields:

·         C-Object Content

o        dr:CobjContentSynthesis: the full-text extracted from the retrieved textual digital resource relative to the harvested metadata record;

o        dr:CobjTypology: the field is set to “Textual”, since the current System release focuses on textual resources only;

o        dr:CobjCategory: the field specifies the category of the textual resource; examples are thesis, article, book, etc.; possible values are taken from a controlled dictionary;

·         C-Object Structure

o        dr:CobjIdentifier: the field contains the unique identifier of the C-Object, if it exists, in the original Repository; examples are DOI, OAI-Item identifiers, etc.

o        dr:CobjModel: the field is set to “OAI”, since the current System release focuses on OAI-Item C-Objects only;

·         C-Object description

o        The OAI-Item C-Object is described by the DC Application profile presented above, which synthesises the OAI-Item resource. The values for such fields are not materialized; instead, the respective Harvesting Instance will be assigned a field-to-field mapping from the available Dublin Core record into them. Such a mapping is defined by DRIVER Aggregator Managers to ensure semantic uniformity and is stored into the Harvesting Instance.

o        dr:CobjMDFormats: the field specifies the Metadata formats used to describe the OAI-Items; values are taken from a controlled dictionary, and examples are Dublin Core, MARC, others;

o        dr:CobjDescriptionSynthesis: the keywords contained in the XML files of the metadata records available for the object; this field must include all DC  metadata record fields, while the others are optional.

2.3.1.4     Summary and scenario

Metadata records relative to the same OAI item are independently harvested, “cleaned”, and “enriched” to be deployed into different MDStores, relative to the same Repository Service, and become DRIVER metadata records of the same DRIVER object.

An object in DRIVER is uniquely identified by a new identifier (dr:objectIdentifier) obtained by combining the OAI-PMH identifier of the original OAI item (dr:itemIdentifier, unique in the Repository Service context) and the identifier of the Repository Service Resource from which it was harvested (unique within the DRIVER context).

DRIVER metadata records are uniquely identified by a record identifier (dr:recordIdentifier) assigned at harvesting time (“enriching” phase) and obtained by combining the OAI item identifier that comes with the record and the identifier of the hosting MDStore Resource.

In Figure 3 we illustrate a Repository Service exposing its content according to three metadata formats: Dublin Core, MARCXML, and MODS. Therefore, its OAI-items may have three descriptive metadata records.[9] We assume that DRIVER System is configured to OAI harvest all the three formats; to this aim three new MDStores are deployed, one becomes the Master MDStore, and the Harvesting Instance relative to the Repository is created. Once the Aggregator Manager has completed the Harvesting Instance with the mapping from the DC metadata format into the DDF format, the Aggregator Service can start the aggregation activities. For each OAI item, the metadata records are independently harvested and preserved into different MDStores based on the metadata format.

 

 

Figure 3 – DRIVER Object Model Implementation

2.3.2      DRIVER Objects for OAI-Items: Index Services and OAI-Publisher Services

Indexing techniques define how to provide a searchable uniform DMF view of the Information Space, while OAI-Publishing methodology will show how to deliver different format views of DRIVER objects. This section describes how search and OAI-PMH publishing services can be supported by the System scaffoldings described above.

Indexing

In DRIVER, Index Resources target only Master MDStore Resources. The latter expose the DMF records corresponding to each DRIVER Object by: selecting the DDF fields attached to the DC record, calculating the remaining DDF fields by applying the mapping to the DC record, and attaching the DPF fields derived from the MDStore profile.
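
The assembly of a DMF record from a Master MDStore entry can be sketched as follows. The function and parameter names are hypothetical, and records are modelled as plain dictionaries; the three steps (attached DDF fields, mapped DC fields, DPF fields from the MDStore profile) follow the description above.

```python
DPF_FIELDS = ("dr:repository", "dr:repositoryLink", "dr:country", "dr:institution")


def build_dmf_record(dc_record, attached_ddf, mapping, mdstore_profile):
    """Assemble a DMF record for one DRIVER Object.

    dc_record:       DC field -> values, as harvested and cleaned
    attached_ddf:    DDF fields materialized at enrichment time
    mapping:         DC field -> DDF field (stored in the Harvesting Instance)
    mdstore_profile: profile of the MDStore, which carries the DPF fields
    """
    dmf = dict(attached_ddf)                       # 1. DDF fields already attached
    for dc_field, ddf_field in mapping.items():    # 2. remaining DDF fields via mapping
        if dc_field in dc_record:
            dmf[ddf_field] = dc_record[dc_field]
    for name in DPF_FIELDS:                        # 3. DPF fields from the profile
        if name in mdstore_profile:
            dmf[name] = mdstore_profile[name]
    return dmf
```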

An important field of the DMF is dr:CobjDescriptionSynthesis, which by default contains the whole Dublin Core XML file as text. However, to offer complete and rich fielded search mechanisms, the field should also include the XML text of all metadata records harvested for the object, in any format. Indeed, completing the field would enable the definition of queries, at least in keyword search terms, over the textual content of all the original metadata records harvested for the Repository items; such content is generally lost in the OAI-required translation into Dublin Core format. Unfortunately, such a “union operation” cannot be performed at record harvesting time, as harvesting into different MDStores from the same Repository is an operation asynchronous with respect to Items. Therefore, before ingestion of each DMF record, Index Resources should access the metadata records and enrich the field dr:CobjDescriptionSynthesis with the relative XML.

The same problem holds for the field dr:CobjMDFormats, which should be completed at indexing time to complete DMF information.
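
A possible shape of this index-time completion is sketched below. It assumes, for illustration only, that the Index Resource has already retrieved the XML text of every record harvested for the object, keyed by metadata format; the function name and the concatenation with newlines are likewise assumptions.

```python
def complete_description_fields(dmf, records_by_format):
    """Complete dr:CobjMDFormats and dr:CobjDescriptionSynthesis before indexing.

    records_by_format maps a metadata format name to the XML text of the
    record harvested, in that format, for the object described by dmf.
    """
    formats = sorted(records_by_format)
    completed = dict(dmf)  # do not modify the stored record
    completed["dr:CobjMDFormats"] = formats
    # "Union operation": text of all harvested records, in any format
    completed["dr:CobjDescriptionSynthesis"] = "\n".join(
        records_by_format[fmt] for fmt in formats
    )
    return completed
```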

OAI-Publishing

DRIVER supports OAI-Publishing Services to external applications: the unqualified Dublin Core format must be delivered, as well as OAI sets, single item harvesting and other peculiarities. In addition, the DMF format and the harvested formats should be delivered:

  • OAI-Publishing of Dublin Core metadata: this can be done by straightforwardly delivering Master MDStores content;
  • DMF format: similarly to the process of Indexing, described above, Master Metadata Stores content is delivered by constructing the DMF record one by one;
  • Harvested formats: content of one or more MDStores sharing the same Metadata Format is OAI-Published, delivering cleaned records to the requesting users and applications;
  • Harvested formats with DPF: the same as above, with the difference that each record is enriched, before delivery, with the DPF fields derived from the MDStore it belongs to.

2.4      DRIVER Services

Service identification process in DRIVER started in the TA. There, a preliminary set of Services was identified and analyzed in order to organize the work packages activities. In a second stage, case studies and Functional Specification drafting led to the identification of the effective functionalities required for the system to work. Assigning functionality to Services identified in the TA, also led to the introduction of further necessary Services, e.g. metadata service, not originally included into the TA.

DRIVER System provides the Services needed to activate an Open Service infrastructure, over which Services can offer functionalities by interacting with each other. DRIVER test-bed uses the infrastructure to provide Services to collect heterogeneous content from external Repositories, to store and index such content, and to allow users and applications to access such content uniformly.

Figure 4. The DRIVER framework

In Figure 4 we illustrate DRIVER Services grouped by functional area and by application groups. Groups put together Services belonging to separated functional areas, to show when these cooperate to deliver the same application functionalities, here broken down into four main areas:

  1. Data Provision Services: Services dealing with data collection and aggregation; Aggregation Services collect content from external Repositories to form DRIVER Information Space (see Section 2.2);
  2. Data Management Services: Services offering data storage, indexing, and search functionalities:
    1. DRIVER Objects are stored within MDStore Data Structure Resources (MDStores), to be created by MDStore Services;
    2. DMF records, stored into MDStores, must be indexed by Index Data Structure Resources, to be created by Index Services;
    3. Collection Services allow for the definition of Collection Data Structure Resources; Collections are virtual sets of DRIVER Objects, specified by means of a predicate query over DMF fields;
    4. Search Services accept predicate queries, as well as Collections, over DMF fields and return the DRIVER Objects whose DMF record matches the predicate; Search Services find such Objects by running look-ups over the appropriate Index Resources and combining their results;
  3. Community Specific Services: Services for capturing community-specific user and application needs as specified in Section 2.2.2:
    1. Presentation Layer: includes the Services used by users and applications to interact with DRIVER Information Space:

                                                               i.      User Interface Services allow users to specify and run queries over the DRIVER Information Space; User Interfaces interact with Search Services;

                                                              ii.      OAI-Publisher Services allow applications to interact with DRIVER Information Space as if it was an OAI compliant Repository;

    2. User Management: includes the Services required to ease the interaction with the Information Space and provide advanced functionalities to registered users:

                                                               i.      Profiling Services allow users to register to the System in order to exploit advanced functionalities: users can register to Communities and specify their topics of interest in order to receive recommendations of new content and make use of System query profiling functionalities;

                                                              ii.      UserCommunities Services allow for the definition of Communities as sets of Collections to which registered users can be subscribed; Communities are created to provide higher views of the Information Space, fragmented into Collections;

                                                             iii.      Recommendation Services provide registered users with notifications about the addition of interesting Objects into the Information Space.

  4. System Services: infrastructural Services, thanks to which DRIVER test-bed Services can interact and provide functionalities.
    1. Information Services keep track of all Services registered to the System and provide subscription and notification mechanisms to the Services;
    2. Manager Services are tailored to the types of Services that can join the System and are entitled to orchestrate and monitor their behaviour, in order to maximize System quality of service;
    3. Authorization and Authentication Services provide the security mechanisms that guarantee that Services are used by authorized users and Services.

The Service boxes in Figure 4 are not necessarily intended as single software entities, each implemented by a unique Web Service. Each of them represents instead a meta-service, i.e. a set of related real services acting as a whole to supply the functionality associated with the meta-service. A meta-service, during the design phase, will be decomposed into the set of its corresponding and cooperating real-services.
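
To make the relationship between Collections, Search Services, and Index Resources more concrete, the following sketch treats a Collection as a stored predicate over DMF fields and a search as a look-up over several indexes whose results are combined. The simple equality predicates and the dictionary-based indexes are assumptions of this example; the actual query language is the one described in Section 3.4.4.

```python
def matches(dmf, predicate):
    """True if, for every (field, value) in the predicate, the record holds that value."""
    return all(value in dmf.get(name, []) for name, value in predicate.items())


def search(indexes, query, collection=None):
    """Run a predicate query over several indexes and combine the results.

    indexes:    list of {object identifier: DMF record (field -> values)}
    query:      predicate over DMF fields
    collection: optional Collection, itself a predicate restricting the scope
    """
    hits = set()
    for index in indexes:
        for object_id, dmf in index.items():
            if matches(dmf, query) and (collection is None or matches(dmf, collection)):
                hits.add(object_id)
    return sorted(hits)
```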

DRIVER System work flow

DRIVER test-bed Services provide specific functionalities on request from other Services or users. The Manager Service orchestrates the available Services in order to make them fully functional with respect to the expected functionalities: DRIVER users and applications respectively expect to be able to query and harvest an Information Space made of all Objects collected from the Repository Services.

Specifically, Repository Services register to the DRIVER Information Service, thereby expressing the will to share their content. The Manager Service reacts automatically to the insertion of a new Repository by creating the resources needed to make the Repository content available to the System (see Section 3.2.3). More specifically, the association between a Repository Service and the MDStores that will host its content is itself a System Resource, called Harvesting Instance. Hence, the Manager Service creates a new Harvesting Instance, together with the relative MDStores, and assigns the former to an available Aggregation Service, which will independently handle the activity of collecting the C-Objects from the Repository. Similarly, the Manager Service needs to ensure that the new MDStores are targeted by at least one Index. To this aim, the Service needs to search for available Index Data Structure Resources or potentially create new ones.
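
The last step, ensuring that every new MDStore is targeted by at least one Index, can be sketched as follows. The capacity limit used to decide whether an existing Index is "available", the index naming scheme, and the function name are all hypothetical; the text only states that existing Index Resources are searched first and new ones are created if needed.

```python
def ensure_indexed(new_mdstores, indexes, capacity=2):
    """Make sure each MDStore is targeted by at least one index.

    indexes maps an index identifier to the set of MDStore identifiers it
    targets; it is updated in place and returned.
    """
    for store in new_mdstores:
        if any(store in targets for targets in indexes.values()):
            continue  # already covered
        for targets in indexes.values():
            if len(targets) < capacity:   # an available index was found
                targets.add(store)
                break
        else:
            # no index available: create a new one with an unused identifier
            n = len(indexes) + 1
            while f"index-{n}" in indexes:
                n += 1
            indexes[f"index-{n}"] = {store}
    return indexes
```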

End-Point References (EPR)

The Web Service Addressing 1.0 – Core specification[10] defines two constructs: message Addressing Properties and End-Point References (EPRs), that normalize the information typically provided by transport protocols and messaging systems in a way that is independent of any particular transport or messaging system. In particular, a Web service endpoint is a (referenceable) entity, processor, or resource to which Web service messages can be addressed. Hence, EPRs convey the information needed to address a Web service endpoint.

DRIVER adopts EPRs to specify the address of a Data Structure Resource in the context of a Service Resource, e.g. a ResultSet Data Structure to be called through the responsible ResultSet Service. Figure 5 illustrates an example of EPR XML file as defined in DRIVER. The wsa:Address element contains the address of the Web Service to be called, while the driver:ResourceIdentifier element contains the reference to the local DataStructure Resource to be used through Service interface methods. Other information, such as the WSDL and the unique name of the Service can be found as attributes and subelements of the wsa:Metadata element, respectively.

 

<wsa:EndpointReference
      xmlns:wsa="http://www.w3.org/2005/08/addressing"
      xmlns:driver="http://www.driver.org"
      xmlns:wsaw="http://www.w3.org/2006/05/addressing/wsdl"
      xmlns:wsdli="http://www.w3.org/2005/08/wsdl-instance">
   <wsa:Address>http://146.48.87.147:8002/SOAP/ResultSet</wsa:Address>
   <wsa:ReferenceParameters>
      <driver:ResourceIdentifier>RS-2e117e02-c80e-11db-8603-000347f19e46-3</driver:ResourceIdentifier>
   </wsa:ReferenceParameters>
   <wsa:Metadata
         xmlns:wsdli="http://www.w3.org/2006/01/wsdl-instance"
         wsdli:wsdlLocation="http://146.48.87.147:8002/SOAP/ResultSet?WSDL">
      <wsaw:ServiceName>ResultSet01</wsaw:ServiceName>
   </wsa:Metadata>
</wsa:EndpointReference>

Figure 5 – Example of EPR

For example, EPRs are returned by the ResultSet Service on creation of a ResultSet Data Structure to the consuming Service, and can be passed on to third Services which will be able to contact the ResultSet straightaway. When the Search Service sends a query to an Index Service, it receives the EPR of the ResultSet Data Structure created by the Index Service to contain the query result.
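
A consuming Service needs to extract the Web Service address and the Resource identifier from an EPR before contacting the ResultSet. A minimal sketch using the Python standard library is given below; the function name parse_epr and the shape of its result are assumptions, while the element names and namespaces are those of the example in Figure 5.

```python
import xml.etree.ElementTree as ET

NS = {
    "wsa": "http://www.w3.org/2005/08/addressing",
    "driver": "http://www.driver.org",
    "wsaw": "http://www.w3.org/2006/05/addressing/wsdl",
}

EPR_XML = """<wsa:EndpointReference
    xmlns:wsa="http://www.w3.org/2005/08/addressing"
    xmlns:driver="http://www.driver.org"
    xmlns:wsaw="http://www.w3.org/2006/05/addressing/wsdl">
  <wsa:Address>http://146.48.87.147:8002/SOAP/ResultSet</wsa:Address>
  <wsa:ReferenceParameters>
    <driver:ResourceIdentifier>RS-2e117e02-c80e-11db-8603-000347f19e46-3</driver:ResourceIdentifier>
  </wsa:ReferenceParameters>
  <wsa:Metadata>
    <wsaw:ServiceName>ResultSet01</wsaw:ServiceName>
  </wsa:Metadata>
</wsa:EndpointReference>"""


def parse_epr(xml_text):
    """Extract address, resource identifier and service name from an EPR."""
    root = ET.fromstring(xml_text)

    def text(path):
        node = root.find(path, NS)
        # Strip surrounding whitespace, which the EPR layout may introduce.
        return node.text.strip() if node is not None and node.text else None

    return {
        "address": text("wsa:Address"),
        "resource_id": text("wsa:ReferenceParameters/driver:ResourceIdentifier"),
        "service_name": text("wsa:Metadata/wsaw:ServiceName"),
    }
```

Calling parse_epr(EPR_XML) yields the address of the ResultSet Service together with the identifier of the ResultSet Data Structure to be passed to its interface methods.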

2.5      DRIVER Resources

Resources are the entities orchestrated by DRIVER System to deliver European Repository Infrastructure functionalities. Indeed, DRIVER System Infrastructure is designed in terms of Resource Types and how Resources belonging to such Types must cooperate and interact in order to deliver the required functionalities.

Resources are “entities”, in the sense that their definition is independent from their practical implementation. In particular, we can call Resource any software entity (in DRIVER a Service or a Data Structure) that implements the functionalities specified by one of the DRIVER Resource Types defined in D3.1 [1].

In the following, DRIVER Resources are presented. Initially, the principles of Resources Monitoring and Orchestration through the use of Resource profiles are recalled. Subsequently, the model of DRIVER Resources is formalized using UML class diagrams; each Resource is described in its behaviour within the System. Finally, Resource profiles are defined in detail, as a list of parameters.

2.5.1      Resource Management and Resource Profiles

The design of the Services dealing with System Management Area (see Section 3) takes care of management and orchestration of DRIVER Resources:

·         An Information Service maintains Resource Profiles, which represent and describe the Services or the Data Structures that implement specific Resources and are currently available to the System. Data Structure Resources are managed by Service Resources, which are also in charge of exposing their profile. In particular, Profiles are exposed by the Service in charge of the Resource as an XML WSDL file. The Information Service keeps an up-to-date picture of the System, and it is used by other Services (see Section 3.2.2) to:

o        discover profile information about the Resources and find an access point to them;

o        subscribe to actions over Resource profiles and be notified when these occur.

·         A Manager Service is in charge of monitoring and orchestrating the available Resources by interacting with the Information Service, with the aim of maximizing System Quality-of-Service.
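The discover and subscribe operations described above can be pictured with a small in-memory sketch. It is only an illustration under stated assumptions: the class name, the method names, the action names (CREATE, UPDATE) and the dictionary-based profile layout are all hypothetical and are not taken from the DRIVER interfaces.

```python
from collections import defaultdict


class InformationService:
    """Toy in-memory registry of Resource profiles (hypothetical API)."""

    def __init__(self):
        self._profiles = {}                    # resource id -> profile dict
        self._subscribers = defaultdict(list)  # action name -> callbacks

    def subscribe(self, action, callback):
        """Ask to be notified whenever `action` occurs on a profile."""
        self._subscribers[action].append(callback)

    def _notify(self, action, profile):
        for callback in self._subscribers[action]:
            callback(action, profile)

    def register(self, profile):
        self._profiles[profile["id"]] = profile
        self._notify("CREATE", profile)

    def update(self, resource_id, **changes):
        self._profiles[resource_id].update(changes)
        self._notify("UPDATE", self._profiles[resource_id])

    def discover(self, resource_type):
        """Return the profiles of all Resources of the given type."""
        return [p for p in self._profiles.values() if p["type"] == resource_type]


events = []
info = InformationService()
info.subscribe("UPDATE", lambda action, p: events.append((action, p["id"])))
info.register({"id": "idx-1", "type": "IndexService",
               "address": "http://example.org/index"})
info.update("idx-1", address="http://example.org/index2")
found = info.discover("IndexService")
```

In this sketch a Manager Service would play the role of a subscriber that reacts to profile changes, and of a client that uses discover to locate access points.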

The goal of the described functionality is to achieve a result similar to that obtained with the UDDI (Universal Description, Discovery, and Integration) mechanism in the Web Service world. UDDI provides a method for publishing and finding service descriptions. In the UDDI context, the “profile” of the Web Service is represented by the WSDL service interface published in the UDDI Registry. The difference from the UDDI mechanism is that there the goal is to set up a service-to-service communication, while in DRIVER we aim at connecting a DRIVER Resource with its consumers.

2.5.2      Resource Model

The goal of the Resource Models presented in Figure 6 and Figure 7 is to capture the structure and the main relationships among the resources. The root class, Resource, models the generic DRIVER Resource. It is characterized by a unique identifier, which identifies it unambiguously, by a type, which discriminates among the different Resource Types, and by a kind, which categorizes Resources belonging to different Resource Types but sharing more properties than those common to all Resources. DRIVER supports the following Resource Kinds:

  • ServiceResources: this category includes all profiles conforming to Resource Types relative to Services;
  • DRIVERPendingServices: this category includes all profiles conforming to Resource Types relative to Services waiting for validation;
  • UserResources: this category includes all profiles conforming to Resource Types relative to users; currently DRIVER supports only one User Resource Type, but others may arise in the future;
  • CommunityResources: this category includes all profiles conforming to Resource Types relative to Communities; currently DRIVER supports only one Community Resource Type, but others may arise in the future;
  • CollectionResources: this category includes all profiles conforming to Resource Types relative to Collections; currently DRIVER supports only one Collection Resource Type, but others may arise in the future;
  • HarvestingInstanceResources: this category includes all profiles conforming to Resource Types relative to HarvestingInstances, i.e. entities that associate external Repositories with internal MDstores;
  • IndexResources: this category includes all profiles conforming to Resource Types relative to Indices;
  • MDStoreResources: this category includes all profiles conforming to Resource Types relative to MDStores, hence units in charge of storing C-Objects or their parts;
  • RecommendationResources: this category includes all profiles conforming to Resource Types relative to users Recommendations;
  • ResultSetResources: this category includes all profiles conforming to Resource Types relative to ResultSets.

Association of a Resource Type with a Resource Kind is statically declared at the time of Resource Type creation and can vary during the Resource Type life-time. For example, this is the case for Service Resources, which initially belong to the DRIVERPendingService Kind and then, when validated, are moved to the ServiceResource Kind.

By specializing this generic concept of Resource, in the following we introduce the two main types of DRIVER Resources: Service and Data Structure Resources.

2.5.2.1     Service Resources

DRIVER Service Resources are entities supporting the functionality described by a DRIVER Resource Type. They embody the business rules needed by the Users and by the System, i.e. other Service Resources, to accomplish their tasks. Service Resources are all associated with the Resource Kind ServiceResource.

Being “active” entities, Service Resources deliver their activity with a measurable level of Quality of Service (QoS). We shall see that QoS can be measured in terms of efficacy and efficiency. Efficacy depends on the management policies relative to the Service Resource Type and is thus specific to the Service Resource, while efficiency can be inferred from the following attributes:

·         Availability, i.e. the probability that a resource can respond to requests;

·         Capacity, i.e. the limit on the number of requests a resource is capable of handling;

·         Response time, i.e. the delay from the request to getting a response;

·         Throughput, i.e. the rate of successful request completion.
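A minimal sketch of how three of these attributes could be derived from observations is given below. The measurement scheme is an assumption made for illustration: availability is estimated from periodic probes, response time is averaged over observed requests, and throughput counts successful completions per second. Capacity is not computed, since it is a declared limit rather than a measured value.

```python
from dataclasses import dataclass


@dataclass
class Request:
    latency_s: float  # delay from the request to getting a response
    ok: bool          # True if the request completed successfully


def efficiency(requests, window_s, probes_answered, probes_sent):
    """Estimate efficiency attributes over an observation window."""
    successes = [r for r in requests if r.ok]
    return {
        # fraction of probes the resource responded to
        "availability": probes_answered / probes_sent,
        # mean delay over all observed requests
        "response_time": sum(r.latency_s for r in requests) / len(requests),
        # successfully completed requests per second
        "throughput": len(successes) / window_s,
    }


observed = [Request(0.2, True), Request(0.4, True), Request(0.6, False)]
metrics = efficiency(observed, window_s=10, probes_answered=9, probes_sent=10)
```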

Figure 6 The DRIVER Resource Model: Service Resources

In the DRIVER Architectural Specification, an instance of a Service Resource Type, i.e. a Service Resource, is not necessarily intended as a single software entity implemented by a unique Web Service. It is designed instead as a meta-service, i.e. a set of related real Web Services acting as a whole to supply the functionality associated with the meta-service. During the design phase, a meta-service will be decomposed into the set of its corresponding, cooperating real services.

As specified in D3.1 [1], Services need to subscribe to specific actions and handle the consequent notifications; besides, they may be contacted by the Manager Service in order to operate on the set of Data Structure Resources they manage. In the following we introduce Service Resources, by explaining their role in DRIVER, and listing the subscription-notification mechanism they need to enforce, and the actions sent by the Manager Service they might need to execute.

Authentication & Authorization Service

DRIVER Authentication and Authorization Services provide the functionality required to enforce security over the System by means of authentication and authorization mechanisms. The architecture of the Service is based on the eXtensible Access Control Markup Language (XACML) standard [11]. XACML addresses the issues of authorization in distributed, heterogeneous, enterprise-scale Systems, a description that characterizes the DRIVER environment very well.

Repository Service

An external repository contributes its content to DRIVER’s Information Space. A repository is a peculiar kind of DRIVER Service, since it cooperates with the system but resides outside DRIVER’s “jurisdiction”: it participates in the system by means of the OAI-PMH protocol, but is not subject to DRIVER System management operations and protocols. Specifically, a Repository Service is registered and validated by a System Administrator, who provides a Repository Service Profile describing its current status. The Repository Service itself does not report its status, i.e. update its profile, to the System; instead, the DRIVER System keeps the profile up to date by explicitly requesting status information from the service through the OAI-PMH protocol and updating the profile accordingly.

When a new Repository Service is added to the System, it exposes the list of harvestable available metadata formats. The System creates one Harvesting Instance Data Structure Resource for each Repository Service.

MDStore Service

An MDStore Service is a Web Service managing a number of MDStore Resources. It is contacted by all Resources that need to interact with an MDStore, i.e. Index Resources, OAI-Publisher Services, User Interface Services, Search Services, and by the Manager Service whenever an MDStore needs to be added, deleted, or updated in the System.

Aggregator Service

An Aggregator Service is a Web Service managing a number of Harvesting Instance Resources. For each Harvesting Instance Resource it schedules a number of aggregation operations, i.e. harvesting, cleaning and enriching (Section 2.3.1), from a given Repository Service into the relative MDStores. The Manager Service contacts an Aggregator Service in order to assign a Harvesting Instance Resource to it or to remove one from it. Aggregator Managers are responsible for configuring the aggregation of newly assigned Harvesting Instances by providing local harvesting time schedules and cleaning and enriching rules, including the mapping from Dublin Core onto DDF application profile fields.

Index Service

An Index Service is a Web Service managing a number of Index Resources. It is contacted by all Resources that need to interact with an Index, i.e. Search Services, and by the Manager Service whenever an Index needs to be added, deleted, or updated in the System.

Search Service

The Search Service is a Web Service that exposes Browse and Search functionalities through a common query language. The search functionality allows for searches over the entire set of harvested documents, while the browse functionality produces lists of documents that can be browsed by some property, such as author or year of publication. Both functionalities are used by issuing an appropriate query to the Search Service, which responds accordingly. Since there is more than one way to produce the response to a query, the Search Service has to follow the most efficient one. This optimization is internal to the Search Service and does not affect the user. In other words, the user is free to concentrate on specifying proper search criteria rather than dealing with query execution details.

Collection Service

Collection Services are Web Services for the management of Collection Resources. They allow the creation, deletion, and update of any Collection Resource in the System. Collections are under the responsibility of Collection Managers and do not interact with the Manager Service. On the other hand, due to the semantics of DRIVER Collections (Section 2.5.2.2), which can be derived from other Collections, i.e. parent Collections, Collection Managers need to be prompted with decisions about the modification of Collections when their parent Collections have been modified. To this aim, Collections will need to subscribe to changes to their parent Collections.

Recommendation Service

A Recommendation Service is a Web Service which manages the Recommendation Resources. It handles the automated generation of new recommendations, which are based on system or data content changes and are targeted to specific users according to their personalized preferences. It also allows the interactive creation, by a System Administrator, of general, system-wide announcements aimed at all end users. The service is closely linked to the Profiling Service, from which it acquires all relevant user information, and it manages the actual notification, delivered either via e-mail or web alerting.

Users Communities Service

The Users-Communities Service is a Web Service responsible for maintaining the Community Resources. It allows users with privileged rights (Community Managers) to form communities which are based on system defined data collections. End users utilize it to subscribe to these communities and are therefore provided with a restricted view of the DRIVER information space matching their scientific interests.

Profiling Service

The Profiling Service is a Web Service which manages User profiles.  End users employ this service to record their preferences. These preferences may relate to their scientific interests, to specific views of the DRIVER information space (through the subscription to existing communities), to the presentation/layout of the results of searching or browsing and to the setup of alert notifications. It is accessed by most of the services in the Presentation and User layer of the DRIVER system for all personalized functionalities provided to registered users.

User Interface Service

The User Interface provides a Web front end of all the DRIVER functionalities. It includes a Web Service interface which accesses all other DRIVER Web service components. End users utilize it to navigate and browse through all of DRIVER Information Space, as well as various DRIVER system defined metadata (collections, communities, etc.). They are allowed to initiate searches and see the best hits in accordance with their scientific interests (for registered users). 

OAI-Publisher Service

OAI-Publisher Services expose DRIVER Objects through the OAI protocol. Accordingly a Dublin Core metadata format view of the Information Space must be always available. Incremental harvesting and OAI-Sets must be exposed. Such Services do not need to subscribe to specific actions nor be contacted by the Manager Service.

2.5.2.2     Data Structure Resources

Data Structure Resources are generated on System demand by Service Resources; they are passive Resources, in the sense that they are managed by the relative Service Resource.

In DRIVER Architectural Specification, an instance of a Data Structure Resource is designed as an abstract component managed by a specific Service Resource. Indeed, the physical nature of such components varies from Service to Service. For example, Index Data Structure Resources are concrete sources that upload records and respond to look-up calls, which are used by an Index Service Resource to deliver some functionality; while Harvesting Instance Data Structure Resources are logical concepts representing the harvesting activity relative to a Repository Service, which are interpreted by an Aggregator Service Resource to deliver some functionality.

Data Structure Resources are associated to the relative Resource Kind, as specified above.

Figure 7. The DRIVER Resource Model: Data Structure Resources

System Configuration

There is only one System Configuration Resource in the System, which is managed by System Administrators through the Manager Service administration interface. The Resource contains all parameters required by the Manager Service itself and by other Services to correctly run the System.

MDstore

MDstores are the Resources generated by a Metadata Store Service; they are metadata storage units, hence can store records and return records based on the respective unique identifier. “Bulk” storage and retrieval operations are also supported, to provide efficient storage on harvesting and efficient retrieval on OAI-PMH publishing or query answering.

MDstores are characterized by the metadata formats of the records stored in the unit, the Repository from which the content was withdrawn, and a number of statistics, e.g. last harvesting timestamp and type, etc.

The union of the objects within MDstores forms DRIVER’s Information Space.

Collection

Collections are the means through which a large and heterogeneous Information Space can be shown to end-users in an organized way. A Collection is a resource generated by a Collection Service that identifies a subset of the objects in the DRIVER Information Space. A Collection virtually determines a set of objects by means of a predicate, i.e. the retrieval condition that is obtained by appropriately combining the membership condition (the Collection-specific predicate as defined by the Collection Manager) and the parent condition (the retrieval condition of the parent collection). Note that predicates are specified over DMF fields in accordance with the specification language reported in Section 2.2.1, Uniform View. By means of predicates over fields such as dr:repository it is possible to identify Collections of objects relative to the subpart of the Information Space harvested from a specific Repository Service; through fields such as dr:recordIdentifier it is possible to define Collections made out of a fixed enumeration of documents.

DRIVER exposes only one default Collection, corresponding to the whole Information Space and represented by InfoSpace. The Collection Definition Language is the following:

 

Coll       ::=  (cName, cDescr, cRCond, cMCond, cSource, cFrozen)

cSource    ::= Coll | InfoSpace

 

A Collection is defined by providing a name (cName), a textual, human-oriented description (cDescr), a membership condition (cMCond), the data source over which this condition has to be evaluated (cSource), a retrieval condition (cRCond), and a “freezing” flag (cFrozen). Specifically, cSource may be either a Collection (no cyclic definitions allowed) or the whole Information Space; cRCond is obtained by combining cMCond with the retrieval condition of cSource; when cFrozen is set to true, the retrieval condition is unaffected by changes to the retrieval condition of the parent Collection from which it was obtained; when it is set to false, the retrieval condition is kept up to date with the changes made to the parent Collection.

As mentioned above, the objects constituting a Collection are obtained by evaluating the relative retrieval condition. This condition is obtained at Collection creation time by taking the retrieval condition of the cSource Collection and combining it with the provided membership condition cMCond. If cSource is the InfoSpace, the condition to be combined with cMCond is the predicate true, i.e. all objects in the InfoSpace. By following this principle, retrieval conditions are always queries defined over the whole InfoSpace. A Search Service that needs to perform a search over a given Collection directly accesses the Collection retrieval condition.

Note that a change to the membership condition of one Collection may affect all Collections directly derived from it. In particular, the retrieval conditions of all child Collections of such a Collection should be updated to reflect the modification. The cFrozen flag makes it possible to shield a Collection from this implicit update. Note also that cFrozen can be set to true at any time; when this happens, the internally stored parent condition is refreshed to the current retrieval condition of the parent Collection.
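The interplay between cMCond, cSource, cRCond and cFrozen can be sketched as follows. This is a simplified illustration and not the Collection Service design: the predicate syntax (string predicates joined with AND) and the dc:date field are assumptions, and the retrieval condition is recomputed on access instead of being updated through notifications from the parent Collection.

```python
from dataclasses import dataclass
from typing import Optional

TRUE = "true"  # retrieval condition of InfoSpace: every object qualifies


@dataclass
class Collection:
    name: str                              # cName
    descr: str                             # cDescr
    m_cond: str                            # cMCond, membership condition
    source: Optional["Collection"] = None  # cSource; None stands for InfoSpace
    frozen: bool = False                   # cFrozen
    _parent_cond: str = TRUE               # parent condition stored at creation

    def __post_init__(self):
        # Remember the parent retrieval condition as it is at creation time.
        self._parent_cond = self._current_parent_cond()

    def _current_parent_cond(self):
        return TRUE if self.source is None else self.source.r_cond

    @property
    def r_cond(self):
        """cRCond: the parent condition combined with the membership condition."""
        parent = self._parent_cond if self.frozen else self._current_parent_cond()
        if parent == TRUE:
            return self.m_cond
        return f"({parent}) AND ({self.m_cond})"


repo = Collection("Repo1", "Objects harvested from R1", 'dr:repository = "R1"')
recent = Collection("Recent", "Recent objects", 'dc:date >= "2006"', source=repo)
snapshot = Collection("Snapshot", "Frozen view", 'dc:date >= "2006"',
                      source=repo, frozen=True)

# The parent's membership condition changes after the children were created.
repo.m_cond = 'dr:repository = "R2"'
```

After the change, the retrieval condition of recent follows the parent, while snapshot keeps the parent condition it stored at creation time. In both cases the resulting condition is a query over the whole InfoSpace.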

Index

An Index is a Resource generated by an Index Service aimed at providing efficient access to a set of MDstores. An Index is configured to index records from a given set of MDStores, where the records bear specific values into the DMF fields selected as Index Configuration Fields.

A query over an Index, i.e. a predicate over the Index Configuration Fields, returns all records satisfying the predicate, plus the corresponding identifier. Such records are a subset of the relative DMF records, whose fields are established as a further Index Configuration Field.

User

A User is a human interacting with the system. There are two types of users, end users, i.e. humans interacting with User Interface Services for search reasons, and managers, i.e. administrative users, in charge of configuring the Services.

End-users need to register with the system in order to get advanced functionality, tailored to the preferences they specify. Registered end users and managers have a corresponding user profile in the System, and therefore become System Resources; indeed, DRIVER Resources can manipulate their description, i.e. their profile, and build functionality around them, such as recommendation and profiling functionalities.

Community

A Community is a resource identifying a “view” of the information space tailored to a specific community of end-users. Such a view is provided in terms of a set of Collections, associated with the Community. End-users can query the information space by selecting a Community and subsequently one or more of the collections therein. A community is a resource generated by a Community Manager interacting with a User Service interface.

Harvesting Instance

A Harvesting Instance represents one System harvesting activity related to a Repository Service and its relative MDStore Resources. Indeed, since a Repository Service may publish its content according to different metadata formats, DRIVER System assigns a different MDStore Resource for each such format.

Harvesting Instance Resources are created by the System when a new Repository is made available for harvesting and removed when the Repository Service is removed. Specifically, a Harvesting Instance is a tuple (Repository, Master MDStore, MDStores, mapping, stats), where:

  • Repository is the identifier of the Repository Service to be harvested,
  • Master MDStore is the identifier of the MDStore that will host the Dublin Core OAI-Harvesting from the Repository Service,
  • MDStores is the set of identifiers of the MDStore Resources that will host the OAI-Harvesting relative to other metadata formats exposed by the Repository Service,
  • mapping is the set of field-to-field mappings from the Dublin Core format into DRIVER Descriptive Fields (see Section 2.3.1); other mappings can be specified, from other formats into the DDF fields.
  • stats is the set of statistics relative to each harvesting into an MDStore, e.g. last harvesting date.

Harvesting Instances are used by the System to monitor, organize, and distribute the harvesting activity among the Aggregator Services available.
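The tuple above, and the way the System might distribute Harvesting Instances among Aggregator Services, can be sketched as follows. The least-loaded assignment policy and all identifiers are assumptions chosen for illustration; the specification does not prescribe a distribution strategy here.

```python
from dataclasses import dataclass, field


@dataclass
class HarvestingInstance:
    repository: str       # identifier of the Repository Service to harvest
    master_mdstore: str   # MDStore hosting the Dublin Core harvesting
    mdstores: set = field(default_factory=set)   # MDStores for other formats
    mapping: dict = field(default_factory=dict)  # Dublin Core field -> DDF field
    stats: dict = field(default_factory=dict)    # MDStore id -> statistics


def assign(instances, aggregators):
    """Give each instance to the aggregator currently holding the fewest."""
    load = {a: [] for a in aggregators}
    for inst in instances:
        target = min(load, key=lambda a: len(load[a]))
        load[target].append(inst.repository)
    return load


instances = [
    HarvestingInstance("repo-1", "mds-1"),
    HarvestingInstance("repo-2", "mds-2", mdstores={"mds-2b"}),
    HarvestingInstance("repo-3", "mds-3"),
]
plan = assign(instances, ["agg-A", "agg-B"])
```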

ResultSet

ResultSets are containers of XML files that can be populated and accessed asynchronously. Typically, ResultSets are used to mediate between Services requiring paging functionalities over an arbitrarily long result of a search, and the Services generating such result by executing the search. See Section 3.1 for further details.
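The mediation role of a ResultSet can be illustrated with a small sketch in which a producer appends records while a consumer reads them page by page. The method names (populate, close, get_page, size) are hypothetical; the actual operations are those defined in Section 3.1.

```python
class ResultSet:
    """Toy container of XML records supporting paged access (hypothetical API)."""

    def __init__(self, rs_id):
        self.rs_id = rs_id
        self._records = []
        self.closed = False  # True once the producer has finished populating

    def populate(self, records):
        """Called by the producing Service as results become available."""
        if self.closed:
            raise RuntimeError("ResultSet is closed")
        self._records.extend(records)

    def close(self):
        self.closed = True

    def size(self):
        return len(self._records)

    def get_page(self, start, page_size):
        """Called by the consuming Service; returns the records available so far."""
        return self._records[start:start + page_size]


rs = ResultSet("RS-example")
rs.populate([f"<record id='{i}'/>" for i in range(5)])
first_page = rs.get_page(0, 2)   # the consumer can read before the producer ends
rs.populate(["<record id='5'/>"])
rs.close()
last_page = rs.get_page(4, 2)
```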

Recommendation

A Recommendation is a resource generated and maintained by the Recommendation Service capturing an announcement or alert type of suggestion targeted to all or specific end users. Each one of the data structures representing a recommendation remains in the System until all interested parties have been notified or until it has expired.

2.5.3      Resource Profiles

As stated, a Resource profile is a set of information related to a DRIVER Resource supplied at registration time and updated at run-time. Its main purposes are:

o        representing the existence and availability of a Resource;

o        supporting resource discovery;

o        storing information that helps the System to handle Resources correctly in order to maximize QoS.

All Resource Types must include the following information, common to all Resources:

  • the Resource Identifier;
  • the Resource Type;
  • the Resource Kind.

More generally, each Resource Type, since it describes Resources with specific behaviour and delivering specific functionality, defines its own profile structure, physically described as an XML Schema. In general, profile structures are made of three parts:

  1. Resource configuration: the descriptive parameters of the Resource, typically specified at registration time; different instances of the same resource type are described by the same set of parameters (i.e. XML Schema), but may have different values for such parameters;
  2. Resource status: parameters describing the run-time status of a Resource, typically updated at run-time influenced by Resource behaviour; these usually have boundaries determined by Resource configuration parameters, and include efficiency parameters, to measure QoS;
  3. Resource interaction policies: each resource may specify the set of resources it may want or not want to interact with on a specific action; e.g. an Index may be available only to target a specific MDstore.

Resource profiles can be generated or modified by Resource Managers (Service Resource Profiles), Service Resources (Data Structure Resource Profiles), and by the System, i.e. the Manager Service (all Resource Profiles, but not itself). Resource Types can be added or removed from the System only by System Administrators.

2.5.3.1     Service Resource profiles

The Service Profile consists of a list of descriptive parameters divided into categories characterizing the running Service. Such parameters describe the potential behaviour and the current behaviour of a Service. Accordingly, most of them strictly depend on the Service Type. The Service profile must include:

o        the Service identifier;

o        the Type of the Service, which identifies the Service role within the system. Currently, allowed types are: Search Service, Index Service, Aggregator Service, UserCommunity Service, User Interface Service, Profiling Service, Recommendation Service, Collection Service, OAI-Publisher Service, Authentication and Authorization Service, and MDstore Service;

o        the Kind of the Service, which identifies the category of the Resource Type of the Service. Service Resources can be associated to the Resource Kinds: ServiceResource or DRIVERPendingResource;

o        deployment information, e.g. the call point of the Service;

o        Configuration parameters: a Service of a given type describes its potential, i.e. its functionality, in terms of its configuration parameters; e.g. maximum number of indices maintainable by one specific Index Service. Note that different Services of the same type may deliver the same functionality but with different potentials, hence different values for the same configuration parameters; e.g. two Index Services, of the same type Index Service Type, may support different maximum indices numbers. Configuration parameters cannot be changed by the system, only the Service Manager is allowed to configure them.

o        Status parameters: in order to exploit Service’s functionalities at best, the System needs to be informed about the functionalities currently consumed by the Service. Status parameters are monitored and modified by the System, i.e. by the Manager Service, and provide such information; their modification must conform to the limits specified by the Configuration Parameters into the profile. For example, if a new Index Resource is required, the MS searches for Index Services whose current number of active indices does not exceed the maximum number of indices allowed.

o        Quality-of-Service parameters: in order to maximize the performance and effectiveness of Service Resources, the System deals with Quality-of-Service issues. Quality-of-Service (QoS) parameters describe the current technical status of a Service and are used by the System to interpret, measure, and schedule the distribution of tasks among the Services. Service QoS parameters in DRIVER are availability, capacity, response time, and throughput, and are kept up to date by the Service itself. For example, the System may remove an Index from an Index Service with a low QoS and assign it to an Index Service with better QoS.

o        Authorization parameters: a Service of a given Service Type must interact with Services of specific Service Type, to be declared into the Service Architectural Specification. However, a specific Service instance may be interested in putting further constraints over such specification. For example, an Index Service must interact with Services of type Metadata Service Type, but be available or not available for accepting interactions with Metadata Service Type Service instances with a certain URI domain. Authorization parameters are constituted by a list of in/out constraints over URIs, which declaratively defines the domain of Service instances that can interact with the Service.

o        Blackboard (action messages area): the System, i.e. the MS, given the configuration, status, and QoS parameters of all System Resources, may decide to move the System to a new state by changing the internal status of some of the available Resources or by requesting the creation/deletion of Resources; e.g. the System may require an Index Service to add an index with specific features or to remove one of them. All these actions are invoked by the Manager Service through an interaction with Service Resources. Such interaction takes place by implementing an Orchestration Protocol based on exchanging messages through the blackboard of Service profiles (Section 3.2.2).

Such parameters are specific to the Service Type, as are the actions that determine the interaction between a Service and the Manager Service. In the following, configuration parameters and actions are listed per Service.
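As an illustration of how configuration, status, and QoS parameters could be used together, the sketch below selects an Index Service able to host a new Index, in the spirit of the Status parameters example above. The profile layout, the key names and the availability threshold are assumptions made for this sketch and do not reflect the actual profile XML Schema.

```python
def pick_index_service(profiles, min_availability=0.9):
    """Return the profile of the best Index Service for a new Index, or None.

    A Service qualifies if it has room for one more Index (status below the
    configured maximum) and its availability meets the threshold; among the
    qualifying Services, the one with the lowest response time is chosen.
    """
    candidates = [
        p for p in profiles
        if p["status"]["indices"] < p["configuration"]["max_indices"]
        and p["qos"]["availability"] >= min_availability
    ]
    return min(candidates, key=lambda p: p["qos"]["response_time"], default=None)


profiles = [
    {"id": "index-A", "configuration": {"max_indices": 2},
     "status": {"indices": 2},
     "qos": {"availability": 0.99, "response_time": 0.1}},
    {"id": "index-B", "configuration": {"max_indices": 4},
     "status": {"indices": 1},
     "qos": {"availability": 0.95, "response_time": 0.3}},
    {"id": "index-C", "configuration": {"max_indices": 4},
     "status": {"indices": 0},
     "qos": {"availability": 0.50, "response_time": 0.05}},
]
chosen = pick_index_service(profiles)
```

Here index-A is excluded because it is full and index-C because its availability is below the threshold, so index-B is selected.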

Manager Service

Configuration:

  • DPF fields
  • Index Configuration Fields: a subset of DMF fields used to organize Index Resources
  • Index Result Fields: the DMF fields to be returned as a result of Index Resource look-ups (dr:recordIdentifier and dr:objectIdentifier are mandatory)
  • Efficiency thresholds for all Service Resources
  • Efficacy policies, such as:
    • minimal number of Instances for each Resource Type
    • minimal number of replicas for each Resource Type

Repository Service

Configuration:

  • List of Provenance Fields
  • Other fields characterizing a Repository Service, to be decided.

Quality-of-Service:

  • Repository availability

Action messages:                                                                

ResultSet Service

Configuration:

  • Maximum overall size of the ResultSets
  • Maximum storage size of the single ResultSet
  • Maximum number of ResultSets

Status:

  • Number of ResultSets handled
  • Overall storage size occupied by the ResultSets

Quality-of-Service:

  • To be decided

MDStore Service

Configuration:

  • Maximum overall size of the MDStores
  • Maximum storage size of the single MDStore
  • Maximum number of MDStores

Status:

  • Number of MDStores handled
  • Overall storage size occupied by the MDStores

Quality-of-Service:

  • To be decided

Action messages:

  • Create MDStore: parameter, DPF record and Index Configuration Fields
  • Delete MDStore: parameter MDStore identifier
  • Update MDStore: parameter, DPF record and Index Configuration Fields

Aggregator Service

Configuration:

Status:

  • The Harvesting Instances handled by the Service

Quality-of-Service:

  • To be decided

Action messages:

  • Manage Harvesting Instance: parameter Harvesting Instance id
  • Release Harvesting Instance: parameter Harvesting Instance id

Index Service

Configuration:

  • Maximum overall size of the Indices
  • Maximum size of the single Index
  • Maximum number of Indices
  • DMF fields to be returned into the result of the Indices

Status:

  • Number of Indices handled
  • Overall storage size occupied by the Indices

Quality-of-Service:

  • To be decided

Action messages:

  • Create Index: parameter, Index Configuration Fields values
  • Delete Index: parameter Index identifier
  • Update Index: parameter, Index Configuration Fields values, MDStore addition or removal

Search Service

Configuration:

  • Maximum number of concurrent searches/requests
  • Maximum number of connections
  • Local cache parameters, e.g. cache size, cache expiration date

Status:

  • Number of concurrent searches/requests
  • Current number of connections

Quality-of-Service:

  • Average Response Time

Action messages:

Users-Communities Service

Configuration:

  • Maximum number of communities
  • Maximum number of collections per community

Status:

  • Number of communities handled

Quality-of-Service:

  • To be decided

Action messages:

  • Create Community: parameter Collection identifier List
  • Delete Community: parameter Community identifier
  • Update Community: parameter Community identifier, Collection identifier List

Profiling Service

Configuration:

  • Maximum number of users
  • Maximum number of communities subscribed to
  • Maximum number of preference predicates
  • Maximum number of layout preference predicates
  • Maximum number of recommendations per user

Status:

  • Number of users handled

Quality-of-Service:

  • To be decided

Action messages:

  • Create User Profile: parameter user information, preferences, communities
  • Delete User Profile: parameter User Profile identifier
  • Update User Profile: parameter user information, preferences, communities

 

Recommendation Service

Configuration:

  • Maximum number of recommendations
  • Maximum number of user references

Status:

  • Number of Recommendations handled

Quality-of-Service:

  • To be decided

Action messages:

  • Create Recommendation: parameter Recommendation Text, Expiration Date
  • Delete Recommendation: parameter Recommendation identifier
  • Update Recommendation: parameter Recommendation Text, Expiration Date
User Interface Service

Configuration:

  • List of Provenance Fields
  • List of Index Result Fields

Status:

Quality-of-Service:

  • To be decided

Action messages:

2.5.3.2     Data Structure Resource Profiles

Data Structure Profiles describe the status of the corresponding Resources. Unlike Service Resource Profiles, which are handled by the Service Resources themselves, Data Structure Profiles are handled and exposed by the Service Resource in charge of the Data Structures.

The Data Structure Profile consists of a list of descriptive parameters, divided into categories, that describe the features of the Data Structure. Accordingly, most of them strictly depend on the Data Structure Type. The Data Structure Profile must include:

o        the Data Structure identifier;

o        the identifier of the Service managing the Data Structure;

o        the type of the Data Structure, which identifies the service role within the system. Currently, allowed categories are: Collection, MDStore, Index, Community, Harvesting Instance, User Role, ResultSet, Recommendation, and User.

o        deployment information, e.g. the call point of the service, which should include the Service Resource handling the Data Structure;

o        Descriptive parameters: a number of parameters specifying the particular property of the Resource; e.g. the statistics of an Index.

Note that Data Structure Profiles do not feature an action messages section. Indeed, actions directed to the Data Structure will be received and interpreted by the Service Resource in charge of the Data Structure.
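As an illustration only, the mandatory part of a Data Structure Profile can be sketched as a small record type. The class name, the field names and the example values below are assumptions made for this sketch; in DRIVER, profiles are XML files conforming to a Resource Type XML Schema, which this sketch does not reproduce.

```python
from dataclasses import dataclass, field

# Allowed Data Structure types, as listed in the text above.
ALLOWED_TYPES = {
    "Collection", "MDStore", "Index", "Community", "Harvesting Instance",
    "User Role", "ResultSet", "Recommendation", "User",
}


@dataclass
class DataStructureProfile:
    """Hypothetical in-memory view of a Data Structure Profile."""
    identifier: str            # the Data Structure identifier
    managing_service_id: str   # the Service managing the Data Structure
    type: str                  # one of ALLOWED_TYPES
    call_point: str            # deployment information (service call point)
    parameters: dict = field(default_factory=dict)  # descriptive parameters

    def __post_init__(self):
        # Reject types that are not among the allowed categories.
        if self.type not in ALLOWED_TYPES:
            raise ValueError(f"unknown Data Structure type: {self.type!r}")


# Example with made-up values: the profile of an Index Data Structure.
index_profile = DataStructureProfile(
    identifier="index-1",
    managing_service_id="index-service-1",
    type="Index",
    call_point="http://example.org/index-service",
    parameters={"records": 1200},
)
```

Note that the sketch has no action messages section, in line with the remark above: actions are addressed to the managing Service, not to the Data Structure.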

The parameters needed by Data Structure Resources are listed below, by Resource Type.

MDStore

Configuration:

·         Index configuration fields: fields of the DMF used by the MS to organize System Index Resources; e.g. dr:country, dr:language;

·         Up-to-date values available from the records into the MDStore relative to the Index Configuration Fields;

  • DPF fields and values, to be synchronized with the ones from the relative Repository Service;
  • Metadata format of the MDStore records;
  • Master MDStore flag.

Status:

  • Last operation (incremental harvesting or refresh);

·         Index Configuration Fields values: for each configuration field, the values available for that field in the MDStore records, to be gathered at harvesting time;

Index

Configuration:

  • Index size: memory and records
  • Index Configuration fields

Status:

  • Index Configuration Fields values
Harvesting Instance

Configuration:

  • Repository Service
  • Master MDStore: relative to the Dublin Core harvesting activity
  • MDStores: each relative to  the harvesting activity related to other metadata formats
  • Mapping: DC DDF Application Profile fields

Status:

  • Statistics about the Harvesting; e.g. last harvesting for each MDStore
User

Configuration:

·     Provenance Fields for preference selection

·     Index Result Fields for layout preference selection

Status:

  • Name & Info of the User
  • Credentials
  • Preferences (general and layout)
  • Communities subscribed to
  • Membership Condition
  • Retrieval Condition

 

Community

Configuration:

·     Collection membership

Status:

  • Name of the Community
  • Membership Condition
  • Retrieval Condition
Recommendation

Status:

  • Recommendation text
  • Expiration Date
  • Last access date
  • Retrieval Condition
Collection

Status:

  • Name of the Collection
  • Membership Condition
  • Retrieval Condition
  • Parent Collection Condition
  • Parent Collection
  • Frozen flag
Result Set

Status:

  • Service instance that generated the Result Set
  • Date of creation
  • Expiry date
  • Type of Resource Service that returns full XML files
  • Label: gives a name to the ResultSet, decided by the Service that creates the ResultSet

 

This section presents the architecture of the DRIVER system. The architecture is presented by grouping the services into the following functional areas:

o        System Management;

o        Presentation Layer;

o        User Management;

o        Information Management;

o        Collective Layer.

For each area, an integrated view is given first. Then each service belonging to that area is presented in terms of its actual components. A component of a service is a “part” of that service that can be hosted on a different networked machine. To become a DRIVER node, a machine must have a WS hosting environment installed, in which DRIVER services can run.

In the integrated UML diagrams presented for each area, the subsystems represent the components of each service. They are therefore the Web Services or Web applications (in the case of User Interfaces to components) that make the system concrete.

The main purpose here is to gain a general understanding of how the system is decomposed, and how the individual parts work together to provide the expected functionality.

3.1      Data Provider Services and Result Set Services

In DRIVER, some Services act as data providers. This is the case, for example, of Information Services, MDStore Services, and Index Services, which are typically “queried” by data consumer Services in order to retrieve content in the form of a list of XML files.[12] Specifically, a data provider receives a request and returns a reference to the list of results, which is stored locally; data consumers can use this reference to retrieve a subset of the result list in bulk by providing an interval of list positions, i.e. a paging approach. To become a data provider Service, a Service must implement the following bulk data provider interface:

o        generateBulkData(parameters):bdId. The method returns the reference bdId to a local bulk data result computed in response to parameters; the signature of parameters depends on the Service-specific request format, e.g. it may include a metadata format in the case of an MDStore Service, or an XQuery in the case of the IS-Store, etc.

o        getBulkData(bdId,fromPosition,toPosition):XMLFile[] The method accesses the list bdId, extracts the XML files in the interval fromPosition-toPosition, and returns them as the result; if some positions in the interval fromPosition-toPosition are “out of bound”, i.e. no XML file is available, the result contains an EndOfFile message.

o        getNumberOfResults(bdId):(int,status) The method returns the total number of entries in the result bdId and the status of the result, which may be either open, i.e. the result is still under computation, or closed, i.e. all results have been computed.
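The bulk data provider interface above can be sketched as follows. This is a minimal in-memory illustration, not the DRIVER implementation: the class name, the `compute` callable, the `END_OF_FILE` marker and the choice of 1-based inclusive positions are all assumptions of the sketch, and results are computed eagerly so that every result is immediately closed.

```python
import itertools

END_OF_FILE = "<EndOfFile/>"  # hypothetical marker for out-of-bound positions


class BulkDataProvider:
    """In-memory sketch of the bulk data provider interface."""

    def __init__(self, compute):
        # compute(parameters) -> iterable of XML strings; Service specific.
        self._compute = compute
        self._results = {}   # bdId -> list of XML strings
        self._status = {}    # bdId -> "open" or "closed"
        self._ids = itertools.count(1)

    def generate_bulk_data(self, parameters):
        """Compute a local result and return its reference bdId."""
        bd_id = f"bd-{next(self._ids)}"
        self._results[bd_id] = list(self._compute(parameters))
        self._status[bd_id] = "closed"  # eager computation: nothing pending
        return bd_id

    def get_bulk_data(self, bd_id, from_position, to_position):
        """Return the files in [from_position, to_position] (1-based)."""
        files = self._results[bd_id]
        page = files[from_position - 1:to_position]
        if to_position > len(files):
            page.append(END_OF_FILE)  # some requested positions do not exist
        return page

    def get_number_of_results(self, bd_id):
        """Return (number of entries, status) for the result bd_id."""
        return len(self._results[bd_id]), self._status[bd_id]


# Toy provider returning five records for a requested metadata format.
provider = BulkDataProvider(
    lambda fmt: [f"<record format='{fmt}' n='{i}'/>" for i in range(1, 6)]
)
bd = provider.generate_bulk_data("oai_dc")
first_page = provider.get_bulk_data(bd, 1, 2)
```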

Such a solution has two limitations:

  1. Data consumer Services may need to filter the list of results according to some structural and content constraints; e.g. a requester may be interested in retrieving only those metadata records from a given MDStore that satisfy specific content properties. Such an operation is common to generic data consumers and might well be supplied by the data provider as embedded functionality. On the other hand, such functionality is out of the scope of data providers, which should focus on bulk data answers; e.g. MDStore responses have the form of OAI-PMH publisher responses.
  2. Data provider Services may not be “native”, meaning that they do not generate the results themselves but only aggregate the results of other data providers; in these cases, building a bulk data provider interface becomes tedious work, out of the scope of the service functionality; e.g. a Search Service accesses and processes the results of a set of Index look-up operations, thereby interacting with the corresponding set of bulk data references.

To overcome both limitations and limit software redundancy, the DRIVER test-bed introduces ResultSet Services, which are Services dedicated to the management of ResultSet resources. A ResultSet is an entity containing a list of XML files, which provides data consumers with functionalities for paging access and filtered access to the list. Filtering is performed by applying the style-sheet passed on requests to the list, while paging is supplied by interpreting an interval of list positions.

A ResultSet is open when the list it contains is not complete, while it is closed when the list is complete.

In DRIVER, when a consumer Service sends a request to a data provider Service, the latter generates a ResultSet resource and returns its reference to the consumer, which can asynchronously access the ResultSet according to its needs; it may (i) request a specific part of the result list, i.e. paging functionality, and/or (ii) choose how the results are formatted and filtered by specifying a preferred style-sheet.

3.1.1      ResultSet Resources: creation and access

Two types of ResultSets can be created, which differ in the way they are populated: Pull-Mode ResultSets and Push-Mode ResultSets. Similarly, a ResultSet can be accessed by consumer Services according to two different policies, namely waiting and non-waiting.

Pull-mode ResultSets

Pull-mode ResultSets are configured to refer to a number of Data Sources, which may be data provider Services or other ResultSets (see Figure 8), and to cache a given number of XML files (given in “page” size). At creation time, pull-mode ResultSets cache the given number of XML Files by retrieving them from the specified data sources, in a token-ring fashion. A Pull-mode ResultSet becomes closed when all its data sources are closed and no more XML files can be retrieved from them.

Figure 8 – Pull-Mode ResultSet

Depending on the consumer requests, these ResultSets may:
o        cache other pages from the data sources in order to satisfy the current request or minimize future responses latency; the ResultSet will invoke the following methods:
    • getBulkData, for data provider Service data sources (see definition above)
    • getResult, for ResultSet data sources (see definition below);
o        vary the page size in order to optimize the amount of XML files to be cached.
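The token-ring population of a pull-mode ResultSet described above can be sketched as a round-robin visit of its data sources. In this sketch, plain Python iterators stand in for the remote getBulkData and getResult calls, and an exhausted iterator stands in for a closed data source; the class and method names are assumptions.

```python
from collections import deque


class PullModeCache:
    """Round-robin cache over data sources (sketch, not the DRIVER code)."""

    def __init__(self, sources, page_size):
        self._ring = deque(iter(s) for s in sources)  # open data sources
        self.page_size = page_size
        self.cache = []
        self.fetch_page()  # at creation time, cache one page of XML files

    @property
    def closed(self):
        # Closed once every data source has been found exhausted.
        return not self._ring

    def fetch_page(self):
        """Cache up to page_size more files, one file per source per turn."""
        fetched = 0
        while self._ring and fetched < self.page_size:
            source = self._ring.popleft()
            try:
                self.cache.append(next(source))
            except StopIteration:
                continue              # closed source: drop it from the ring
            fetched += 1
            self._ring.append(source)  # still open: back of the ring
        return fetched


# Two toy data sources of different length and a page size of three.
rs = PullModeCache([["a1", "a2", "a3"], ["b1"]], page_size=3)
first = list(rs.cache)       # files cached at creation time
more = rs.fetch_page()       # triggered by a later consumer request
```

With these inputs the first page interleaves the two sources, and the second call discovers that both sources are exhausted, at which point the ResultSet becomes closed.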

Such ResultSets are independent from the data provider that created them. For example, consider an OAI-Publisher Service requesting from MDStore Resources the metadata records conforming to a given metadata format (see Figure 9). The MDStore Service managing such MDStores becomes the “native” data provider and generates a bulk data reference for each MDStore (locally invoking generateBulkData(parameters)), then generates a pull-mode ResultSet configured to retrieve the data from such bulk data references, and returns the ResultSet identifier to the OAI-Publisher Service, which will then interact directly with the ResultSet.

Figure 9 – Pull-Mode ResultSet for MDStore Services

Push-mode ResultSets

Push-mode ResultSets are populated by the data provider that created them (see Figure 10). In a sense, creating a push-mode ResultSet makes the data provider a “native” data provider: it has to compute and supply the ResultSet content itself, just as data providers do with local bulk data. For this reason, these ResultSets must be explicitly closed by the Services that created them.

Figure 10 – Push-Mode ResultSet

For example, a User Interface Service (UIS) is designed to send a query to a Search Service and be returned a ResultSet reference. The UIS interacts with the ResultSet in order to retrieve results by pages and in order to visualize only part of the DMF fields. The Search Service generates a push-mode ResultSet, to be populated by combining the ResultSets returned by the Index look-ups needed to respond to the query.

In DRIVER, pull-mode ResultSets are created by MDStore Services, Index Services, and IS-Store components.

ResultSets access requests

ResultSets can be accessed by consumer Services according to two different modalities: waiting and non-waiting; their responses may therefore vary depending on the typology and the status of the ResultSet.

·         Waiting modality: with this type of request the consumer waits for a result until it arrives or the request times out.

o        If the request can be satisfied before time-out, the ResultSet returns the result.

o        If the ResultSet is open and the time established for time-out elapses before the full result is available, the ResultSet returns the partial result (if any) and an OperationTimedOut message.

o        If the ResultSet is closed and the request specifies an interval of results that goes beyond the last position available, the ResultSet returns the partial result (if any) and an EndOfResult message.

·         Non-waiting modality: with this type of request the consumer expects an immediate result on request.

o        If the request can be immediately satisfied the ResultSet returns the result.

o        If the ResultSet is open and the result cannot be delivered at request time, consumers are returned the partial result (if any) and a ResultNotAvailable message.

o        If the ResultSet is closed and the request specifies an interval of results that goes beyond last position available, the ResultSet returns the partial result (if any) and an EndOfResult message.
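The response rules for the two modalities can be sketched with a small thread-safe buffer. The class and method names, the `timeout` parameter and the 1-based inclusive positions are assumptions of this sketch; only the three message names come from the text above.

```python
import threading


class ResultSetBuffer:
    """Sketch of waiting and non-waiting access to one ResultSet."""

    def __init__(self):
        self._files = []
        self._closed = False
        self._cond = threading.Condition()

    def populate(self, files):
        with self._cond:
            self._files.extend(files)
            self._cond.notify_all()   # wake up waiting consumers

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()

    def get_result(self, from_position, to_position, waiting, timeout=1.0):
        """Return (files, message); message is None for a full answer."""
        wanted = to_position - from_position + 1

        def ready():
            # Nothing more to wait for: list complete or interval available.
            return self._closed or len(self._files) >= to_position

        with self._cond:
            if waiting:
                self._cond.wait_for(ready, timeout=timeout)
            page = self._files[from_position - 1:to_position]
            if len(page) == wanted:
                return page, None
            if self._closed:
                return page, "EndOfResult"
            message = "OperationTimedOut" if waiting else "ResultNotAvailable"
            return page, message


buffer = ResultSetBuffer()
buffer.populate(["f1", "f2"])
full = buffer.get_result(1, 2, waiting=False)
partial = buffer.get_result(1, 3, waiting=False)
timed_out = buffer.get_result(1, 3, waiting=True, timeout=0.05)
buffer.close()
beyond = buffer.get_result(2, 5, waiting=True, timeout=0.05)
```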

ResultSet as Resources

ResultSets are not discoverable and sharable Resources by default. Services wishing to expose a ResultSet as a Resource should first create it and then explicitly declare their intention to make it a Data Structure Resource.

ResultSet interface

The ResultSet Service interface offers the following methods:

o        createPushRS(expiryTime):EPR The method creates a push-mode ResultSet, whose expiry date is expiryTime and returns its EPR, containing the ResultSet Service address and the ResultSet identifier rsId.

o        createPullRS(dataSources,initialPageSize,expiryDate):EPR The method creates a pull-mode ResultSet, by specifying its data sources, its initial page size, and the expiry date, and returns its EPR, containing the ResultSet Service address and the ResultSet identifier rsId. dataSources is a list of triples of the form (DataProviderServiceAddress, bdId, StyleSheet), relative to data provider Services, or pairs of the form (EPR,StyleSheet), relative to ResultSets:

§         for data providers, the ResultSet invokes the getBulkData method at the right location DataProviderServiceAddress, over the specified bulk data bdId, and applies to all files the transformation StyleSheet;

§         for ResultSets, the ResultSet invokes the getResult method over the specified ResultSet EPR, specifying the StyleSheet into the call;[13]

§         initialPageSize is used to size the buffer of XML files extracted from the data sources; the page size can vary dynamically at run-time in order to adjust to the consumer requests.

o        RSasResource(EPR, label) The method makes the ResultSet identified by EPR a Data Structure Resource, by publishing its profile (containing the label label) on the IS and by keeping the profile updated to run-time changes.

o        Active in push mode

§         populateRS(rsId,XMLFile[]) The method inserts a list of XMLFiles XMLFile[] into the ResultSet rsId.

§         removeResult(rsId,position) The method removes from the ResultSet identified by rsId the result at position position.

§         closeRS(rsId) The method declares the ResultSet closed.

o        deleteRS(rsId) The method removes the ResultSet rsId and all its content from the System.

o        getRSStatus(rsId):status The method returns the current status of the ResultSet rsId, which can be either open, i.e. the result is still under computation, or closed, i.e. all results have been computed.

o        getNumberOfResults(rsId):(int,status) The method returns the total number of list entries in the ResultSet rsId and the status of the result, which may be either open, i.e. the result is still under computation, or closed, i.e. all results have been computed.

o        getResult(rsId,fromPosition,toPosition,styleSheet,requestMode):XMLFile[ ] The method is used to retrieve XML files from the ResultSet rsId; the call specifies the interval fromPosition-toPosition, the stylesheet to be applied to all entries in the interval, and the request modality. The semantics of the response is illustrated above.
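The push-mode part of this interface can be illustrated with the in-memory sketch below. It is not the DRIVER implementation: the EPR is reduced to an (address, rsId) pair, the request modality is omitted, positions are assumed to be 1-based, and the style-sheet is modelled as a Python callable (`keep_fields`, a hypothetical helper) standing in for a real style-sheet transformation.

```python
import uuid
import xml.etree.ElementTree as ET


class ResultSetService:
    """In-memory sketch of the push-mode ResultSet methods."""

    def __init__(self, address):
        self.address = address
        self._sets = {}  # rsId -> {"files", "status", "expiry"}

    def create_push_rs(self, expiry_time):
        rs_id = uuid.uuid4().hex
        self._sets[rs_id] = {"files": [], "status": "open",
                             "expiry": expiry_time}
        return self.address, rs_id   # stand-in for the EPR

    def populate_rs(self, rs_id, xml_files):
        self._sets[rs_id]["files"].extend(xml_files)

    def remove_result(self, rs_id, position):
        del self._sets[rs_id]["files"][position - 1]

    def close_rs(self, rs_id):
        self._sets[rs_id]["status"] = "closed"

    def delete_rs(self, rs_id):
        del self._sets[rs_id]

    def get_rs_status(self, rs_id):
        return self._sets[rs_id]["status"]

    def get_number_of_results(self, rs_id):
        rs = self._sets[rs_id]
        return len(rs["files"]), rs["status"]

    def get_result(self, rs_id, from_position, to_position, style_sheet=None):
        page = self._sets[rs_id]["files"][from_position - 1:to_position]
        if style_sheet is None:
            return list(page)
        return [style_sheet(xml_file) for xml_file in page]


def keep_fields(*names):
    """Hypothetical 'style-sheet': keep only the named child elements."""
    def transform(xml_text):
        root = ET.fromstring(xml_text)
        for child in list(root):
            if child.tag not in names:
                root.remove(child)
        return ET.tostring(root, encoding="unicode")
    return transform


service = ResultSetService("http://example.org/resultset")
address, rs_id = service.create_push_rs(expiry_time=None)
service.populate_rs(rs_id, [
    "<record><title>A</title><creator>X</creator></record>",
    "<record><title>B</title><creator>Y</creator></record>",
    "<record><title>C</title><creator>Z</creator></record>",
])
service.remove_result(rs_id, 2)   # drop the second record
service.close_rs(rs_id)           # the creating Service closes explicitly
titles = service.get_result(rs_id, 1, 2, keep_fields("title"))
```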

3.2      System Management

As specified in [1], DRIVER System functionality is provided by a number of cooperating Resources providing the functionalities specific to their Resource Type and to their Resource Kind. Furthermore, such Resources should communicate with each other according to DRIVER Authorization and Authentication mechanisms, based on the XACML standard architecture [15]. In the rest of the Section, Information Service and Manager Service are first introduced in the context of Resource management, and then described in detail, followed by the Authorization and Authentication Service.

3.2.1      Information Service and Manager Service: Resource Management

To become available, Resources must register their profile with the System and keep it up to date.[14] The System exploits profile information, i.e. the description of the current System status, to support two main parallel activities:

  • Functionality provision. The Resources currently available to the System interact with each other in order to provide DRIVER functionality. In particular, Resources interact for two main reasons:

o        A Resource requires specific functionality, potentially offered by Resources of a certain Resource Type; hence the Resource needs to discover such Resources, if they exist, and then interact with them to exploit that functionality; e.g. a Search Service needs to use Index Resources with specific features to answer a given query;

o        A Resource needs to be notified when an event that involves it directly, caused by the behavior of other Resources, occurs; in this case, the Resource needs to subscribe to the event of the target Resource, which should notify the former when the event takes place; e.g. an Index Resource needs to be notified when new documents are uploaded to one of the MDStore Resources it targets, in order to update the Index structure.

Resources are not aware of which Resources are available to the System at a given moment, and rely on the System to discover the Resources they need to interact with or the ones they should notify:

o        Resources needing to interact with a Type of Resource refer to the System to discover the available instances of such Resource Type; e.g. a Search Service executing a query needs to discover the available Index Resources and then interact with them.

o        Resources refer to the System to subscribe to specific actions, i.e. events, which may involve other Resources of the System. Similarly, a Resource expects the corresponding notifications from the System; e.g. an Index Resource subscribes to the System for the update action over the MDStore Resources it targets; the System notifies the Index Resource when such action occurs;

·         Orchestration. Service Resources and the related Data Structure Resources provide their functionality with a certain level of QoS, measured in terms of efficiency and efficacy. The System monitors QoS by analyzing Resource profiles, and orchestrates Service Resources in order to improve such values by means of an Orchestration Protocol (see Section 3.2.3).

Under the name System Management are grouped the Services that provide support for the two activities described above.  Specifically, such activities are addressed by:

·         the Information Service (IS): Resource-to-Resource communication is made possible by the IS, which allows Resources to query the pool of Resource profiles and discover the ones they need. The IS also accepts subscriptions from Resources that are interested in a specific action of another Resource, and is in charge of notifying such Resources when the given action is fired.

·         the Manager Service (MS): the Service monitors and evaluates System QoS from the Resource profile information held in the IS. The introduction, removal or modification of a Service Resource or of a Data Resource may impact the QoS and cause the MS to react by redistributing System functionality. This operation requires the creation, deletion, and update of Data Structure Resources by explicit requests to the available Service Resources. This Orchestration process is carried out by the MS in cooperation with the Services and follows a specific protocol, described below (see Section 3.2.3).

In Figure 11 we present the expected relationships between a generic Service and the System Management Services. A Service communicates with the IS by interacting with:

·         the IS-PR (Profile Registry) component, to update the information about itself and the Data Structure Resources it creates; note that Service registration is delegated to Resource Managers, through IS-PR user interfaces;

·         the IS-SN (Subscription and Notification) component, to subscribe the Data Structure Resources it creates to asynchronous notifications; the component will notify the Service when one of the actions to which its Data Structure Resources have subscribed occurs; note that, in order to support the Orchestration Protocol (see Section 3.2.3), a registering Service should subscribe itself to the update action over its own profile blackboard; this subscription is made by the IS-PR on behalf of the Service;

·         the IS-LU (Look-Up) component to discover, i.e. search, information about the Resources it needs;

·         the IS-Store component, which supports an XML file storage layer for both the IS-PR and the IS-SN components, enriched with an XQuery search engine; the component is tied-up with a ResultSet component IS-Store ResultSet, which generates the ResultSet Resources corresponding to XQuery searches.

Figure 11. Generic Service interactions

Note that the generic Service does not communicate with the MS, nor vice versa. As shown in Figure 12, the MS interacts only with the IS in order to monitor and possibly orchestrate the available Resources. The MS interacts with the IS through:

·         the MS-RM (Resources Monitoring) component to measure System QoS;

·         the MS-ResMan (Resource Management) component, which offers two classes:

o        the MS-RO (Resources Orchestration) class implements the Orchestration algorithms necessary to maximize QoS;

o        the MS-RepMan (Repository Management) class offers the functionality for keeping Repository Service profiles up to date in the IS;

·         the MS-NH (Notification Handler) component, which receives notifications from the IS about any creation, modification or deletion of Resources into the System.

Figure 12. System Management Services interaction

3.2.2      Information Service

In DRIVER Resources can interact directly or indirectly: in the first case, in order to accomplish its computational tasks, a Resource needs to interact with another Resource, of a given Resource Type; in the second case a Resource is notified of an action executed by another Resource.

An Information Service (IS) provides the functionality required by Resources to interact with each other, by offering to Resources mechanisms for:

  1. subscribing to specific actions of specific Resources, in order to be notified of the occurrence of such actions;
  2. finding the Resources, i.e. their profiles, they need to interact with.

Accordingly, the IS maintains:

  • a Subscription and Notification table, which associates subscribed Resources to specific topics, i.e. specific actions on Resource profiles and Resource Types; the subscription and notification table, as well as the relative managing interfaces, are built in order to support the OASIS Standards WS Base Notification 1.3 [12] and WS Topics 1.3 [13] (released on 1st October 2006).
  • an updated version of all Resource profiles, organized into Resource Kinds. Resource profiles must conform to a Resource Type, i.e. to the relative XML Schema, and are logically assigned to the Resource Kind of the Resource Type. Both pieces of information, i.e. the Resource Type and the Resource Kind of a profile, must be described in the profile itself, and hence in the Resource Type XML Schema. System Resource Kinds are: ServiceResources, DRIVERPendingServices, HarvestingInstanceResources, IndexResources, MDStoreResources, RecommendationResources, ResultSetResources, UserResources, CommunityResources, CollectionResources.

The IS offers mechanisms to manage Resource Types, i.e. to add and remove pairs consisting of a Resource Type and its profile XML Schema. When a new Resource Type is added, its XML Schema must specify the association of the Resource Type with at least one Resource Kind, and the subscription and notification table must be updated with the new topics relative to the Resource Type.

3.2.2.1     Service architecture

In what follows, a profile and a subscription should be understood as XML files conforming to a specific XML Schema structure. A predicate over profiles or subscriptions is expressed as an XPath query.

Four physical components implement the DRIVER Information Service:     

·         IS-SN (Subscription and Notification) The IS-SN is responsible for accepting subscriptions from Resources and subsequently notifying them accordingly. Its interface conforms to the OASIS Standards WS Base Notification 1.3 [12] and WS Topics 1.3 [13].

According to the standard, NotificationConsumers need to be notified of the occurrence of certain Situations, i.e. events, in the System. To this aim, NotificationProducers monitor Situations in the System, generate the relative notification messages when these occur, and send them to NotificationConsumers. Since individual NotificationConsumers are not generally interested in all possible notifications, i.e. in being notified of all Situations, the standards offer notification dispatching mechanisms based on the notions of topics and subscription to topics. Topics are exposed by a NotificationProducer as “labels” that uniquely identify a Situation, and can be organized into tree hierarchies, where a “topic node” is at least associated with the Situations of its topic node children. NotificationConsumers subscribe to some topics, i.e. some nodes of the topic tree, to signal their interest in the notifications relative to the associated Situations. Specifically, a subscription from one NotificationConsumer is an XPath query over the topic tree, which declaratively specifies the set of topics to be subscribed to.

Notification dispatching is based on two delivery strategies, chosen by NotificationConsumers at subscription time (see Figure 13):

o        Push policy: a NotificationProducer sends notifications directly to the interested NotificationConsumers by invoking the notify method exposed by the latter;

o        Pull policy: a NotificationConsumer creates a pull-point, which is a “resource entity” subscribed to receive all notifications of interest to the NotificationConsumer in a push fashion; the pull-point stores all notifications received; the NotificationConsumer will asynchronously request a number n of notifications from the pull-point with a getMessage(n).

 

Figure 13 – Notification dispatching mechanisms (panels: Push policy, Pull policy)

In DRIVER, “NotificationConsumers” are the generic Services, through the Notification Handler component, while “NotificationProducers” are the Information Services, through the IS-SN component.
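The difference between the two delivery strategies can be sketched as follows. The classes below are illustrative assumptions, not the WS Base Notification interfaces: the producer simply calls `notify` on every subscriber, a push consumer handles each notification as it arrives, and a pull-point stores notifications until the consumer asks for them. The sketch also assumes that notifications handed out by `get_message` are removed from the pull-point.

```python
from collections import deque


class Producer:
    """Toy NotificationProducer: pushes every notification to subscribers."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, subscriber):
        self._subscribers.append(subscriber)

    def publish(self, notification):
        for subscriber in self._subscribers:
            subscriber.notify(notification)


class PushConsumer:
    """Push policy: the consumer itself exposes notify."""

    def __init__(self):
        self.received = []

    def notify(self, notification):
        self.received.append(notification)


class PullPoint:
    """Pull policy: stores notifications until the consumer collects them."""

    def __init__(self):
        self._stored = deque()

    def notify(self, notification):
        self._stored.append(notification)

    def get_message(self, n):
        """Return (and remove) up to n notifications, oldest first."""
        count = min(n, len(self._stored))
        return [self._stored.popleft() for _ in range(count)]


producer = Producer()
push_consumer = PushConsumer()
pull_point = PullPoint()
producer.subscribe(push_consumer)   # push policy
producer.subscribe(pull_point)      # pull policy, via the pull-point
for note in ["n1", "n2", "n3"]:
    producer.publish(note)

first_batch = pull_point.get_message(2)   # consumer asks when convenient
second_batch = pull_point.get_message(5)
```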

DRIVER topics correspond to all possible actions over Resource profiles; as a consequence, the number of topics available is dynamic and depends on the structure and on the number of profiles that can be created, i.e. number of Resource Types, or deleted and updated, i.e. the number of Resource instances available in the System. In particular, actions, i.e. topics, regard the creation of a Resource profile of a certain Resource Type, the deletion of a specific Resource profile, and the update of a subpart of a specific Resource profile, to be identified by an XPath query. In DRIVER, such actions are encoded in an OASIS topic tree whose paths can be of the form:

·         CREATE.ResourceType

·         DELETE.ResourceProfile

·         UPDATE.ResourceProfile.ProfilePath

          where:

o        ResourceType is one of the Resource Types available into the System;

o        ResourceProfile is a value from the set of Resource Profiles available in the System;

o        ProfilePath is a sequence of labels that uniquely identifies the root element of a subtree in the XML tree of a profile; the profile tree must conform to the Profile Schema relative to the ResourceProfile type.

Services can subscribe to a set of topics of the topic tree by submitting to the IS an XPath query that “describes” the set. Typical examples are:

o        CREATE/* to subscribe to the action of creation-registration of any Type of Resource;

o        CREATE/RepositoryServiceType used by the MS to subscribe to the creation-registration of a Resource of type Repository Service Resource Type

o        UPDATE/RepID/*/ProvenanceFields, used by MDStore Resources to subscribe to the update of the DPF fields relative to the Repository Service RepID, to which they are associated. In such a way, MDStores will be notified only when the specific subtree of the Repository Service profile identified by the query and relative to Provenance Fields will be modified. 
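To illustrate how an XPath topic expression selects a set of topics, the sketch below encodes a small topic tree as an XML tree and evaluates the expressions with the limited XPath support of Python's `xml.etree.ElementTree`. The wrapper element `topics`, the helper names and the sample topics (including the `Configuration` and `Status` path labels) are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def build_topic_tree(paths):
    """Build an XML tree with one element per label of each dotted path."""
    root = ET.Element("topics")  # hypothetical wrapper around root topics
    for path in paths:
        node = root
        for label in path.split("."):
            child = node.find(label)
            if child is None:
                child = ET.SubElement(node, label)
            node = child
    return root


def matching_topics(root, topic_expression):
    """Return the dotted paths of the topics matched by the expression."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    matched = []
    for node in root.findall(topic_expression):
        labels = []
        while node is not root:          # walk back up to rebuild the path
            labels.append(node.tag)
            node = parent_of[node]
        matched.append(".".join(reversed(labels)))
    return matched


tree = build_topic_tree([
    "CREATE.RepositoryServiceType",
    "CREATE.IndexType",
    "DELETE.RepID",
    "UPDATE.RepID.Configuration.ProvenanceFields",
    "UPDATE.RepID.Status.LastUpdate",
])
created = matching_topics(tree, "CREATE/*")
provenance = matching_topics(tree, "UPDATE/RepID/*/ProvenanceFields")
```

Here `CREATE/*` selects both creation topics, while the second expression selects only the ProvenanceFields subtree of the RepID profile.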

NotificationConsumers, hence Services willing to use subscription and notification mechanisms, need to include a Notification Handler component exposing the standard method:

o        notify(subscriptionReference, topic, producerReference, message) The method sends to the Service the identifier subscriptionReference of the subscription that generated the notification, the notification topic topic, the identifier producerReference of the IS that generated the notification, and a message message.

Notifications relative to queries regarding the root topic UPDATE will return in the message parameter the new subtree, i.e. the subtree that caused the update Situation. For the example of MDStore Resources above, the field will contain the new set of elements under the ProvenanceFields element. For the other forms of notification, the message parameter is left empty.

NotificationProducers, hence Information Services, need to include a component IS-SN exposing the standard interfaces described below.

 

IS-SN NotificationProducer

o        subscribe(ConsumerReference,topicExpression,InitialTerminationTime,SubscriptionPolicy):subscrId The method creates a new subscription and returns its identifier subscrId, where:

§         ConsumerReference is the identifier of the Resource that must receive the notification, i.e. a Service or a Pull-point Resource.

§         topicExpression is the topic expression of the subscription, i.e. an XPath query over the topic tree.

§         InitialTerminationTime is the expiration time of the subscription: it may be in absolute form, i.e. a date, relative form, i.e. time to elapse, or infinite.

§         SubscriptionPolicy is not used in the current DRIVER implementation.

The new subscription is stored locally into the IS-Store as an XML file representing the tuple <subscrId, ConsumerReference, topicExpression, InitialTerminationTime, topicSet, status>, where:

§         topicSet is the set of pairs <topic,lastKnownUpdate > such that topic is in the topic tree and matched by topicExpression.

§         lastKnownUpdate is completed only when topic is of the form UPDATE.resId.profPath; it contains the current portion of the profile of the Resource resId that matches profPath. This value is used by the IS-SN Engine component to check whether the update of a resource profile involves the subscription.

§         status indicates whether the subscription is active or paused: an active subscription is used by the IS-SN Engine component to generate the corresponding notifications, while a paused subscription is left out of the process; its initial value is “active”.

o        getCurrentMessage(Topic) The method may return the last notification generated for the given topic. It is a non-destructive read, i.e. the notification is not removed.
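The creation of a subscription tuple can be sketched as follows. This is only an illustrative sketch, not the DRIVER implementation: the class names, the generated identifier format, and the in-memory dictionary standing in for the IS-Store are hypothetical, None is assumed to represent an infinite termination time, and shell-style wildcard matching over slash-separated topic paths is swapped in for the XPath evaluation of topicExpression.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from itertools import count
from typing import Dict, Optional


@dataclass
class Subscription:
    subscr_id: str
    consumer_reference: str
    topic_expression: str
    termination_time: Optional[float]  # None stands for "infinite" (assumption)
    # topicSet: maps each matched topic to its last known value (None if unset)
    topic_set: Dict[str, Optional[str]] = field(default_factory=dict)
    status: str = "active"  # initial value, as stated in the text


class NotificationProducer:
    """Hypothetical in-memory stand-in for the IS-SN NotificationProducer."""

    def __init__(self, topic_tree):
        self.topic_tree = set(topic_tree)
        self.subscriptions: Dict[str, Subscription] = {}
        self._ids = count(1)

    def subscribe(self, consumer_reference, topic_expression,
                  initial_termination_time=None):
        subscr_id = f"sub-{next(self._ids)}"
        # Wildcard matching replaces XPath evaluation in this sketch.
        matched = sorted(t for t in self.topic_tree
                         if fnmatchcase(t, topic_expression))
        self.subscriptions[subscr_id] = Subscription(
            subscr_id, consumer_reference, topic_expression,
            initial_termination_time, {t: None for t in matched})
        return subscr_id
```

The topicSet is computed once, at subscription time, which mirrors the stored tuple described above; how the set is refreshed when the topic tree later changes is not covered by this sketch.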

 

IS-SN SubscriptionManager Once subscriptions are created, Services can interact with them with the following methods:

o        renew(subscrId, TerminationTime) The method updates the termination time of the subscription identified by subscrId.

o        unsubscribe(subscrId) The method removes the subscription identified by subscrId.

o        pauseSubscription(SubscriptionId) The method “halts” the generation of notifications relative to the given subscription by changing its status parameter to “paused”.

o        resumeSubscription(SubscriptionId) The method “resumes” the use of the subscription for the generation of the relative notifications by changing its status parameter to “active”.
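A minimal sketch of the SubscriptionManager behaviour is given below, assuming subscriptions are kept in a plain dictionary rather than in the IS-Store. The register and active_ids helpers are hypothetical additions used only to make the example self-contained.

```python
class SubscriptionManager:
    """Hypothetical in-memory sketch of the IS-SN SubscriptionManager."""

    def __init__(self):
        self._subs = {}  # subscrId -> {"termination_time", "status"}

    def register(self, subscr_id, termination_time=None):
        # Helper for the sketch: records a subscription created elsewhere.
        self._subs[subscr_id] = {"termination_time": termination_time,
                                 "status": "active"}

    def renew(self, subscr_id, termination_time):
        self._subs[subscr_id]["termination_time"] = termination_time

    def unsubscribe(self, subscr_id):
        del self._subs[subscr_id]

    def pause_subscription(self, subscr_id):
        self._subs[subscr_id]["status"] = "paused"

    def resume_subscription(self, subscr_id):
        self._subs[subscr_id]["status"] = "active"

    def active_ids(self):
        # Only active subscriptions take part in notification generation.
        return sorted(s for s, rec in self._subs.items()
                      if rec["status"] == "active")
```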

 

IS-SN PullPoint The class offers the methods to create and delete a Pull-point Resource, as well as those required to interact with it. Typically, a Service that requires a Pull-point policy notification mechanism:

1.      creates a Pull-point Resource;

2.      subscribes the Pull-point Resource to the topic of interest;

3.      interacts asynchronously with the Pull-point Resource.

 

o        createPullPoint(sId):ppId The method creates a new Pull-Point Resource relative to the Service sId and adds its profile to the IS. Only the Service identified by sId can manipulate the pull-point. The method is invoked by a Service willing to enforce a pull-point policy notification mechanism;

o        getMessages(ppId, MaximumNumber):notification[] The method returns a maximum number MaximumNumber of notifications accumulated into the Pull-point identified by ppId. The method is invoked by a Service, which must be the one that created the Pull-Point.

o        destroyPullPoint(ppId) The method removes the Pull-Point Resource identified by ppId from the IS. The method is invoked by a Service, which must be the one that created the Pull-Point.
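The Pull-point life cycle can be illustrated with the sketch below. It rests on several assumptions that the text does not state: the caller identity is passed explicitly as caller_id in order to enforce the ownership rule, getMessages is assumed to remove the notifications it returns, and store is a hypothetical hook through which the IS-SN Engine would accumulate notifications.

```python
from collections import deque
from itertools import count


class PullPointRegistry:
    """Hypothetical in-memory sketch of the IS-SN PullPoint class."""

    def __init__(self):
        self._owners = {}  # ppId -> sId of the creating Service
        self._queues = {}  # ppId -> accumulated notifications
        self._ids = count(1)

    def create_pull_point(self, s_id):
        pp_id = f"pp-{next(self._ids)}"
        self._owners[pp_id] = s_id
        self._queues[pp_id] = deque()
        return pp_id

    def store(self, pp_id, notification):
        # Hook for the engine: accumulate a notification in the Pull-point.
        self._queues[pp_id].append(notification)

    def _check_owner(self, caller_id, pp_id):
        if self._owners[pp_id] != caller_id:
            raise PermissionError("only the creating Service may use it")

    def get_messages(self, caller_id, pp_id, maximum_number):
        self._check_owner(caller_id, pp_id)
        queue = self._queues[pp_id]
        # Assumption: returned notifications are removed from the Pull-point.
        return [queue.popleft()
                for _ in range(min(maximum_number, len(queue)))]

    def destroy_pull_point(self, caller_id, pp_id):
        self._check_owner(caller_id, pp_id)
        del self._owners[pp_id]
        del self._queues[pp_id]
```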

 

IS-SN Engine The class offers the methods required by the IS-SN to manage DRIVER topics, to activate the generation of notifications, and to dispatch notifications to Pull-Point or Service Resources.

DRIVER topics are generally added and removed by the IS-PR component when Resources are created or removed; however, System Administrators can also modify the DRIVER topic tree when adding or removing a type of Resource.

o        addCreateTopic(resType) The method adds a new topic  CREATE.resType to the topic tree, where resType is the unique name of a Resource Type in the System.

o        addResourceTopics(resId) The method adds to the topic tree a new topic  DELETE.resId and a set of new topic  UPDATE.resId.profPath, one for each path profPath in the profile of the Resource resId.

o        removeCreateTopics(resType) The method removes from the topic tree the topic CREATE.resType and all topics DELETE.resId and  UPDATE.resId.profPath such that the Resource identified by resId is of Resource Type resType.

o        removeResourceTopics(resId) The method removes from the topic tree the topic DELETE.resId and all topics UPDATE.resId.profPath.
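The four topic-management methods can be sketched as operations over a set of topic names. The sketch is illustrative only: addResourceTopics is described above as taking just resId, whereas here the Resource Type and the list of profile paths are passed explicitly (an assumption made to avoid modelling profile look-up), and all_topics is a hypothetical helper.

```python
class TopicTree:
    """Hypothetical in-memory sketch of the DRIVER topic tree."""

    def __init__(self):
        self.create_topics = set()
        self.resource_topics = {}  # resId -> set of DELETE/UPDATE topics
        self.type_of = {}          # resId -> resType

    def add_create_topic(self, res_type):
        self.create_topics.add(f"CREATE.{res_type}")

    def add_resource_topics(self, res_id, res_type, prof_paths):
        # One DELETE topic plus one UPDATE topic per profile path.
        topics = {f"DELETE.{res_id}"}
        topics.update(f"UPDATE.{res_id}.{p}" for p in prof_paths)
        self.resource_topics[res_id] = topics
        self.type_of[res_id] = res_type

    def remove_resource_topics(self, res_id):
        self.resource_topics.pop(res_id, None)
        self.type_of.pop(res_id, None)

    def remove_create_topics(self, res_type):
        # Removes CREATE.resType and the topics of every Resource of that type.
        self.create_topics.discard(f"CREATE.{res_type}")
        for res_id in [r for r, t in self.type_of.items() if t == res_type]:
            self.remove_resource_topics(res_id)

    def all_topics(self):
        topics = set(self.create_topics)
        for resource_set in self.resource_topics.values():
            topics |= resource_set
        return sorted(topics)
```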

All DRIVER topic events, which are catalogued as DELETE, UPDATE, and CREATE, are signalled to the IS-SN Engine component by the IS-PR component, which is responsible for profile management in the IS. The IS-SN Engine class offers an “internal” method, invoked by the IS-PR component, to fire notification generation: UPDATE topics need special treatment in order to detect those subscriptions that are actually affected by the profile update, while DELETE and CREATE events are always followed by a notification. The method also sends notifications to Services or accumulates them into the local data structures associated with Pull-Point Resources.

o        actionPerformed(topicPrefix,brief) The method searches for the subscriptions <subscrId,resId,topicExpression,expTime,topicSet,”active”> whose topicSet contains a pair <topic,lastKnownUpdate> such that topic satisfies topicPrefix. For each such pair, the method tries to construct and deliver a notification message as follows:

1.      if topic is of the form UPDATE.resId.profPath:

a.      get the profile in brief and extract the subpart that satisfies the query profPath;

b.      if the extracted subpart is different from lastKnownUpdate, insert the subpart in message, go to step 3, and then apply step 1 to the next pair;

c.      if the extracted subpart is equal to lastKnownUpdate, the topic was not affected by the update; apply step 1 to the next pair; if none, terminate.

2.      if topic is of the form CREATE.resType or DELETE.resId, generate an empty message and go to step 3.

3.      if resId is relative to a Service, call the method notify(subscrId, topic, isId, message);

4.      if resId is relative to a Pull-point, store the notification message with the Pull-point Resource, in a local data structure.

Note that the IS-SN notifies the MS of any action that may involve Service Resources, with a call: notify(SubscriptionReference, Topic, ProducerReference, Message).
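The notification-generation procedure above can be sketched as follows. This is a hedged illustration under stated assumptions: subscriptions are plain dictionaries, the profile in brief is an XML string, profPath is evaluated with the limited path syntax of Python's ElementTree, Resource identifiers contain no dots, and deliver is a hypothetical callback standing for steps 3 and 4 (a call to notify or storage in a Pull-point). Refreshing the stored last known value after a notification is also an assumption; the text does not say when it is updated.

```python
import xml.etree.ElementTree as ET


def extract(profile_xml, prof_path):
    """Return the serialized subtree matched by prof_path, or None."""
    node = ET.fromstring(profile_xml).find(prof_path)
    if node is None:
        return None
    return ET.tostring(node, encoding="unicode").strip()


def matches(topic, topic_prefix):
    # A topic satisfies the prefix if it is the prefix or lies below it.
    return topic == topic_prefix or topic.startswith(topic_prefix + ".")


def action_performed(subscriptions, topic_prefix, brief, deliver):
    for sub in subscriptions:
        if sub["status"] != "active":
            continue  # paused subscriptions are left out of the process
        for topic, last_known in list(sub["topic_set"].items()):
            if not matches(topic, topic_prefix):
                continue
            if topic.startswith("UPDATE."):
                # Topic has the form UPDATE.resId.profPath (step 1).
                prof_path = topic.split(".", 2)[2]
                subpart = extract(brief, prof_path)
                if subpart == last_known:
                    continue  # step 1c: topic not affected by the update
                sub["topic_set"][topic] = subpart  # assumption, see lead-in
                message = subpart
            else:
                message = ""  # step 2: CREATE and DELETE carry no payload
            deliver(sub["subscr_id"], sub["consumer"], topic, message)
```

Comparing serialized subtrees is the simplest way to detect a change; an implementation could compare canonical XML instead, so that formatting differences alone do not trigger notifications.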

·         IS-PR (Profile Registry) The component provides all methods necessary to support Resource Type and profile management. To this aim IS-PR interacts with the IS-Store component to memorize all the relative information as XML files file into file collections fileColl (see IS-Store component). Specifically:

o        One System file collection “DRIVERResourceTypes” is created to store all Resource Type XML Schemas as XML files with unique name resourceType;

o        One file collection for each System Resource kind is created, including DRIVERPendingService: profiles are saved into the file collection relative to their Resource Kind.

More specifically, the component gives support for:

1.      Resource Types management: creation and deletion of Resource Types, which are pairs <resTypeName,resProfSchema>; resProfSchema should include the XML schema part common to all Resource profiles, with the elements relative to the Resource identifier, Resource type, and Resource kind. In particular:

o        the name of the Resource Type resTypeName must also be a constant value appearing in the field ResourceTypeName of the profile schema resProfSchema.

o        the Resource kind should be provided as a constant value in the Resource profile schema resProfSchema, to reflect the link between a Resource Type and its associated Resource kind. Note that the Resource kind element relative to a Service Resource could also optionally be the System Kind “DRIVERPendingService”.

Resource Types are kept into the IS-Store file collection DRIVERResourceTypes.

2.      Resource profile management: Resource profiles are organized as pairs of the form <profileId,profile>. The component allows Resource Managers to register, update, and delete Service Resource profiles through a user interface; and Service Resources to register, update, and delete Data Structure Resource profiles through specific WS methods. Profiles conform to a Resource Type and are kept into the IS-Store file collection relative to the Resource kind associated with their Resource Type.

3.      The component also handles Service profiles in the status of “pending”, by temporarily associating them to the System Kind “DRIVERPendingService” and therefore keeping them into the relative IS-Store file collection; the possible action of “validation” of a Service profile, will change the profile Resource kind from “DRIVERPendingService” into “ServiceResourceKind”, thereby making it available to System usage.

Resource Type management

o        addResourceType(resourceType, profileXMLSchema) The method adds a new Resource Type into the System, by providing the name resourceType (unique in the System) and its XML schema profileXMLSchema, i.e. the structure of the relative profile. To this aim, the method:

§         stores the new XML schema with a call  insertXML(resourceType,“DRIVERResourceTypes”,profileXMLSchema) to the IS-Store component;

§         makes the topics relative to the new Resource Type and its XML Schema available for subscription; the topics are created with the call addCreateTopic(resourceType) to the IS-SN Engine component.

o        deleteResourceType(resourceType) The method removes the specified Resource Type from the System. Accordingly, its related topics, XML Schema, and profiles must be removed. To this aim, the method respectively invokes:

§         the method removeCreateTopics(resourceType) of the IS-SN Engine component,

§         the method deleteXML(resourceType,“DRIVERResourceTypes”) of the IS-Store component,

§         the method deleteXML(fileName,resourceKind) for all files  file identified by the pair <fileName,resourceKind> such that resourceKind is the Resource Kind associated with resourceType and file has the attribute ResourceType set to  resourceType.

o        getResourceTypeSchema(resourceType):XMLSchema The method is for internal use and retrieves the XML Schema relative to the Resource Type resourceType by a call getXML(resourceType,“DRIVERResourceTypes”) to the IS-Store component.

Resource profile management

o        insertServiceProfileForValidation(profile,resourceType):profId The method checks if profile matches the structure of resourceType’s XML Schema; if successful, it puts the profile into the Registry with System Kind DRIVERPendingService. To this aim:

§         it retrieves the XML schema relative to the Resource Type passed as parameter, with a call getResourceTypeSchema(resourceType)

§         matches the profile against the schema;

§         if the match is positive, it creates a fresh name fileName for the profile by a call getNewFileName(“DRIVERPendingService”) to the IS-Store, and creates a new profile identifier profId by calling the internal method createProfileId(fileName, “DRIVERPendingService”);

§         it modifies profile by adding profId as value of the field ServiceIdentifier and DRIVERPendingService as value of the field ResourceKind,

§         invokes insertXML(fileName,“DRIVERPendingService”,profile) of the IS-Store component;

§         returns profId as result.

o        validateProfile(profId) The method “validates” the profile with the identifier profId, i.e. makes it available to System usage. Validation corresponds to changing the Resource Kind of the profile from DRIVERPendingService into the Resource Kind ServiceResource. To this aim, the method:

§         decodes the profile identifier with a call decodeProfileId(profId) to get the relative file collection DRIVERPendingService and file name fileName;

§         retrieves the profile profile with a call getXML(fileName,“DRIVERPendingService”);

§         deletes the profile from “DRIVERPendingService” with a call deleteXML(fileName,“DRIVERPendingService”);

§         updates the profile with the new Resource Kind ServiceResource, gets from the profile its Resource Type resourceType, and calls registerProfile(profile,resourceType) to insert the profile into the Registry;

§         finally, the method is also in charge of subscribing the newly registered Service Resource to the update of its own profile with a call subscribe(id,UPDATE.id.blackboard, infinite,-).

o        registerProfile(profile,resourceType):profId The method registers the profile passed as parameters, i.e. puts it into the Registry and creates the topics relative to the Service instance. To this aim the method:

§         creates a new profile identifier profId by calling the internal method createProfileId(fileName,resourceKind), where fileName is in turn returned by the call getNewFileName(resourceType) of the IS-Store and resourceKind is derived by the profile itself.

§         adds profId to profile, as value of the field ServiceIdentifier;

§         retrieves the XML Schema corresponding to resourceType and checks whether profile conforms to it; if so:

§         invokes insertXML(fileName,resourceKind,profile) of the IS-Store component.

§         grows the topic tree with a call addResourceTopics(profId)

§         finally, the method interacts with the IS-SN Engine component to activate the notification mechanism with a call actionPerformed(CREATE.resourceType, id)

§         returns profId as result.

o        deleteProfile(profId) The method removes the profile with the identifier passed as parameter from the Registry, interacts with the IS-SN Engine to notify interested Resources of the event, and updates the topic tree. To this aim, the method:

§         decodes the profile identifier with a call decodeProfileId(profId) to get the relative file collection resourceKind and file name fileName;

§         removes the profile from its file collection with a call deleteXML(fileName,resourceKind);

§         activates the notification mechanism with a call actionPerformed(DELETE.id, -) and prunes the topic tree with a call removeResourceTopics(id).

o        updateProfile(profId,profile) The method replaces the content of the profile identified by profId with the content of profile and interacts with the IS-SN Engine to notify interested Resources of the event. To this aim, the method:

§         decodes the profile identifier with a call  decodeProfileId(profId) to get the relative file class resourceKind and file name fileName;

§         invokes the method updateXML(fileName, resourceKind,profile)

§         activates the notification mechanism with a call actionPerformed(UPDATE.id, profile) to the IS-SN Engine component.

Internal methods

o        createProfileId(part1,part2):profId The internal method returns an identifier that must be unique with respect to the pair <part1,part2>, i.e. no other different pair of values can produce the same identifier, and the function must be invertible;

o        decodeProfileId(profId):<part1,part2> The internal method returns the pair of values associated with the profile identifier profId.
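One possible way to satisfy the uniqueness and invertibility requirements of the two internal methods is sketched below. The encoding chosen here (a JSON list wrapped in URL-safe Base64) is purely an assumption for illustration; the specification does not prescribe any identifier format.

```python
import base64
import json


def create_profile_id(part1, part2):
    """Encode the pair <part1, part2> into a single opaque identifier."""
    # JSON keeps the boundary between the two parts unambiguous, so two
    # different pairs can never produce the same identifier.
    raw = json.dumps([part1, part2]).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_profile_id(prof_id):
    """Invert create_profile_id and return the pair (part1, part2)."""
    raw = base64.urlsafe_b64decode(prof_id.encode("ascii"))
    part1, part2 = json.loads(raw)
    return part1, part2
```

A plain concatenation such as part1 + part2 would not be invertible, since the pairs <"a","bc"> and <"ab","c"> would collide; any scheme that preserves the boundary between the two parts is adequate.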

·         IS-LU (Look-Up) The IS-LU is called by Service Resources to search for Resource profiles satisfying certain properties. Look-ups can be of the direct “get” form, returning the profile with the given identifier, or more sophisticated searches, declared as XQuery queries. Search queries are run over an XML database derived from the file collections to be found in the IS-Store. The entry point of the database is by default a <DRIVER> element with the following sub-elements, one for each System Resource Kind: ServiceResources, DRIVERPendingServices, HarvestingInstanceResources, IndexResources, MDStoreResources, RecommendationResources, ResultSetResources, UserResources, CommunityResources, CollectionResources.[15]

The component also offers a number of search syntactic-sugar methods, targeting specific look-up needs.

o        searchProfile(XQuery):ResultSet The method returns the ResultSet returned by the execution of the method searchXML(XQuery) of the IS-Store component.

o        getResourceIDs(resourceType): ResultSet The method returns the ResultSet identifier returned by the execution of the method searchXML(XQuery) of the IS-Store component, where XQuery is the XQuery query that searches for all profile identifiers of the given Resource Type.

o        getResourceProfile(id): profile The method returns the profile identified by the Resource identifier id from the IS-Store component; to this aim:

§         decodes the profile identifier with a call decodeProfileId(id) to get the relative file collection fileColl and file name fileName;

§         retrieves the profile profile with a call getXML(fileName,fileColl).

o        getResourceConfigParam(id): XMLString

o        getResourceStatusParam(id): XMLString

o        getResourceQoSParam(id): XMLString

o        getResourceAuthZParam(id): XMLString

·         IS-Store The IS-Store component supports an XML file store capable of storing XML files fileName into file collections fileColl. The component allows for the creation and deletion of file collections with a unique name fileColl, and the insertion, removal, or update of XML files with name fileName for a specific file collection; note that the name of a file must be unique w.r.t. the file collection it is part of, hence files are uniquely identified by a pair <fileName,fileColl>. The creation of collections and insertion of files forms the following XML database:

o        the entry point is a <DRIVER> XML element (“DRIVER element” in the following);

o        the DRIVER element has one sub-tree element for each file collection created;

o        each file collection element has one sub-tree element for each file name inserted into the collection

o        each file name element contains the XML relative to the XML file itself.

The component provides management methods for XML files and file collections, in addition to XQuery search over the resulting XML database. In particular, the IS-Store is a data provider Service, as described in Section 3.1. Accordingly, it implements the relative bulk data interface, which will be used in the definition of the following methods.

o        createFileColl(fileColl) The method creates a new file collection fileColl, i.e. adds a new sub-element <fileColl> under the element DRIVER in the database.

o        deleteFileColl(fileColl) The method removes the file collection fileColl, together with all XML files into the collection; i.e. removes the sub-element <fileColl> under the element DRIVER in the database, together with all its sub-trees.

o        insertXML(fileName,fileColl,file) The method puts the XML file file with name fileName into the collection fileColl; i.e. it adds a sub-tree fileName/file under the path DRIVER/fileColl of the database.

o        getXML(fileName,fileColl):file The method retrieves from the store the file identified by the pair <fileName,fileColl>, and returns it as result.

o        deleteXML(fileName,fileColl) The method removes from the store the file identified by the pair <fileName,fileColl>; i.e. it removes the sub-tree fileName/file under the path DRIVER/fileColl of the database.

o        updateXML(fileName,fileColl,file) The method replaces the XML file identified by <fileName,fileColl> with the file file.

o        searchXML(XQuery): rsId The method creates a bulk data reference bdId with a call generateBulkData(XQuery), creates a pull-mode ResultSet rsId with a call createPullRS(bdId,pageSize,expiryDate), and returns it as the result; note that the call generateBulkData runs the query XQuery over the database and associates its result with the bulk data reference bdId.

o        getNewFileName(fileColl):fileName The method returns a new fileName for the given fileColl.

o        getFileColls():fileColl[] The method returns the list of file collections available in the System.
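The XML database formed by the IS-Store can be illustrated with the in-memory sketch below. It is not the DRIVER implementation and makes several assumptions: collection and file names are valid XML element names, the search method uses the limited path syntax of Python's ElementTree in place of XQuery, and it returns a plain list of serialized elements rather than a ResultSet identifier.

```python
import xml.etree.ElementTree as ET


class ISStore:
    """Hypothetical in-memory sketch of the IS-Store XML database."""

    def __init__(self):
        self.root = ET.Element("DRIVER")  # entry point of the database

    def _coll(self, file_coll):
        coll = self.root.find(file_coll)
        if coll is None:
            raise KeyError(f"unknown file collection: {file_coll}")
        return coll

    def _file(self, file_name, file_coll):
        wrapper = self._coll(file_coll).find(file_name)
        if wrapper is None:
            raise KeyError(f"unknown file: {file_name}")
        return wrapper

    def create_file_coll(self, file_coll):
        if self.root.find(file_coll) is None:
            ET.SubElement(self.root, file_coll)

    def delete_file_coll(self, file_coll):
        self.root.remove(self._coll(file_coll))

    def insert_xml(self, file_name, file_coll, file_xml):
        coll = self._coll(file_coll)
        if coll.find(file_name) is not None:
            raise KeyError(f"file name already in use: {file_name}")
        # Adds the sub-tree fileName/file under DRIVER/fileColl.
        ET.SubElement(coll, file_name).append(ET.fromstring(file_xml))

    def get_xml(self, file_name, file_coll):
        wrapper = self._file(file_name, file_coll)
        return ET.tostring(wrapper[0], encoding="unicode")

    def delete_xml(self, file_name, file_coll):
        self._coll(file_coll).remove(self._file(file_name, file_coll))

    def update_xml(self, file_name, file_coll, file_xml):
        wrapper = self._file(file_name, file_coll)
        wrapper.clear()  # drops the old content, keeps the element name
        wrapper.append(ET.fromstring(file_xml))

    def search(self, path):
        # Stand-in for searchXML: path query relative to the DRIVER element.
        return [ET.tostring(e, encoding="unicode")
                for e in self.root.findall(path)]
```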

·         IS-Store ResultSet The IS-Store ResultSet component offers all methods described in Section 3.1.

3.2.2.2     Interactions

IS-PR

·         The IS-PR is accessed by all Service Resources that need to register or delete a Data Structure Resource or that need to update their own profile or the profile of one of their Data Structure Resources. The component interacts with the IS-Store and encodes profiles and actions over them into XML files and relative operations.

·         The registration of a new Service Resource requires the IS-PR to interact with the IS-SN, in order to register the Service to the update action relative to its own profile.

·         The component offers a user interface, through which Resource Managers can register new Service Resources and System Administrators can validate their registrations.

IS-LU

·         The IS-LU is called by Service Resources to search for Resource profiles satisfying certain properties. The component interacts with the IS-Store to answer the searches. The IS-LU either returns one XML file, in response to a direct access request, or a ResultSet containing the list of results.

IS-SN

·         The IS-SN is called by all Service Resources to register themselves or their related Data Structure Resources to specific actions.

·         The component interacts with the Notification Handler components of DRIVER Service Resources, when these are to be notified of a subscribed action.

·         The IS-SN interacts with Service Resources that need to register a subscription. The component interacts with the IS-Store and encodes subscription and relative actions into XML files and operations.

IS-Store

·         The IS-Store is internal to the IS and is called by the IS-LU, the IS-PR, and the IS-SN components in order to store and retrieve XML files relative to Resource profiles and subscriptions. The IS-Store either returns one XML file, in response to a direct access request, or a ResultSet containing the list of results.

3.2.2.3     Detailed design

The IS components shall be designed and implemented as a peculiar set of DRIVER Service Resources, limited in usage to IS Resources themselves. IS Resources can be replicated and run on different IS-Nodes, which are machines running a Web Service container written in Perl. Replication supports System robustness and scalability.

The IS-node WS container runs a Communication Layer Service, which unburdens local IS Resources from accessing the IS in order to search for the IS Resources they need in the System. As illustrated in Figure 14, the Communication Layer interacts directly with the IS-Store in order to cache the profile information needed by the Resources local to the container. Caching is performed asynchronously with respect to the IS Resources searches. Local IS Resources can therefore directly communicate with the Type of IS Resource they require and expect the Communication Layer to redirect their message to the most appropriate Resource.

Figure 14. Information Service – Detailed specification

3.2.3      Manager Service

The MS monitors System QoS by elaborating the Resource profiles in the IS. QoS depends on efficacy and efficiency:

  • Efficacy: it is measured according to two System aspects:
    • Minimum number of Service Resources available for each Resource Type (including replicas);
    • Number of Data Structure Resources needed for the System to work for each Data Structure Resource Type: such numbers depend on specific System management policies described below.
  • Efficiency: it depends on Resource workload; the System calculates it from the performance parameters updated by the Resources themselves and reacts according to the System management policies described below.

Since the System cannot create Service Resources, the MS may improve efficiency and enforce efficacy by creating, deleting, or updating Data Structure Resources. Specifically, the MS orchestrates Resources by interacting with the Resource profiles in the IS according to the Orchestration protocol described below.

System management policies

DRIVER System logic is based on the following policies, which must be enforced in order to provide the required efficacy threshold:

  • Efficacy
    • For each Repository Service there must be
      • One Harvesting Instance
      • One Master MDStore for the harvesting of Dublin Core metadata format records
      • A number of MDStores, one for each non Dublin Core metadata format exported by the Repository
    • Each DRIVER record into an MDStore Resource must be targeted by at least one Index Resource.
  • Efficiency
    • Distribute the Data Structure Resources among the available Service Resources that can handle them. Such policy must be enforced based on thresholds and cost trade-offs.

Each of the following subsections presents a System Management Service, reporting a brief description and its components. Hereafter, we also highlight the most important interactions that emerge from Figure 12.

Orchestration Protocol

The Orchestration Protocol is based on the following three assumptions:

  1. Service Profiles contain a section dedicated to MS messages, called blackboard (see Section 2.5.3.1).
  2. Services, at registration time, also subscribe to the update of their own profile blackboard (see top picture in Figure 15);
  3. the MS is subscribed to the update of all Service profiles blackboard.

Figure 15 illustrates the steps of the Orchestration Protocol. When the MS decides to communicate with a Service it updates the relative profile blackboard in the IS with a message (step A). The relative Service (step B), is notified of the change with a notify(subscrId,UPDATE.serviceId.blackboard,isId,message) to its Notification Handler component. The message parameter contains the blackboard of the Service profile, hence the message to be interpreted.

Figure 15. Manager Service Orchestration protocol

The MS may also need to send an action to a Data Structure Resource, in order to change its behaviour. In this case, the action will be sent to the Service responsible for the action, which will execute it over the target Data Structure.

When Services are delivered an action they must answer to that action with a further profile blackboard update. This will allow the MS to check whether the original action was successful or not, and activate a repairing process if necessary.

MS actions and Service answers are triples of the form (actionId, subject, parameters), where:

  • actionId is the unique identifier relative to the action: it is released by the MS and used to associate the answers left by the Service to the action that caused them;
  • subject can be relative to an action, such as CREATE, DELETE, UPDATE, MANAGE, RELEASE or CANCEL ACTION, or relative to an answer, such as ONGOING and DONE. ONGOING is the answer to actions that may take a long time, such as the creation of an Index Resource; when the operation terminates, the Service leaves a DONE message for the action. In those cases where the action is taking too long, the MS may decide to send a CANCEL ACTION action, which should fire a recovery operation at the Service side;
  • parameters may be required in an action, either to specify the parameters needed to create or update a Data Structure Resource or to specify the identifier of the Data Structure to be removed; or in an answer, to return the identifier of the Data Structure created with an action.
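The triples exchanged through the profile blackboard can be modelled as in the sketch below. The class name, the representation of the blackboard as a list, and the answers_for helper are hypothetical; only the subject values and the role of actionId come from the description above.

```python
from dataclasses import dataclass
from typing import Any, List

ACTION_SUBJECTS = {"CREATE", "DELETE", "UPDATE", "MANAGE", "RELEASE",
                   "CANCEL ACTION"}
ANSWER_SUBJECTS = {"ONGOING", "DONE"}


@dataclass(frozen=True)
class BlackboardMessage:
    action_id: str          # released by the MS, reused by the answers
    subject: str            # an action subject or an answer subject
    parameters: Any = None  # e.g. creation parameters or a created identifier

    def __post_init__(self):
        if self.subject not in ACTION_SUBJECTS | ANSWER_SUBJECTS:
            raise ValueError(f"unknown subject: {self.subject}")

    @property
    def is_answer(self):
        return self.subject in ANSWER_SUBJECTS


def answers_for(blackboard: List[BlackboardMessage], action_id: str):
    """Return the answers a Service left for the action action_id."""
    return [m for m in blackboard if m.action_id == action_id and m.is_answer]
```

Matching answers to actions through actionId is what allows the MS to decide whether an action succeeded or whether a repairing process must be started.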

3.2.3.1     Service architecture

Three physical components implement the DRIVER Manager Service:

·         the MS-RM (Resources Monitoring) component retrieves Resource profiles from the IS and measures System QoS; such activity is run “continuously”, i.e. every time a Service Resource profile is modified, and consists in evaluating Service performance values:

o        monitorResources() The method implements the efficiency policies described above;

·         the MS-ResMan (Resource Management)

 

MS-RO (Resources Orchestration) The component implements System management policies based on System Resources change and current QoS. Its Orchestration algorithms exploit the Orchestration Protocol described above. The component may send warning to System Administrators when there is no space for automatic adjustment, e.g. when Service workload is too high and a new Service instance would be required. The methods exposed are the following:

o        orchestrateResources(action) The method implements the efficacy policies described above, by creating, updating, or deleting Data Resources according to the System needs; in order to do so, it interacts with the IS-LU component, through the method searchProfile, and with the IS-PR component, through its methods registerProfile, updateProfile, and deleteProfile.

 

MS-RepMan (Repository Management) The component is in charge of updating Repository Service profiles in the IS whenever new changes occur. Typically, the upgrade relative to a Repository Service takes place after each harvesting operation from the Repository itself.

·         the MS-NH (Notification Handler) component, which receives notifications from the IS about any creation or deletion of Resources into the System:

o        notify(subscribedResourceId,topic,is_id,message) The method checks the topic topic passed as parameter and behaves as follows: if the topic regards an UPDATE, DELETE, or CREATE, it calls the method orchestrateResources(); if the topic regards the UPDATE of a Resource, it calls the method monitorResources().

Orchestration Policies

The method orchestrateResources(action) operates as a scheduler of actions by executing the orchestration procedure determined by the action at hand. Specifically, the following orchestration policies must be enforced:

CREATE Repository ID

The MS creates a new Harvesting Instance Resource, to be associated to the Repository. To this aim, a Master MDStore must be created, together with one MDStore for each of the non-Dublin Core metadata formats exported by the Repository. The MS:

  1. retrieves the Repository Profile from the IS and gets the DPF record and the list of available formats;
  2. retrieves the System Configuration Resource Profile from the IS and gets the Index Configuration Fields
  3. it searches in the IS for all the profiles of the MDStore Services available and chooses the ones that may host the MDStores;
  4. updates the MDStore Service profiles with the required actions:

(actionId,CREATE, Parameters), where

·         actionId is a unique identifier for the action, and

·         parameters are: DPF record, Index Configuration parameters, Metadata Format, Master MDStore flag, Repository id;

  5. waits for a notification message relative to the topic UPDATE.mdsId.blackboard:
    1. if it does not arrive within a certain time-out, the MS updates the MDStore Service profile by removing the action (actionId,CREATE,Parameters) and searches for another available MDStore Service;
    2. if it does arrive and the message parameter contains the action (actionId,ONGOING,-), it moves to the next step;
  6. waits for a notification message relative to the topic UPDATE.mdsId.blackboard:
    1. if it does not arrive within a certain time-out, the MS updates the MDStore Service profile by adding the action (actionId,CANCEL ACTION,-) and searches for another available MDStore Service;
    2. if it does arrive and the message parameter contains the message (actionId,DONE,MDStore id), the MS performs the following step;
  7. it registers a new Harvesting Instance Resource for the Repository, by using all MDStore ids found in the MDStore Service profiles;
  8. it searches in the IS for an Aggregator Service that is available to manage the new Harvesting Instance Resource;
  9. updates the Aggregator Service profile with the action:

(actionId,MANAGE, Harvesting Instance ID)

  10. waits for a notification message relative to the topic UPDATE.asId.blackboard:
    1. if it does not arrive within a certain time-out, the MS updates the Aggregator Service profile by removing the action (actionId,MANAGE,Harvesting Instance ID) and searches for another available Aggregator Service;
    2. if it does arrive and the message parameter contains the action (actionId,DONE,-), the MS has concluded the orchestration.
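The time-out handling of the two waiting steps for MDStore creation can be sketched as a loop over candidate MDStore Services. Everything named here is hypothetical: blackboards are modelled as a dictionary of lists, wait_for is an assumed callback that returns the answer triple or None when the time-out elapses, and new_action_id is an assumed generator of unique action identifiers. The sketch illustrates the retry logic only, not the notification transport.

```python
def create_mdstore(candidates, blackboards, wait_for, new_action_id,
                   parameters, timeout=30.0):
    """Try each candidate MDStore Service until one creates the MDStore.

    Returns (service_id, mdstore_id), or None if every candidate fails.
    """
    for service_id in candidates:
        action_id = new_action_id()
        board = blackboards.setdefault(service_id, [])
        create = (action_id, "CREATE", parameters)
        board.append(create)  # post the action on the Service blackboard

        # First wait: the Service must acknowledge with ONGOING.
        if wait_for(service_id, action_id, "ONGOING", timeout) is None:
            board.remove(create)  # time-out: withdraw the action
            continue

        # Second wait: the Service must conclude with DONE.
        done = wait_for(service_id, action_id, "DONE", timeout)
        if done is None:
            board.append((action_id, "CANCEL ACTION", None))
            continue

        return service_id, done[2]  # third element: the MDStore identifier
    return None
```

The asymmetry between the two failure branches follows the text: an unacknowledged action is simply removed, while an acknowledged action that never completes is cancelled so that the Service can run its recovery operation.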

UPDATE MDStore ID

A Master MDStore may update its profile because of a harvesting operation or because of a refresh operation. In such a case, newly harvested records may have brought new values for the Index Configuration Fields into the MDStore. If so, the following System Management policies are violated:

  1. the originating Repository Service has surely changed its profile, but the change is not propagated to the IS;
  2. the DRIVER Information Space has new harvested records, but these are not targeted by any Index Resource.

In order to solve the first problem, the method invokes getRepositoryStats(r_id) from the AS-RIP component, gets the required information and updates the Repository Service profile in the IS accordingly.

To solve the second problem, the MS has to initiate the following “indexing transaction”, based on the MDStore and the new values added to the Index Configuration Fields:

·         check if the Index Resources currently assigned to MDStore are enough to preserve System Management Policies

  1. retrieves the MDStore profile and gets, for each Index Configuration Field f, the set of values Vf harvested so far
  2. it searches for all Index Resource profiles that target the MDStore
  3. given all Index Resource profiles, the MS gets the Index Configuration Fields f and the relative values V'f assigned to them
  4. for each Index Configuration Field f, it checks if Vf is a subset of V'f, i.e. if all records harvested in the MDStore can be targeted by the Index Resources currently assigned to the MDStore
  5. if this is the case, the operation is concluded

·         search for Index Resources available, not currently targeting the MDStore, which may be used to enforce System Management Policies

  6. if, instead, some harvested values in Vf are not among the values V'f assigned to the Index Resources that currently target the MDStore, the MS searches for an existing Index Resource not yet assigned to the MDStore, but available to target the values in Vf that are not in V'f
  7. if such an Index Resource does not exist, it moves to step 10 (creation of a new Index Resource); otherwise the MS updates the Index Profile blackboard with the action (actionId,UPDATE, <idxId, addMDStore>)
  8. waits for a notification message relative to the topic UPDATE.idxId.blackboard:
    1. if it does not arrive after a certain time-out, the MS updates the Index Resource profile by removing the action (or the actions) and searches for another available Index Service;
    2. if it does arrive and the message parameter contains the action (actionId,ONGOING,-), it moves to the next step;
  9. waits for a notification message relative to the topic UPDATE.idxId.blackboard:
    1. if it does not arrive after a certain time-out, the MS updates the Index Resource profile by adding the action (actionId,CANCEL ACTION,-) and searches for another available Index Service;
    2. if it does arrive and the message parameter contains the message (actionId,DONE,-), the operation has concluded.

·         create a new Index Resource for the MDStore

  10. it searches for an Index Service capable of hosting a new Index Resource;
  11. it gets the System Configuration Resource Profile and retrieves the Index Configuration Fields from it;
  12. it updates the Index Service profile with an action (actionId,CREATE,parameters), where the parameters include the MDStore id and the Index Configuration Fields set to the values in Vf;
  13. waits for a notification message relative to the topic UPDATE.ixsId.blackboard:
    1. if it does not arrive after a certain time-out, the MS updates the Index Service profile by removing the action (or the actions) and searches for another available Index Service;
    2. if it does arrive and the message parameter contains the action (actionId,ONGOING,-), it moves to the next step;
  14. waits for a notification message relative to the topic UPDATE.ixsId.blackboard:
    1. if it does not arrive after a certain time-out, the MS updates the Index Service profile by adding the action (actionId,CANCEL ACTION,-) and searches for another available Index Service;
    2. if it does arrive and the message parameter contains the message (actionId,DONE,Index Id), the operation has concluded.
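
The coverage check that opens the indexing transaction can be illustrated with a small sketch. It assumes that profiles are reduced to plain dictionaries mapping each Index Configuration Field to a set of values, and that coverage is evaluated field by field over the union of all Index Resources targeting the MDStore. The function name and the field names `language` and `type` are hypothetical.

```python
def uncovered_values(harvested, index_profiles):
    """Return, per field, the harvested values that no Index Resource targets.

    harvested      -- {field: set of values harvested into the MDStore}
    index_profiles -- list of {field: set of values}, one per Index Resource
                      currently targeting the MDStore
    An empty result means the harvested values are fully covered.
    """
    # Union of the values assigned to the Index Resources, per field.
    covered = {}
    for profile in index_profiles:
        for name, values in profile.items():
            covered.setdefault(name, set()).update(values)

    # Per-field subset check: keep only the values that are left uncovered.
    missing = {}
    for name, values in harvested.items():
        gap = set(values) - covered.get(name, set())
        if gap:
            missing[name] = gap
    return missing


# Usage: French records were harvested, but only English ones are indexed.
harvested = {"language": {"en", "fr"}, "type": {"article"}}
assigned = [{"language": {"en"}, "type": {"article"}}]
gap = uncovered_values(harvested, assigned)
```

When the result is empty the operation concludes. Otherwise the uncovered values are the ones for which the MS looks for an available Index Resource or, failing that, requests the creation of a new one.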

3.2.3.2     Interactions

MS-RM

·         The MS-RM interacts with the IS-LU component of the IS in order to get the profiles needed to calculate QoS of one Service Resource or of a set of Service Resources.

·         The component communicates to the MS-RO when the measured QoS is below the acceptance thresholds.

MS-ResMan - MS-RO

·         The MS-RO receives notification messages from the MS-NH component and from the MS-RM component.

·         It communicates with the IS-PK component of the IS to modify Resource profiles according to the Orchestration protocol.

MS-ResMan - MS-RepMan

·         It interacts with the AS-RIP component in order to get the most recent information about a given Repository Service.

·         It interacts with the IS-PR component in order to retrieve and to update the profile of Repository Services.

MS-NH

·         It receives messages from the IS-SN component of the IS.

·         It then transmits them to the MS-RO component.

3.2.3.3     Detailed design

To be completed.

3.2.4      Authentication & Authorization Service

The DRIVER Authentication and Authorization Service (AAS) provides the functionality required to enforce security over the System by means of authentication and authorization mechanisms.

The architecture of the DRIVER Authentication and Authorization infrastructure is grounded on the eXtensible Access Control Markup Language (XACML) standard[16]. XACML addresses the issues of authorization in distributed, heterogeneous, enterprise-scale Systems, which characterizes the DRIVER environment very well. XACML defines access control rules by means of security policies (or, for short, policies). A security policy is a logical rule that states the validity of a given situation, i.e. a requester, a state of the application, an accessed Resource, and an action to be taken.

Specifically, the XACML standard defines the following aspects:

  1. syntax of security policy definition;
  2. semantics of security policy evaluation;
  3. components of generic XACML-compliant architecture;
  4. protocol used by the architecture for security enforcement.

DRIVER AAS functionalities are based on the first three aspects of the standard. For the sake of simplicity, we do not refer directly to the XACML protocol syntax and use a simplified syntax.

In the following sections, the syntax and semantics of security policies are presented, together with the XACML generic architecture and its adoption in DRIVER.

Security domain overview

In DRIVER, two types of Resources are subject to security aspects:

·         Users: Users need to authenticate, i.e. log in, to be granted an identity and to be authorized to access personal information and interact with Services;

·         Services: Services also need to authenticate, to prove to the System that they are what they claim to be; additionally, some of the functions of those Services are subject to authorization, e.g. those of the Manager Service, the Recommendation Service, etc.

Users and Services, willing to actively participate in the System, shall traverse the following phases, defined by AAS component:

·         Initialization: Resources initialize their safe interaction with the System by authenticating, i.e. at registration time for Service Resources and at login time for User Resources. This step requires the Resource to authenticate in the System against the credentials held by the IS. Successful authentication results in the creation of a Security Context (SecCTX), uniquely assigned to the Resource Profile of the authenticated Resource. If the authentication process fails, other Service Resources are not allowed to interact with the Resource whenever the interaction requires security constraints.

·         Interaction: Resources interact with other Resources, i.e. invoke methods exposed by other Service Resources. Some of these methods, i.e. operations, might require authorization.[17] The SecCTX's of the Resources, released at authentication time, are used to verify authorizations.

·         Invalidation: When Resources are deactivated or removed from the System, for example because of malfunctioning, they return to a non-initialized state.

From the perspective of security, the Resource life-cycle is modelled by the state machine presented in Figure 16. For Service Resources the init state is part of the registration process, while for User Resources it is part of the login process.

Figure 16 - Resource AA state machine
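
As a reading aid, the three phases described above can be written down as a transition table. This is a minimal sketch and not a transcription of Figure 16: the two state names and the event names are assumptions chosen for illustration, and only interactions that require security constraints are modelled.

```python
from enum import Enum


class AAState(Enum):
    # State names are assumptions; the figure may use different labels.
    NON_INITIALIZED = "non-initialized"
    INITIALIZED = "initialized"


# (current state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    (AAState.NON_INITIALIZED, "authentication_succeeded"): AAState.INITIALIZED,
    (AAState.NON_INITIALIZED, "authentication_failed"): AAState.NON_INITIALIZED,
    (AAState.INITIALIZED, "secured_interaction"): AAState.INITIALIZED,
    (AAState.INITIALIZED, "invalidation"): AAState.NON_INITIALIZED,
}


def step(state, event):
    """Return the next state, or raise ValueError if the event is not allowed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(
            f"event {event!r} is not allowed in state {state.value!r}"
        ) from None


# Usage: a Resource authenticates, interacts, and is then removed.
state = AAState.NON_INITIALIZED
for event in ("authentication_succeeded", "secured_interaction", "invalidation"):
    state = step(state, event)
```

In this sketch a secured interaction is rejected while the Resource is non-initialized, which mirrors the rule that a Resource whose authentication failed cannot take part in interactions requiring security constraints.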

Security Context

A Security Context (SecCTX) is a container of information related to a Resource that passed the authentication mechanism. Specifically, the AAS releases one SecCTX for each Resource that successfully authenticates. The data model of a SecCTX consists of the following set of attributes:

·         Id: security context id;

·         Pubkey: public key generated by AAS to be used by the Resource;

·         Privkey: private key generated by AAS to be used by the Resource and paired with pubkey;

·         Lat: last access time;

·         Identities: list of identities assigned to the Resource; identities are assigned based on the result of the authentication process. Examples are roles granted to the Resource, specific certificates, external SSO session identifiers, external user identifiers, Resource identifiers, etc.

·         Obligations: list of the identifiers of the obligation policies attached to the Resource;

·         Attributes: a map of generic string valued attributes that can be set during the authentication and authorization processes.

SecCTX's are stored inside the IS as a particular Data Structure Resource, private to the AAS. SecCTX's are removed from the System when the respective Resource becomes inactive, i.e. when Users log out or Service Resources are removed from the System.
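
The attribute list above maps naturally onto a record type. The following sketch is illustrative only: the class and method names are hypothetical, keys are passed in as opaque bytes rather than generated, an in-memory dictionary stands in for the IS, and refreshing the last access time on every lookup is an assumption.

```python
import time
import uuid
from dataclasses import dataclass, field


@dataclass
class SecurityContext:
    id: str                     # security context id
    pubkey: bytes               # public key generated by the AAS
    privkey: bytes              # private key paired with pubkey
    lat: float                  # last access time (seconds since the epoch)
    identities: list = field(default_factory=list)
    obligations: list = field(default_factory=list)
    attributes: dict = field(default_factory=dict)


class SecCtxStore:
    """In-memory stand-in for the SecCTX storage kept inside the IS."""

    def __init__(self):
        self._by_resource = {}

    def release(self, resource_id, pubkey, privkey, identities=(), obligations=()):
        """Create the single SecCTX of a Resource that has just authenticated."""
        ctx = SecurityContext(
            id=str(uuid.uuid4()), pubkey=pubkey, privkey=privkey,
            lat=time.time(), identities=list(identities),
            obligations=list(obligations),
        )
        self._by_resource[resource_id] = ctx
        return ctx

    def get(self, resource_id):
        """Look up a SecCTX and refresh its last access time (an assumption)."""
        ctx = self._by_resource.get(resource_id)
        if ctx is not None:
            ctx.lat = time.time()
        return ctx

    def remove(self, resource_id):
        """Drop the SecCTX when the Resource becomes inactive."""
        self._by_resource.pop(resource_id, None)


# Usage: a User logs in, is looked up once, then logs out.
store = SecCtxStore()
ctx = store.release("user-1", b"pub", b"priv", identities=["role:admin"])
found = store.get("user-1")
store.remove("user-1")
```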

Security Profile

Each Resource is associated with specific security information, to be used at authentication time for the creation of the respective SecCTX. This information is materialized into a so-called Security Profile (SecPROF), which is kept in the IS and associated with the relative Resource. A SecPROF holds the definition of the Resource identities, credentials, and specific attributes required for the authorization process of the Resource. More specifically:

·         SecPROF identifier: identifier of the SecPROF in the IS

·         Resource Identifier: identifier of the Resource associated to the SecPROF

·         Identities: list of identities that the Resource can be granted, along with the required credentials;

·         Obligations: list of the identifiers of the default obligation policies that should be attached to the SecCTX of this Resource. An obligation policy is a specific Security Policy (described below);

·         Attributes: a map of generic string-valued attributes that can be used during the authentication and authorization processes. These attributes are an extension point in the authentication and authorization mechanism, in case some additional AA scheme has to be included in the System.

The data model and connections between Resource Profile, Security Profile and Security Context are illustrated in Figure 17.

 

Figure 17 - Resource security profile data model

User Resources and Service Resources always have a Security Profile. For other types of Resources this information is not required, unless a specific external requirement for authorization is defined (which may happen in further stages of the DRIVER implementation).

Examples of Security profiles are the following:

1           An example of the information held by a SecPROF for a User Resource is:

  • identities: [(login: user,password: 43b71ff1d03628bfa55758f7582b0db0), (role: admin)]
  • attributes:[(key: passwordType, value: MD5)]

2           An example of the information held by a SecPROF for a Service Resource is:

  • obligations: [ alwaysLogPolicy, recreateKeys]
  • attributes:[(key: keyTimeout, value: 1800)]
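
To make the first example concrete, the sketch below shows how a login attempt could be checked against a SecPROF shaped like the User Resource example. It is a sketch under stated assumptions, not the AAS implementation: the class and function names, the dictionary encoding of identities, the password `s3cret`, and the rule that all non-credential identities are granted on success are all hypothetical. MD5 is used only because the example names it as the passwordType.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SecurityProfile:
    secprof_id: str
    resource_id: str
    identities: list = field(default_factory=list)
    obligations: list = field(default_factory=list)
    attributes: dict = field(default_factory=dict)


def authenticate(profile, login, password):
    """Return the identities to grant, or None if the credentials do not match."""
    # Only the scheme named in the example is handled in this sketch.
    if profile.attributes.get("passwordType") != "MD5":
        return None
    digest = hashlib.md5(password.encode("utf-8")).hexdigest()
    for identity in profile.identities:
        if identity.get("login") == login and identity.get("password") == digest:
            # Assumption: every identity that is not a credential is granted.
            return [i for i in profile.identities if "login" not in i]
    return None


# Usage with a hypothetical password; the stored value is its MD5 digest.
profile = SecurityProfile(
    secprof_id="sp-1",
    resource_id="user-1",
    identities=[
        {"login": "user", "password": hashlib.md5(b"s3cret").hexdigest()},
        {"role": "admin"},
    ],
    attributes={"passwordType": "MD5"},
)
granted = authenticate(profile, "user", "s3cret")
denied = authenticate(profile, "user", "wrong")
```

The identities returned on success are the ones that would be copied into the SecCTX created for the Resource, together with the default obligations listed in the profile.
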
Security Policy