The first version of the Pisa syntactic-semantic parser was described in detail in Deliverable 4, Section 2 and Appendices 2, 3, and 4. The purpose of this report is to discuss the testing of the parser on the sample vocabulary selected from the ITU Corpus (see Deliverable 6.1) and to illustrate the revisions and extensions that are now being implemented. The report therefore concentrates on presenting analysis and extraction activities. We need to specify clearly all the kinds of information that can be extracted from the Cobuild definitions before completing the description of the type system that will be used to represent them (to appear in Deliverable 7).

Our parser takes as input the syntactically parsed definitions from Birmingham (referred to from now on as the Birmingham input) and analyses them, using complex pattern-matching techniques, in order to derive and extract syntactic and semantic information. While testing of the first version has confirmed the validity of the core procedures, a strategy based on string matching must clearly be tested over a relatively large sample of data before we can identify all the potentially significant markers that allow us to extract meaningful information. This means that, at least in the early stages, we expect to have to add to the basic set of rules each time we test the parser over new samples of definitions.

This report must thus be considered a description of work in progress. In discussing the changes that are now being implemented, we refer continually to the description of the first version of the parser presented in Deliverable 4, and to the templates used to represent the information extracted from the definitions. Examples of the revised templates are given in the Appendix.
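
The marker-based strategy, and the need to extend the rule set as new samples are tested, can be illustrated with a minimal sketch. It is not the Pisa parser: it works on plain definition text rather than on the Birmingham input, and the rule table, the relation names (genus, purpose), the function names, and the example sentences are all hypothetical, chosen only for illustration.

```python
import re

# Hypothetical rule table: each rule pairs a relation name with a surface
# marker (a regular expression over the definition text) assumed to signal it.
RULES = [
    ("genus", re.compile(r"^An? (?P<headword>\w+) is an? (?P<value>\w+)")),
    ("purpose", re.compile(r"\b(?:used|designed) (?:for|to) (?P<value>[^.,;]+)")),
]


def extract(definition):
    """Return a flat template (a dict) holding the relations whose markers match."""
    template = {}
    for relation, pattern in RULES:
        match = pattern.search(definition)
        if match is None:
            continue
        template[relation] = match.group("value").strip()
        # Only some rules capture the headword; record it when available.
        if "headword" in pattern.groupindex:
            template["headword"] = match.group("headword")
    return template


def unmatched(definitions):
    """List the definitions that no rule matched, i.e. candidates for new rules."""
    return [d for d in definitions if not extract(d)]


# Invented example sentences, written in the style of full-sentence definitions.
sample = [
    "A hammer is a tool used for hitting nails.",
    "If you hammer something, you hit it hard.",
]
first_template = extract(sample[0])
needs_new_rule = unmatched(sample)
```

In this sketch the second sentence matches no rule, so it is returned by `unmatched`; this mirrors the situation in which testing over a new sample reveals a marker that the basic rule set does not yet cover.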
