DB Group @ unimo

1st International Workshop on Interoperability through Semantic Data and Service Integration
25 June 2009, Camogli, Italy

Searching for data and services

F. Guerra¹, A. Maurino², M. Palmonari², G. Pasi², A. Sala³
¹ DEA - Università di Modena e Reggio Emilia, v.le Sarca 336, Milano, Italy
² DISCO - Università di Milano Bicocca, v.le Risorgimento 2, Bologna, Italy
³ DII - Università di Modena e Reggio Emilia, via Vignolese 905, Modena, Italy

Outline
1. Motivation
2. Building the Global Data and Service View at Set-up Time
3. Data and eService Retrieval
4. Conclusion and future work

Motivation
• Research on data integration and on service discovery has involved, from the beginning, different (not always overlapping) communities.
  – Data and services are described with different models, and different techniques to retrieve data and services have been developed.
• From a user perspective, the border between data and services is often not clear-cut, since data and services provide complementary views of the available resources.
• Users need new techniques to manage data and services in a unified way.
• Integration of data and services can be tackled from different perspectives.
  – Access to data is guaranteed through Service Oriented Architectures (SOA), and Web services are exploited to provide information integration platforms;
  – A global view of the data sources and of the eServices available in the peer is provided, to support access to the two complementary kinds of resources at the same time.

Motivation (2)
    Select Name, Country from Accommodation Where City='Modena'
The problem we address is to retrieve, among the many services available, the ones that are related to the query, according to the semantics of the terms involved in the query.

The approach (overview)
• We assume a mediator-based data integration system which provides a global virtual view of data - the Semantic Peer Data Ontology (SPDO).
• We assume a set of semantically annotated service descriptions.
  – Ontologies used in the service descriptions can be developed outside the peer and are not known in advance in the integration process.
• We propose a semantic-based approach to perform data and service integration:
  – given an SQL-like query expressed in the terminology of the SPDO, retrieve all the services that can be considered "related" to the query on the data sources.
• The approach developed is based on:
  – a mediator-based data integration system, the MOMIS system (Mediator envirOnment for Multiple Information Sources);
  – a service retrieval engine based on IR techniques, performing semantic indexing of service descriptions and keyword-based semantic search.

The approach (overview)
• The integration of data and services is achieved by:
  1. building the SPDO (a functionality already provided by MOMIS),
  2. building a Global Service Ontology (GSO) consisting of the ontologies used in the service semantic descriptions,
  3. defining a set of mappings between the SPDO and the GSO,
  4. exploiting, at query time, query rewriting techniques based on these mappings to build a keyword-based query for service retrieval, expressed in the GSO terminology, starting from an SQL-like query on the data sources.

Building the Global Data and Service View
• The SPDO is built by exploiting the MOMIS integration system.
• The Global Light Service Ontology is built by means of the following steps: service indexing, Global Service Ontology (GSO) construction, Global Light Service Ontology (GLSO) construction, and Semantic Similarity Matrix (SSM) definition.
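
Service indexing, the first of the set-up steps named above, can be pictured as a section-aware inverted file. The following is a minimal sketch, not the XIRE implementation: the `build_index` function, the whitespace tokenizer, the input layout, and the two sample services are all assumptions made for illustration. Only the six section names come from the Service Indexing slide.

```python
from collections import defaultdict

# The six service-description sections that the deck says are indexed.
SECTIONS = ("name", "description", "input", "output",
            "precondition", "postcondition")


def build_index(services):
    """Return (dictionary, posting_file) for a set of service descriptions.

    services: {service_id: {section: text}} -- a hypothetical input layout.
    The posting file maps each index term to the sorted list of
    (service_id, section) pairs in which the term occurs.
    """
    postings = defaultdict(set)
    for service_id, sections in services.items():
        for section in SECTIONS:
            # Naive tokenizer (assumption): lowercase, split on whitespace.
            for token in sections.get(section, "").lower().split():
                postings[token].add((service_id, section))
    dictionary = sorted(postings)
    posting_file = {term: sorted(refs) for term, refs in postings.items()}
    return dictionary, posting_file


# Hypothetical service descriptions, invented for this example.
services = {
    "s1": {"name": "HotelBooking", "description": "book a hotel room",
           "output": "Hotel"},
    "s2": {"name": "CityGuide", "description": "hotel and restaurant guide",
           "input": "City"},
}

dictionary, posting_file = build_index(services)
print(posting_file["hotel"])
```

Keeping the section next to each service reference is what makes this a "structured document" index: a later ranking step could weight a hit in the output section differently from a hit in the free-text description.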

MOMIS

Service Indexing
• Our approach requires a formal representation of the service descriptions. It is based on full-text indexing, which extracts terms from six specific sections of the service description:
  – service name,
  – service description,
  – input,
  – output,
  – pre-condition,
  – post-condition.
• A set of index terms I, which will be part of the dictionary, is extracted.
  – IO = the set of index terms taken from ontologies
  – IT = the set of index terms extracted from textual descriptions
• The indexing structure is based on a "structured document" approach, where the inverted file structure consists of:
  – a dictionary file based on I,
  – a posting file, with a list of references to the services' sections where the considered term occurs.

GSO construction
• The GSO is built by:
  – loosely merging each service ontology O such that i belongs to O for some i in IO;
  – associating a concept Ci with each i in IT, introducing a class Terms, subclass of Thing, in the GSO, and stating that for every i in IT, Ci is a subclass of Terms.
• "Loosely merging" means that the service ontologies (SOs) are merged without attempting to integrate similar concepts across the different integrated ontologies.
  – If the source SOs are consistent, the GSO can be assumed to be consistent.
  – Loose merging is clearly not the optimal choice with respect to ontology integration.
  – Since the XIRE component is based on approximate IR techniques and semantic similarity, approximate solutions to the ontology integration problem can be considered acceptable; on the other hand, the whole GSO building process needs to be fully automated.

GLSO construction and Semantic Similarity Matrix
• The GSO may be extremely large: only a subset of the terms of the ontologies is relevant to the SWS descriptions.
  – A technique to reduce the ontology size is exploited, and a GLSO (Global Light Service Ontology) is obtained.
  – We extract from the GSO the sub-ontology that preserves the meanings of the terms explicitly used in the service descriptions, namely the set of index terms I.
• The Semantic Similarity Matrix (SSM), which is exploited later on for query expansion at query time, is computed.
  – The SSM is defined by analyzing the GLSO structure according to a semantic measure from the literature; it takes into account subclass paths, domain and range restrictions on properties, membership of instances, and so on.

Mapping of Data and Service Ontologies
• Mappings between the elements of the SPDO and the GLSO are generated by exploiting and properly modifying the MOMIS clustering algorithm.
• The clustering algorithm takes as input the SPDO and the GLSO with their associated metadata, and generates a set of clusters of classes belonging to the SPDO and the GLSO.
• Mappings are automatically generated by exploiting the clustering result. Three cases arise:
  – A cluster contains only SPDO classes: it is not exploited for the mapping generation. Such a cluster is caused by the selection of a clustering threshold less selective than the one chosen in the SPDO creation process.
  – A cluster contains only GLSO classes: it is not exploited for the mapping generation. It means that there are descriptions of Web Services which are strongly related.
  – A cluster contains classes belonging to both the SPDO and the GLSO: this cluster produces, for each SPDO class, a mapping to each GLSO class.

Example
[Figure: SPDO fragment and GLSO fragment. The GLSO fragment shows Hotel, Hotel.Denomination, Hotel.Location, Hotel.Country.]
The following mappings are generated by applying our technique:
  Accommodation --> Hotel
  Accommodation.Name --> Hotel.Denomination
  Accommodation.City --> Hotel.Location
  Accommodation.Country --> Hotel.Country

Data and eService Retrieval
    select <select_attribute_list> from <from_class_list> where <condition>
• The answer to this query is a data set from the data sources together with a set of services which are potentially useful, since they are related to the concepts appearing in the query and hence to the retrieved data.
• The query processing is divided into two steps, executed simultaneously:
  – A data set from the data sources is obtained by processing the query on the integrated view. The results are obtained by exploiting the MOMIS Query Manager, which rewrites the global query as an equivalent set of queries expressed on the local schemata (local queries) by means of an unfolding process.
  – A set of services related to the query is obtained by exploiting the mappings between the SPDO and the GLSO and the concept of relevant service mapping. Services are retrieved by the XIRE (eXtended Information Retrieval Engine) component, which is a service search engine based on the vector space model.

Data and eService Retrieval (overview)

Managing keywords
• Given a query in an SQL-like notation expressed in the SPDO terminology, the set of keywords extracted consists of:
  – all the classes given in the "FROM" clause,
  – all the attributes and the values used in the "SELECT" and "WHERE" clauses,
  – all their ranges defined by ontology classes.
• The set of keywords is rewritten into GLSO terms by exploiting the mappings between the SPDO and the GLSO.
• Semantic similarity between GLSO terms, as defined in the SSM, is exploited to expand the keyword set into a set of weighted terms.

eServices retrieval
• Query evaluation is based on the vector space model:
  – In this model both documents (that is, Web Service descriptions) and queries (extracted keywords) are represented as vectors in an n-dimensional space.
  – Each document vector has non-zero weights for those keywords which are index terms for that description.
  – Relevance weights are used to modify the weights in the list resulting from the keyword evaluation process.
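
The query-time steps above (keyword extraction, rewriting through the SPDO-to-GLSO mappings, expansion through the SSM, and vector-space ranking) can be sketched end to end. This is an illustrative sketch under stated assumptions, not the XIRE or MOMIS code: the function names, the regular-expression parser (one class, equality conditions only), the SSM entry `Hostel: 0.6`, the binary service vectors, and the use of cosine similarity as the ranking function are all hypothetical. The mappings are the ones from the Hotel example. Attribute ranges and the relevance-weight adjustment are left out.

```python
import math
import re

# SPDO -> GLSO mappings, taken from the Hotel example.
MAPPINGS = {
    "Accommodation": ["Hotel"],
    "Accommodation.Name": ["Hotel.Denomination"],
    "Accommodation.City": ["Hotel.Location"],
    "Accommodation.Country": ["Hotel.Country"],
}
# Hypothetical SSM fragment: similarity between GLSO terms.
SSM = {"Hotel": {"Hostel": 0.6}}
# Hypothetical service vectors (binary weights over index terms).
SERVICES = {
    "HotelFinder": {"Hotel": 1.0, "Hotel.Location": 1.0},
    "HostelSearch": {"Hostel": 1.0},
    "WeatherReport": {"Forecast": 1.0},
}


def extract_keywords(sql):
    """FROM class, SELECT/WHERE attributes, and WHERE values.

    Simplification (assumption): one class, conditions of the form attr='value'.
    """
    match = re.match(
        r"\s*select\s+(.+?)\s+from\s+(\w+)(?:\s+where\s+(.+))?\s*$",
        sql, re.IGNORECASE)
    if match is None:
        raise ValueError("unsupported query")
    select_part, cls, where_part = match.groups()
    keywords = [cls]
    keywords += [f"{cls}.{attr.strip()}" for attr in select_part.split(",")]
    for attr, value in re.findall(r"(\w+)\s*=\s*'([^']*)'", where_part or ""):
        keywords += [f"{cls}.{attr}", value]
    return keywords


def rewrite(keywords):
    """Translate SPDO keywords into GLSO terms; unmapped keywords are kept."""
    terms = []
    for keyword in keywords:
        terms.extend(MAPPINGS.get(keyword, [keyword]))
    return terms


def expand(terms):
    """Weight original terms 1.0 and add SSM neighbours with their similarity."""
    weights = {term: 1.0 for term in terms}
    for term in terms:
        for similar, sim in SSM.get(term, {}).items():
            weights[similar] = max(weights.get(similar, 0.0), sim)
    return weights


def cosine(query, doc):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * doc.get(term, 0.0) for term, w in query.items())
    norm = (math.sqrt(sum(w * w for w in query.values()))
            * math.sqrt(sum(w * w for w in doc.values())))
    return dot / norm if norm else 0.0


def retrieve(sql):
    """Return (service, score) pairs with non-zero score, best first."""
    query = expand(rewrite(extract_keywords(sql)))
    scored = [(name, cosine(query, vec)) for name, vec in SERVICES.items()]
    return sorted([s for s in scored if s[1] > 0], key=lambda s: -s[1])


ranked = retrieve("Select Name, Country from Accommodation Where City='Modena'")
print([name for name, _ in ranked])
```

In this toy setting the hostel service is retrieved only because of the SSM expansion, and it ranks below the hotel service because the expanded term carries a weight of 0.6 rather than 1.0.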

Conclusion and future work
• In this paper we introduced a technique for publishing and retrieving a unified view of data and services.
• Such a unified view may be exploited for improving the user's knowledge of a set of sources and for retrieving a list of web services related to a data set.
• The approach is semi-automatic, and works jointly with the tools which are typically provided for searching for data and services separately.
• Future work will focus on evaluating the effectiveness of the approach in the real cases provided within the NeP4B project, and against the OWLS-TC benchmark.