DB Group @ unimo
1st International Workshop on Interoperability
through Semantic Data and Service Integration
25 June 2009
Camogli, Italy
Searching for data and services
F. Guerra1, A. Maurino2, M. Palmonari2, G. Pasi2, A. Sala3
1 DEA - Università di Modena e Reggio Emilia, v.le Sarca 336, Milano, Italy
2 DISCO - Università di Milano Bicocca, v.le Risorgimento 2, Bologna, Italy
3 DII - Università di Modena e Reggio Emilia, via Vignolese 905, Modena, Italy
Outline
1. Motivation
2. Building the Global Data and Service View at Set-up Time
3. Data and eService Retrieval
4. Conclusion and future work
Motivation
• Research on data integration and on service discovery has, from the beginning, involved different (not always overlapping) communities.
  – Data and services are described with different models, and different techniques to retrieve data and services have been developed.
• From a user perspective, the border between data and services is often not sharp, since data and services provide complementary views of the available resources.
• Users need new techniques to manage data and services in a unified way.
• Integration of data and services can be tackled from different perspectives:
  – access to data is guaranteed through Service Oriented Architectures (SOA), and Web services are exploited to provide information integration platforms;
  – a global view on the data sources and on the eServices available in the peer is provided, to support access to the two complementary kinds of resources at the same time.
Motivation (2)
select Name, Country
from Accommodation
where City = 'Modena'
The problem we address is to retrieve, among the many services available, the
ones that are related to the query, according to the semantics of the terms
involved in the query.
The approach (overview)
• We assume a mediator-based data integration system which provides a global virtual view of data - the Semantic Peer Data Ontology (SPDO).
• We assume a set of semantically annotated service descriptions.
  – Ontologies used in the service descriptions can be developed outside the peer and are not known in advance in the integration process.
• We propose a semantic-based approach to perform data and service integration:
  – given an SQL-like query expressed in the terminology of the SPDO, retrieve all the services that can be considered “related” to the query on the data sources.
• The approach developed is based on:
  – a mediator-based data integration system, the MOMIS system (Mediator envirOnment for Multiple Information Sources);
  – a service retrieval engine based on IR techniques, performing semantic indexing of service descriptions and keyword-based semantic search.
The approach (overview)
• The integration of data and services is achieved by:
  1. building the SPDO (a functionality already provided by MOMIS),
  2. building a Global Service Ontology (GSO) consisting of the ontologies used in the service semantic descriptions,
  3. defining a set of mappings between the SPDO and the GSO,
  4. exploiting, at query time, query rewriting techniques based on these mappings to build a keyword-based query for service retrieval, expressed in the GSO terminology, starting from an SQL-like query on the data sources.
Building the Global Data and Service View
The SPDO is built by exploiting the MOMIS integration system.
The global light service ontology is built by means of the following steps:
• Service indexing,
• Global Service Ontology (GSO) construction,
• Global Light Service Ontology (GLSO) construction and Semantic Similarity Matrix (SSM) definition.
MOMIS
Service Indexing
• Our approach requires a formal representation of the service descriptions and is based on full-text indexing, which extracts terms from six specific sections of the service description:
  – service name,
  – service description,
  – input,
  – output,
  – pre-condition,
  – post-condition.
• A set of index terms I that will be part of the dictionary is extracted.
  – IO = the set of index terms consisting of ontology terms
  – IT = the set of index terms extracted from textual descriptions
• The indexing structure is based on a “structured document” approach, where the inverted file structure consists of:
  – a dictionary file based on I,
  – a posting file, with a list of references to the services’ sections where the considered term occurs.
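The inverted file described above can be sketched as follows. This is only a minimal illustration, not the actual indexer: the section keys, the whitespace tokenizer and the two sample services are assumptions made for the example.

```python
from collections import defaultdict

# The six sections named on the slide (key names are an assumption).
SECTIONS = ("name", "description", "input", "output",
            "precondition", "postcondition")


def build_index(services):
    """Build (dictionary, postings) from {service_id: {section: text}}.

    postings maps each term to the set of (service_id, section) pairs
    in which the term occurs.
    """
    postings = defaultdict(set)
    for service_id, sections in services.items():
        for section in SECTIONS:
            # Naive tokenizer: lowercase and split on whitespace.
            for term in sections.get(section, "").lower().split():
                postings[term].add((service_id, section))
    dictionary = sorted(postings)  # the dictionary file, based on I
    return dictionary, dict(postings)


# Hypothetical service descriptions, for illustration only.
services = {
    "s1": {"name": "HotelBooking", "description": "book a hotel room",
           "input": "City", "output": "Hotel"},
    "s2": {"name": "CityWeather", "description": "weather forecast",
           "input": "City", "output": "Forecast"},
}
dictionary, postings = build_index(services)
print(sorted(postings["city"]))
```

Keeping the section in each posting is what makes the index "structured": a later ranking step could weight a match in the output section differently from a match in the free-text description.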
GSO construction
• The GSO is built by:
  – loosely merging each service ontology O such that i belongs to O for some i in IO,
  – associating a concept Ci with each i in IT, introducing a class Terms as a subclass of Thing in the GSO, and stating that for every i in IT, Ci is a subclass of Terms.
• “Loosely merging” means that the service ontologies (SOs) are merged without attempting to integrate similar concepts across the different integrated ontologies.
  – If the source SOs are consistent, the GSO can be assumed to be consistent.
  – Loose merging is clearly not the optimal choice with respect to ontology integration.
  – Since the XIRE component is based on approximate IR techniques and semantic similarity, approximate solutions to the ontology integration problem can be considered acceptable; on the other hand, the whole GSO building process needs to be fully automated.
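A minimal sketch of loose merging, under simplifying assumptions: each ontology is reduced to a plain class-to-parent map, and names are qualified with a hypothetical "ontology:class" prefix so that same-named classes from different ontologies stay distinct. The function name and the sample ontologies are illustrative, not part of MOMIS or XIRE.

```python
def loose_merge(service_ontologies, io_terms, it_terms):
    """Merge ontologies without integrating similar concepts.

    service_ontologies: {ontology_name: {class_name: parent_class_name}}
    io_terms: qualified index terms taken from ontologies ("onto:Class")
    it_terms: index terms extracted from textual descriptions
    Returns a {qualified_class: qualified_parent} map for the GSO.
    """
    gso = {}
    for onto, subclass_of in service_ontologies.items():
        qualified = {f"{onto}:{cls}" for cls in subclass_of}
        if not qualified & io_terms:
            continue  # no index term belongs to this ontology: skip it
        for cls, parent in subclass_of.items():
            # Thing is the shared root; every other name stays qualified.
            q_parent = "Thing" if parent == "Thing" else f"{onto}:{parent}"
            gso[f"{onto}:{cls}"] = q_parent
    # Textual index terms become concepts under a dedicated class Terms.
    gso["Terms"] = "Thing"
    for term in it_terms:
        gso[f"term:{term}"] = "Terms"
    return gso


# Hypothetical ontologies: both define Hotel, and both copies are kept.
ontologies = {
    "travel": {"Hotel": "Accommodation", "Accommodation": "Thing"},
    "lodging": {"Hotel": "Thing"},
    "unused": {"Car": "Thing"},
}
gso = loose_merge(ontologies, {"travel:Hotel", "lodging:Hotel"}, {"booking"})
print(gso)
```

Because nothing is unified, the merge cannot introduce contradictions between ontologies, which is why consistency of the sources carries over to the result.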
GLSO construction and Semantic Similarity Matrix
• The GSO may be extremely large: only a subset of the terms of the ontologies is relevant to the SWS descriptions.
  – A technique to reduce the ontology size is exploited, and a GLSO (Global Light Service Ontology) is obtained.
  – We extract from the GSO the subontology that preserves the meanings of the terms explicitly used in the service descriptions, namely the set of index terms I.
• The Semantic Similarity Matrix (SSM), which is exploited later on for query expansion at query time, is computed.
  – The SSM is defined by analyzing the GLSO structure according to a semantic measure from the literature; it takes into account subclass paths, domain and range restrictions on properties, membership of instances, and so on.
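The slide does not say which semantic measure is used, so the sketch below is only a stand-in: it looks at subclass paths alone and scores two terms as 1 / (1 + d), where d is the length of the shortest path between them in the subclass graph. Domain and range restrictions and instance membership are ignored, and the sample hierarchy is hypothetical.

```python
from collections import deque


def similarity_matrix(subclass_of, terms):
    """Return {(a, b): similarity} for all pairs of the given terms.

    subclass_of: {class: parent}. Similarity is 1 / (1 + shortest path
    length) over undirected subclass edges, 0.0 if no path exists.
    """
    neighbours = {}
    for child, parent in subclass_of.items():
        neighbours.setdefault(child, set()).add(parent)
        neighbours.setdefault(parent, set()).add(child)

    def distances(start):
        # Breadth-first search gives shortest path lengths from start.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours.get(node, ()):
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        return dist

    ssm = {}
    for a in terms:
        dist = distances(a)
        for b in terms:
            ssm[(a, b)] = 1.0 / (1 + dist[b]) if b in dist else 0.0
    return ssm


# Hypothetical GLSO fragment.
glso = {"Hotel": "Accommodation", "Hostel": "Accommodation",
        "Accommodation": "Thing", "Restaurant": "Thing"}
ssm = similarity_matrix(glso, ["Hotel", "Hostel", "Restaurant"])
print(ssm[("Hotel", "Hostel")], ssm[("Hotel", "Restaurant")])
```

Since the matrix depends only on the GLSO, it can be computed once at set-up time and simply looked up at query time.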
Mapping of Data and Service Ontologies
• Mappings between the elements of the SPDO and the GLSO are generated by exploiting and properly modifying the MOMIS clustering algorithm.
• The clustering algorithm takes as input the SPDO and the GLSO with their associated metadata, and generates a set of clusters of classes belonging to the SPDO and the GLSO.
• Mappings are automatically generated by exploiting the clustering result:
  – a cluster contains only SPDO classes: it is not exploited for the mapping generation; this cluster is caused by the selection of a clustering threshold less selective than the one chosen in the SPDO creation process;
  – a cluster contains only GLSO classes: it is not exploited for the mapping generation; it means that there are descriptions of Web Services which are strongly related;
  – a cluster contains classes belonging to both the SPDO and the GLSO: this cluster produces, for each SPDO class, a mapping to each GLSO class.
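The three cluster cases above can be sketched as follows. The clustering itself is assumed to have been done already; the representation of a cluster as a set of (source, class) pairs and the sample class names other than Accommodation and Hotel are hypothetical.

```python
def mappings_from_clusters(clusters):
    """Derive SPDO -> GLSO class mappings from clusters.

    clusters: iterable of sets of (source, class_name) pairs, where
    source is "SPDO" or "GLSO". Returns a list of (spdo, glso) pairs.
    """
    mappings = []
    for cluster in clusters:
        spdo = sorted(c for s, c in cluster if s == "SPDO")
        glso = sorted(c for s, c in cluster if s == "GLSO")
        # A cluster with classes from one side only gives an empty
        # product, so it contributes no mapping.
        mappings.extend((s, g) for s in spdo for g in glso)
    return mappings


clusters = [
    {("SPDO", "Accommodation"), ("GLSO", "Hotel")},   # mixed: mapped
    {("SPDO", "Person"), ("SPDO", "Customer")},       # SPDO only
    {("GLSO", "Flight"), ("GLSO", "AirTravel")},      # GLSO only
]
print(mappings_from_clusters(clusters))
```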
Example
[Figure: an SPDO fragment and a GLSO fragment. The GLSO fragment contains the class Hotel with the properties Hotel.Denomination, Hotel.Location and Hotel.Country.]
The following mappings are generated by applying our technique:
Accommodation --> Hotel
Accommodation.Name --> Hotel.Denomination
Accommodation.City --> Hotel.Location
Accommodation.Country --> Hotel.Country
Data and eService Retrieval
select <select_attribute_list>
from <from_class_list>
where <condition>
• The answer to this query is a data set from the data sources together with a set of services which are potentially useful, since they are related to the concepts appearing in the query and hence to the retrieved data.
• The query processing is divided into two simultaneously executed steps:
  – a data set from the data sources is obtained by processing the query on the integrated view. The results are obtained by exploiting the MOMIS Query Manager, which rewrites the global query as an equivalent set of queries expressed on the local schemata (local queries) by means of an unfolding process;
  – a set of services related to the query is obtained by exploiting the mappings between the SPDO and the GLSO and the concept of relevant service mapping. Services are retrieved by the XIRE (eXtended Information Retrieval Engine) component, which is a service search engine based on the vector space model.
Data and eService Retrieval (overview)
Managing keywords
• Given a query in an SQL-like notation expressed in the SPDO terminology, the set of keywords extracted consists of:
  – all the classes given in the “FROM” clause,
  – all the attributes and the values used in the “SELECT” and “WHERE” clauses,
  – all their ranges defined by ontology classes.
• The set of keywords is rewritten in the GLSO terminology by exploiting the mappings between the SPDO and the GLSO.
• Semantic similarity between GLSO terms, as defined in the SSM, is exploited to expand the keyword set into a set of weighted terms.
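The three steps above (extraction, rewriting, expansion) can be sketched as follows. This is an illustration under several assumptions: the regular expression only handles flat select/from/where queries with equality conditions on quoted values, attribute ranges are not extracted, the mappings are the ones listed on the Example slide, and the SSM values and the 0.3 threshold are invented for the example.

```python
import re


def extract_keywords(query):
    """Collect classes, attributes and quoted values from a flat query."""
    m = re.match(
        r"\s*select\s+(.*?)\s+from\s+(.*?)(?:\s+where\s+(.*))?\s*$",
        query, re.I | re.S)
    if not m:
        raise ValueError("unsupported query")
    select, from_, where = m.group(1), m.group(2), m.group(3) or ""
    classes = [c.strip() for c in from_.split(",")]

    def qualify(attr):
        # Prefix bare attributes with the class when it is unambiguous.
        if "." in attr or len(classes) != 1:
            return attr
        return f"{classes[0]}.{attr}"

    keywords = classes + [qualify(a.strip()) for a in select.split(",")]
    for attr, value in re.findall(r"(\w+(?:\.\w+)?)\s*=\s*'([^']*)'", where):
        keywords += [qualify(attr), value]
    return list(dict.fromkeys(keywords))  # drop duplicates, keep order


def rewrite(keywords, mappings):
    """Translate SPDO terms to GLSO terms; unmapped keywords are kept."""
    return [mappings.get(k, k) for k in keywords]


def expand(keywords, ssm, threshold=0.3):
    """Weight original keywords 1.0 and add similar terms from the SSM."""
    weights = {k: 1.0 for k in keywords}
    for (a, b), sim in ssm.items():
        if a in keywords and sim >= threshold:
            weights[b] = max(weights.get(b, 0.0), sim)
    return weights


mappings = {"Accommodation": "Hotel",
            "Accommodation.Name": "Hotel.Denomination",
            "Accommodation.City": "Hotel.Location",
            "Accommodation.Country": "Hotel.Country"}
ssm = {("Hotel", "Hostel"): 0.5, ("Hotel", "Restaurant"): 0.25}  # invented

kw = extract_keywords(
    "select Name, Country from Accommodation where City='Modena'")
weighted = expand(rewrite(kw, mappings), ssm)
print(weighted)
```

Values such as Modena have no mapping and pass through unchanged, so they can still match service descriptions that mention them literally.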
eServices retrieval
• Query evaluation is based on the vector space model:
  – in this model both documents (that is, Web Service descriptions) and queries (extracted keywords) are represented as vectors in an n-dimensional space;
  – each document vector has non-zero weights for those keywords which are index terms for that description;
  – relevance weights are used to modify the weights in the list resulting from the keyword evaluation process.
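A minimal sketch of vector space ranking, assuming cosine similarity between sparse term-weight vectors. The slides do not state which similarity function XIRE uses, and the relevance-weight adjustment is left out; the service names and weights are invented for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def rank(query, documents):
    """Return (service_id, score) pairs, best match first."""
    scores = {doc_id: cosine(query, vec) for doc_id, vec in documents.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical service description vectors over GLSO terms.
documents = {
    "HotelBooking": {"Hotel": 1.0, "Hotel.Location": 1.0},
    "WeatherForecast": {"Forecast": 1.0, "Hotel.Location": 0.5},
}
# Weighted keyword set, e.g. after expansion through the SSM.
query = {"Hotel": 1.0, "Hotel.Location": 1.0, "Hostel": 0.5}
ranking = rank(query, documents)
print(ranking)
```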
Conclusion and future work
• In this paper we introduced a technique for publishing and retrieving a unified view of data and services.
• Such a unified view may be exploited for improving the user's knowledge of a set of sources and for retrieving a list of web services related to a data set.
• The approach is semi-automatic, and works jointly with the tools which are typically provided for searching for data and services separately.
• Future work will focus on evaluating the effectiveness of the approach in the real cases provided within the NeP4B project, and against the OWLS-TC benchmark.