A Uniform Query Processing Approach for Integrating Data from Heterogeneous Resources

Merja Karjalainen

Doctoral thesis, 2010

Scientists who need to explore several different databases in their research can find it difficult and tedious to extract and combine information from heterogeneous data sources manually. This is a particular problem for researchers in the life sciences, since technical advances in the last decade have resulted in a dramatic increase in the quantity and variety of data. Many databases of interest are developed independently by different research groups, and the database administrators often want to keep their databases autonomous so that they can develop and maintain them without being constrained by other data sources. There is therefore a need for data integration software that facilitates combining up-to-date data from autonomous, heterogeneous databases located at different sites. In this work, a system for integrating data from heterogeneous (relational and RDF/S), autonomous and distributed data sources has been designed and implemented. The main aim in the design and implementation of the system has been to make large parts of query and result processing independent of the kinds of data resources that are being used. Queries are held in a resource-independent form through large parts of the query processing. We refer to this as uniform query and result processing. The user poses queries, called global queries, against an integrated view of the underlying data resources. The integrated view does not reveal the structure of the underlying data sources. A global query is rewritten using rules that describe the mapping from concepts in the integrated view to concepts in the data sources. The rewritten query is then split into sub-queries, each of which relates to one of the data sources. Wrappers translate the sub-queries into the query languages of the component databases, send them to the component databases and then retrieve the results.
Several small example federations have been implemented to test the system, one of which is a federation of biological databases. We have focused on incorporating data in relational databases and RDF Schema data, since these are widely used and are becoming increasingly popular for managing data collections. An outcome of this work is a functioning prototype system that applies a uniform query and result processing approach, and has a modular system design that is easy to use as a starting point for modifications and extensions.
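The processing steps described in the abstract (rewriting a global query with mapping rules, splitting it into per-source sub-queries, and translating these in wrappers) can be illustrated with a minimal sketch. It is not the thesis implementation and does not reproduce its functional data model or rule language. All names here (`MAPPING_RULES`, `SubQuery`, `plan`, the source names, the table and column names, and the `ex:` prefix) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical mapping rules: a (concept, attribute) pair in the integrated
# view maps to a data source and that source's local names. Invented data.
MAPPING_RULES = {
    ("Protein", "name"): ("relational_db", "protein", "prot_name"),
    ("Protein", "organism"): ("rdf_store", "ex:Protein", "ex:organism"),
}


@dataclass(frozen=True)
class SubQuery:
    """A resource-independent request for one attribute of one local concept."""
    source: str
    local_concept: str
    local_attribute: str


def rewrite(global_query):
    """Rewrite global (concept, attribute) pairs using the mapping rules."""
    return [SubQuery(*MAPPING_RULES[item]) for item in global_query]


def split_by_source(sub_queries):
    """Group the rewritten sub-queries by the data source they relate to."""
    groups = defaultdict(list)
    for sub_query in sub_queries:
        groups[sub_query.source].append(sub_query)
    return dict(groups)


def to_sql(sub_query):
    """Wrapper step for a relational source (assumed to accept SQL)."""
    return f"SELECT {sub_query.local_attribute} FROM {sub_query.local_concept}"


def to_sparql(sub_query):
    """Wrapper step for an RDF source (assumed to accept SPARQL).

    The PREFIX declaration for ex: is omitted for brevity.
    """
    return (
        f"SELECT ?v WHERE {{ ?s a {sub_query.local_concept} ; "
        f"{sub_query.local_attribute} ?v }}"
    )


# Only this table knows which query language each source uses.
WRAPPERS = {"relational_db": to_sql, "rdf_store": to_sparql}


def plan(global_query):
    """Return, per source, the source-specific query strings to send."""
    groups = split_by_source(rewrite(global_query))
    return {
        source: [WRAPPERS[source](sq) for sq in sub_queries]
        for source, sub_queries in groups.items()
    }


if __name__ == "__main__":
    for source, queries in plan(
        [("Protein", "name"), ("Protein", "organism")]
    ).items():
        print(source, queries)
```

In this sketch only the `WRAPPERS` table depends on the kind of data resource; `rewrite` and `split_by_source` work on the same resource-independent `SubQuery` form, which is the property the abstract calls uniform query processing.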

Query processing

Functional data model

Rewrite rules

Data integration

Sal ED, Hörsalsvägen 11, Chalmers Tekniska Högskola

Opponent: Prof. Alex Gray, Cardiff University, UK

Author

Merja Karjalainen

Chalmers, Computer Science and Engineering (Chalmers), Computing Science (Chalmers)

Other publications

A Functional Data Model Approach to Querying RDF/RDFS Data

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 5071 (2008), p. 153-164

Paper in proceeding

Uniform Query Processing in a Federation of RDFS and Relational Resources

International Database Engineering and Applications Symposium (IDEAS'09), Proceedings (2009), p. 315-320

Paper in proceeding

Subject Categories (SSIF 2011)

Bioinformatics and Systems Biology

Computer Science

ISBN

978-91-7385-472-6

Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 3153

Technical report D - Department of Computer Science and Engineering, Chalmers University of Technology and Göteborg University: 74