Distributed spatial query processing and optimization

Thumbnail Image
Farruque, Nawshad
University of Lethbridge. Faculty of Arts and Science
Journal Title
Journal ISSN
Volume Title
Lethbridge, Alta. : University of Lethbridge, Dept. of Mathematics and Computer Science, 2013
Applications exist today that require the management of distributed spatial data. Since spatial data is more complex than non-spatial data, performing queries on it requires more local processing (i.e. CPU and I/O) time. Also, due to geographical distribution, data transmission costs must be considered. To reduce these costs, one can employ a distributed spatial semijoin as it eliminates unnecessary objects before their transmission to other sites and the query site. Most existing work propose different representations of the distributed spatial semijoin between two sites only, with very few works exploring its use for processing a query involving more than two sites. In this thesis, we propose both new approaches for representing the spatial semijoin in a distributed setting, and their use for processing a distributed query consisting of any number of sites. Two strategies are proposed for compactly representing the spatial semijoin that reduce both the data transmission and local processing (CPU+I/O) costs when applied in a distributed spatial query. A Global Encompassing Minimum Bounding Rectangle (GEMBR) is utilized, which is partitioned, mapped and applied in two different ways to approximate the objects in a spatial joining attribute. The first is partition indices, while the second is a bit array representation. Then each spatial semijoin is applied in a multi-site distributed spatial query processing strategy. In addition, the two-site spatial semijoin is extended to handle multiple sites so that we have a benchmark strategy for comparison purposes. We have tested the query processing algorithms for four sites, which are a part of an actual working distributed system. The algorithms are compared with respect to data transmission cost, CPU time, I/O time and false positive results. The algorithms are superior in many cases at optimizing the above criteria. The bit array representation, which is called Bloom Filter Based Spatial Semijoin (BFSJ), is evaluated with respect to different filter factors and found that the optimized algorithms perform significantly better than the Distributed Na¨ıve Spatial Semijoin strategy when synthetic data was used. Also the Partition and Mapping Based Spatial Semijoin (PMSJ) is 1.38 times faster than BFSJ with respect to processing cost while the BFSJ has a tranmission cost gain of 1.12 over PMSJ. Both algorithms are 18 times faster and have six times less transmission cost than Distributed Na¨ıve Spatial Semijoin (NSPJ). Finally, it is also observed that with the increase of hash functions and filter factor the false positive percentage increases.
x, 76 leaves ; 29 cm
Distributed spatial data , Spatial query processing , GEMBR partitioning and mapping , Dissertations, Academic