Distributed spatial query processing and optimization
Loading...
Date
2013
Authors
Farruque, Nawshad
University of Lethbridge. Faculty of Arts and Science
Journal Title
Journal ISSN
Volume Title
Publisher
Lethbridge, Alta. : University of Lethbridge, Dept. of Mathematics and Computer Science, 2013
Abstract
Applications exist today that require the management of distributed spatial data. Since
spatial data is more complex than non-spatial data, performing queries on it requires more
local processing (i.e. CPU and I/O) time. Also, due to geographical distribution, data
transmission costs must be considered. To reduce these costs, one can employ a distributed
spatial semijoin as it eliminates unnecessary objects before their transmission to other sites
and the query site.
Most existing work propose different representations of the distributed spatial semijoin
between two sites only, with very few works exploring its use for processing a query
involving more than two sites. In this thesis, we propose both new approaches for representing
the spatial semijoin in a distributed setting, and their use for processing a distributed
query consisting of any number of sites. Two strategies are proposed for compactly representing
the spatial semijoin that reduce both the data transmission and local processing
(CPU+I/O) costs when applied in a distributed spatial query. A Global Encompassing Minimum
Bounding Rectangle (GEMBR) is utilized, which is partitioned, mapped and applied
in two different ways to approximate the objects in a spatial joining attribute. The first is
partition indices, while the second is a bit array representation. Then each spatial semijoin
is applied in a multi-site distributed spatial query processing strategy. In addition, the
two-site spatial semijoin is extended to handle multiple sites so that we have a benchmark
strategy for comparison purposes.
We have tested the query processing algorithms for four sites, which are a part of an
actual working distributed system. The algorithms are compared with respect to data transmission
cost, CPU time, I/O time and false positive results. The algorithms are superior in
many cases at optimizing the above criteria. The bit array representation, which is called Bloom Filter Based Spatial Semijoin (BFSJ), is evaluated with respect to different filter factors and found that the optimized algorithms perform significantly better than the Distributed
Na¨ıve Spatial Semijoin strategy when synthetic data was used. Also the Partition
and Mapping Based Spatial Semijoin (PMSJ) is 1.38 times faster than BFSJ with respect
to processing cost while the BFSJ has a tranmission cost gain of 1.12 over PMSJ. Both
algorithms are 18 times faster and have six times less transmission cost than Distributed
Na¨ıve Spatial Semijoin (NSPJ). Finally, it is also observed that with the increase of hash
functions and filter factor the false positive percentage increases.
Description
x, 76 leaves ; 29 cm
Keywords
Distributed spatial data , Spatial query processing , GEMBR partitioning and mapping , Dissertations, Academic