Show simple item record

dc.contributor.supervisor Osborn, Wendy
dc.contributor.author Farruque, Nawshad
dc.contributor.author University of Lethbridge. Faculty of Arts and Science
dc.date.accessioned 2014-10-28T01:39:44Z
dc.date.available 2014-10-28T01:39:44Z
dc.date.issued 2013
dc.identifier.uri https://hdl.handle.net/10133/3563
dc.description x, 76 leaves ; 29 cm en_US
dc.description.abstract Applications exist today that require the management of distributed spatial data. Since spatial data is more complex than non-spatial data, performing queries on it requires more local processing (i.e. CPU and I/O) time. Also, due to geographical distribution, data transmission costs must be considered. To reduce these costs, one can employ a distributed spatial semijoin as it eliminates unnecessary objects before their transmission to other sites and the query site. Most existing work propose different representations of the distributed spatial semijoin between two sites only, with very few works exploring its use for processing a query involving more than two sites. In this thesis, we propose both new approaches for representing the spatial semijoin in a distributed setting, and their use for processing a distributed query consisting of any number of sites. Two strategies are proposed for compactly representing the spatial semijoin that reduce both the data transmission and local processing (CPU+I/O) costs when applied in a distributed spatial query. A Global Encompassing Minimum Bounding Rectangle (GEMBR) is utilized, which is partitioned, mapped and applied in two different ways to approximate the objects in a spatial joining attribute. The first is partition indices, while the second is a bit array representation. Then each spatial semijoin is applied in a multi-site distributed spatial query processing strategy. In addition, the two-site spatial semijoin is extended to handle multiple sites so that we have a benchmark strategy for comparison purposes. We have tested the query processing algorithms for four sites, which are a part of an actual working distributed system. The algorithms are compared with respect to data transmission cost, CPU time, I/O time and false positive results. The algorithms are superior in many cases at optimizing the above criteria. The bit array representation, which is called Bloom Filter Based Spatial Semijoin (BFSJ), is evaluated with respect to different filter factors and found that the optimized algorithms perform significantly better than the Distributed Na¨ıve Spatial Semijoin strategy when synthetic data was used. Also the Partition and Mapping Based Spatial Semijoin (PMSJ) is 1.38 times faster than BFSJ with respect to processing cost while the BFSJ has a tranmission cost gain of 1.12 over PMSJ. Both algorithms are 18 times faster and have six times less transmission cost than Distributed Na¨ıve Spatial Semijoin (NSPJ). Finally, it is also observed that with the increase of hash functions and filter factor the false positive percentage increases. en_US
dc.language.iso en_CA en_US
dc.publisher Lethbridge, Alta. : University of Lethbridge, Dept. of Mathematics and Computer Science, 2013 en_US
dc.relation.ispartofseries Thesis (University of Lethbridge. Faculty of Arts and Science) en_US
dc.subject Distributed spatial data en_US
dc.subject Spatial query processing en_US
dc.subject GEMBR partitioning and mapping en_US
dc.subject Dissertations, Academic
dc.title Distributed spatial query processing and optimization en_US
dc.type Thesis en_US
dc.publisher.faculty Arts and Science en_US
dc.publisher.department Department of Mathematics and Computer Science en_US
dc.degree.level Masters en_US
dc.degree.level Masters
dc.proquestyes No en_US


Files in this item

This item appears in the following Collection(s)

Show simple item record