Gruenwald, LeLeal Gonzalez, Eleazar2017-05-112017-05-112017-05-12http://hdl.handle.net/11244/50767Through the use of location-sensing devices, it has been possible to collect very large datasets of trajectories. These datasets make it possible to issue spatio-temporal queries with which users can gather information about the characteristics of the movements of objects, derive patterns from that information, and understand the objects themselves. Among such spatio-temporal queries that can be issued is the top-K trajectory similarity query. This query finds many applications, such as bird migration analysis in ecology and trajectory sharing in social networks. However, the large volumes of the trajectory query sets and databases, along with their associated uncertainty, pose significant computational challenges. One way to address these challenges is through the use of parallel architectures like GPUs, and through the use of models that can produce accurate trajectory estimates. Nevertheless, not much research has been done to design efficient and scalable techniques to process this type of query on parallel architectures. In this dissertation, we propose a novel system to process top-K trajectory similarity queries in parallel on Big Data using GPUs that is capable of handling both certain and uncertain trajectory data. The system consists of four novel algorithms: TKSimGPU to process top-K trajectory similarity queries; Top-KaBT to reduce the size of the candidate set generated by top-K trajectory similarity query algorithms; TrajEstU to estimate the true trajectory when data uncertainty exists; and TraclusGPU to perform local trajectory clustering to aid in the preprocessing stage of TrajEstU. TKSimGPU works by iteratively processing near-join similarity queries, while Top-KaBT calculates the lower and upper bounds of the Hausdorff distance between candidate pairs, and then uses these bounds to remove spurious candidates. Top-KaBT exploits GPUs to improve TKSimGPU by ensuring load balancing across the threads, ensuring memory coalescing, and using special pruning techniques that reduce the size of the candidate set. TrajEstU splits the lifetime of an object’s trajectory into time intervals where the object’s acceleration is nearly constant. Then TrajEstU uses the local trajectory clusters to obtain the movement patterns that are prevalent in the areas where trajectories have low-sampling rates, and uses linear regression to fit a constant acceleration model to the observed positions of the moving object. Finally, TraclusGPU helps TrajEstU scalably find those local trajectory clusters that are used in the construction of trajectory models. Extensive theoretical and experimental evaluations performed on our proposed techniques showed that each of them has better performance in terms of accuracy and execution time than state-of-the-art techniques when applied to large real-life and synthetic trajectory datasets for Big Data applications.Trajectory DatabasesTrajectory Similarity QueriesSpatio-Temporal DatabasesGPU DatabasesParallel Processing of Top-K Trajectory Similarity Queries on Big Data Using GPUs