Processing dependent tasks on a heterogeneous GPU resource architecture

Yang, Rui

dc.contributor.advisor	Thomas, Johnson
dc.contributor.author	Yang, Rui
dc.date.accessioned	2014-09-24T14:17:11Z
dc.date.available	2014-09-24T14:17:11Z
dc.date.issued	2013-07
dc.identifier.uri	https://hdl.handle.net/11244/11055
dc.description.abstract	In this dissertation, a heterogeneous GPU system is a system that consists of several different types of GPUs. Many problems in science and engineering can be represented as a two-dimensional grid in which the update of each grid point depends on the values of its nearest neighbors. The grid may be too large to be handled on a single computing node. If a distributed, heterogeneous processor system is used, two crucial issues arise: minimizing inter-processor communication and balancing the load. First, a novel partitioning algorithm for heterogeneous processors (NPHP) is proposed; based on the grid shape, it chooses an efficient way to divide the grid into blocks that are as square as possible in order to minimize communication cost. Second, a functional performance model with communication (FPMC) is proposed to estimate the absolute speeds of processors accurately, so that the workload can be divided in proportion to the speeds of the GPUs. Based on these two partitioning algorithms, a heterogeneous GPU system (HG) is implemented. HG differs from other distributed GPU systems in that it can process dependent tasks, meaning that tasks in HG can communicate with each other. Furthermore, a dynamic component is designed and implemented in HG so that the neighbor relationship can change at run time. With this architecture, HG can handle more complex task-dependent applications. To validate the approach, an HG system running heat transfer and Gaussian elimination is implemented. The experimental results demonstrate that the heterogeneous GPU system has an essential advantage over traditional homogeneous GPU and CPU systems. For the static-neighbor application, heat transfer, HG is at least 8 times faster than an MPI program running on CPU. For the dynamic-neighbor application, Gaussian elimination, HG achieves a 2.75 times speedup. Several optimizations are also proposed and implemented to improve performance. These include NPHP, which reduces communication cost by at least 10%, and FPMC, which improves the load balance by 10% on average. A data reuse optimization in the computing kernel, which uses shared memory to reduce global memory accesses, yields a 7 times speedup.
dc.format	application/pdf
dc.language	en_US
dc.rights	Copyright is held by the author who has granted the Oklahoma State University Library the non-exclusive right to share this material in its institutional repository. Contact Digital Library Services at lib-dls@okstate.edu or 405-744-9161 for the permission policy on the use, reproduction or distribution of this material.
dc.title	Processing dependent tasks on a heterogeneous GPU resource architecture
dc.contributor.committeeMember	Kak, Subhash
dc.contributor.committeeMember	Mayfield, Blayne
dc.contributor.committeeMember	Fan, Guoliang
osu.filename	Yang_okstate_0664D_12848.pdf
osu.accesstype	Open Access
dc.type.genre	Dissertation
dc.type.material	Text
thesis.degree.discipline	Computer Science
thesis.degree.grantor	Oklahoma State University

Files in this item

Name: Yang_okstate_0664D_12848.pdf
Size: 1.794Mb
Format: PDF

This item appears in the following Collection(s)

OSU Dissertations [11222]
