dc.contributor.advisor: Thomas, Johnson
dc.contributor.author: Yang, Rui
dc.date.accessioned: 2014-09-24T14:17:11Z
dc.date.available: 2014-09-24T14:17:11Z
dc.date.issued: 2013-07
dc.identifier.uri: https://hdl.handle.net/11244/11055
dc.description.abstract: In this dissertation, a heterogeneous GPU system is a system that consists of different types of GPUs. Many problems in science and engineering can be represented as a two-dimensional grid in which the update of each grid point depends on the values of its nearest neighbors. The grid may be too large to be handled on a single computing node. If a distributed, heterogeneous processor system is used, two crucial issues arise: minimizing inter-processor communication and balancing the load. First, a novel partitioning algorithm for heterogeneous processors (NPHP) is proposed, which uses the grid shape to choose an efficient way to divide the grid into blocks that are as square as possible, minimizing communication cost. Second, a functional performance model with communication (FPMC) is proposed to estimate the absolute speeds of processors accurately. This method can accurately divide the workload in proportion to the speeds of the GPUs. Based on these two partitioning algorithms, a heterogeneous GPU system (HG) is implemented. HG differs from other distributed GPU systems because it can process dependent tasks, that is, tasks in HG can communicate with each other. Furthermore, a dynamic component is designed and implemented in the HG system, so the neighbor relationship can change at run time. With this architecture, HG can deal with more complex task-dependent applications. To validate our approach, an HG system running heat transfer and Gaussian elimination is implemented. The experimental results demonstrate that the heterogeneous GPU system has an essential advantage over traditional homogeneous GPU and CPU systems. For the static-neighbor application, heat transfer, HG is at least 8 times faster than an MPI program running on CPUs. For the dynamic-neighbor application, Gaussian elimination, HG achieves a 2.75 times speedup. We also propose and implement several optimizations to improve performance. These include NPHP, which reduces communication cost by at least 10%, and FPMC, which improves the load balance by 10% on average. A data reuse optimization in the computing kernel, which uses shared memory to reduce global memory accesses, yields a 7 times speedup.
dc.format: application/pdf
dc.language: en_US
dc.rights: Copyright is held by the author who has granted the Oklahoma State University Library the non-exclusive right to share this material in its institutional repository. Contact Digital Library Services at lib-dls@okstate.edu or 405-744-9161 for the permission policy on the use, reproduction or distribution of this material.
dc.title: Processing dependent tasks on a heterogeneous GPU resource architecture
dc.contributor.committeeMember: Kak, Subhash
dc.contributor.committeeMember: Mayfield, Blayne
dc.contributor.committeeMember: Fan, Guoliang
osu.filename: Yang_okstate_0664D_12848.pdf
osu.accesstype: Open Access
dc.type.genre: Dissertation
dc.type.material: Text
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Oklahoma State University
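
The two ideas the abstract relies on, dividing work in proportion to device speed and preferring near-square blocks to cut neighbor communication, can be illustrated with a minimal Python sketch. This is not the dissertation's NPHP or FPMC: the function names `proportional_rows` and `halo_cells` are hypothetical, the speeds are taken as given constants rather than estimated by a performance model, and the communication cost is approximated as the perimeter of an interior block under a four-neighbor stencil.

```python
def proportional_rows(total_rows, speeds):
    """Split total_rows among devices in proportion to their speeds.

    Assumption: each speed is a fixed positive number supplied by the caller.
    Uses the largest-remainder method so the shares sum to total_rows exactly.
    """
    total_speed = sum(speeds)
    exact = [total_rows * s / total_speed for s in speeds]
    rows = [int(x) for x in exact]  # round every share down first
    leftover = total_rows - sum(rows)
    # Hand the remaining rows to the devices with the largest fractional parts.
    order = sorted(range(len(speeds)),
                   key=lambda i: exact[i] - rows[i],
                   reverse=True)
    for i in order[:leftover]:
        rows[i] += 1
    return rows


def halo_cells(block_rows, block_cols):
    """Boundary cells an interior block exchanges per update step.

    Assumption: a four-neighbor stencil, so the cost is the block perimeter.
    """
    return 2 * (block_rows + block_cols)


if __name__ == "__main__":
    # Three hypothetical GPUs with relative speeds 1 : 2 : 5.
    print(proportional_rows(1000, [1.0, 2.0, 5.0]))
    # Two blocks with the same area (10,000 cells) but different shapes.
    print(halo_cells(100, 100), halo_cells(1000, 10))
```

For a fixed block area, the square block has the smaller perimeter, which is the intuition behind dividing the grid into blocks that are as square as possible.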

