Learning Assisted Decoupled Software Pipelining (LA-DSWP)

dc.contributor.advisor: Barnes, Ronald
dc.contributor.author: Fitzmorris, Lucia
dc.contributor.committeeMember: Bredeson, Jon
dc.contributor.committeeMember: Havlicek, Joseph
dc.date.accessioned: 2018-05-01T16:35:50Z
dc.date.available: 2018-05-01T16:35:50Z
dc.date.issued: 2018-05
dc.date.manuscript: 2018-04
dc.description.abstract: In this thesis, I introduce and implement an extension to the Decoupled Software Pipelining (DSWP) algorithm proposed by Rangan et al. The extension is named Learning Assisted Decoupled Software Pipelining (LA-DSWP) because it applies reinforcement learning to the partitioning problem within DSWP. Through experimentation, I test and measure whether DSWP and LA-DSWP are viable optimizations that produce significant program speedup. As computer architects strive to keep up with public expectations for processor performance growth, they are increasingly turning to processor designs that place multiple independent cores on a single chip. Unlike most prior hardware innovations, multicore designs benefit only programs that are written or compiled with multiple threads in mind. Automatic thread extraction using Decoupled Software Pipelining seeks to extract multiple threads from a single-threaded program (Ottoni et al., 2005). It does so by allowing loops within the program to execute simultaneously on multiple cores of a single processor chip without programmer intervention. DSWP focuses on splitting loops that traverse large recursive data structures into multiple threads in an attempt to increase overall program performance. Unlike prior implementations of DSWP, this research uses a hardware- and language-independent implementation built on the LLVM framework. Rather than relying on custom-built hardware to facilitate communication between program threads, this implementation uses Intel's Threading Building Blocks library to create queues in the memory shared by the on-chip processor cores. As this thesis will show, this design relies heavily on the memory subsystem of the targeted processors, and its performance is greatly affected by how that memory subsystem is designed.
Another novel addition to DSWP explored in this thesis is the application of machine learning to the partitioning process. Instead of partitioning the nodes of a loop's program dependence graph using predefined heuristics, this thesis applies reinforcement learning so that the DSWP agent can make more informed decisions when optimizing a given loop. The DSWP agent collects and analyzes data about each node of a program's loop and partitions the loop on a node-by-node basis. This addition constitutes LA-DSWP. Through experimentation on modern Intel processors, this thesis tests the feasibility of LA-DSWP on current hardware. Multiple kernel programs were written to search for program patterns that can achieve performance increases using DSWP partitioning. Experiments were run using the partitioning methods discussed in earlier papers along with the proposed method based on machine learning. [en_US]
dc.identifier.uri: https://hdl.handle.net/11244/299691
dc.language: en_US [en_US]
dc.subject: Engineering, Electronics and Electrical. [en_US]
dc.subject: Computer Science. [en_US]
dc.thesis.degree: Master of Science [en_US]
dc.title: Learning Assisted Decoupled Software Pipelining (LA-DSWP) [en_US]
ou.group: College of Engineering::School of Electrical and Computer Engineering [en_US]
shareok.nativefileaccess: restricted [en_US]
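
The abstract describes DSWP as splitting a loop that traverses a recursive data structure into threads that communicate through queues in shared memory. The following is a minimal illustrative sketch of that idea only, not the thesis's implementation (which operates on LLVM IR and uses Threading Building Blocks queues). It assumes a two-stage split of a linked-list loop: one stage chases pointers, the other does the per-node work. All names here (Node, build_list, work, sequential, pipelined, capacity) are hypothetical, and Python's standard queue.Queue stands in for the inter-thread queue.

```python
import queue
import threading


class Node:
    """Singly linked list node (hypothetical example structure)."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


def build_list(values):
    """Build a linked list holding `values` in order; returns the head."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def work(x):
    """Stand-in for the per-node computation in the loop body."""
    return x * x + 1


def sequential(head):
    """Original single-threaded loop: traversal and work interleaved."""
    total = 0
    node = head
    while node is not None:
        total += work(node.value)
        node = node.next
    return total


_DONE = object()  # sentinel marking the end of the stream


def pipelined(head, capacity=64):
    """Two-stage pipeline: stage 1 traverses, stage 2 computes."""
    q = queue.Queue(maxsize=capacity)  # bounded queue between the stages
    result = []

    def traverse():
        # Stage 1: the pointer-chasing recurrence stays on one thread.
        node = head
        while node is not None:
            q.put(node.value)
            node = node.next
        q.put(_DONE)

    def compute():
        # Stage 2: consumes values and performs the per-node work.
        total = 0
        while True:
            item = q.get()
            if item is _DONE:
                break
            total += work(item)
        result.append(total)

    threads = [threading.Thread(target=traverse),
               threading.Thread(target=compute)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]
```

Data flows in one direction only, from the traversal stage to the compute stage, which is what lets the two stages run decoupled. In CPython the global interpreter lock prevents this sketch from showing a real speedup; it illustrates the structure of the transformation rather than its performance.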

Files

Original bundle
Name: 2018_Fitzmorris_Lucia_Thesis.pdf
Size: 864.5 KB
Format: Adobe Portable Document Format