Learning Assisted Decoupled Software Pipelining (LA-DSWP)

Fitzmorris, Lucia

dc.contributor.advisor	Barnes, Ronald
dc.contributor.author	Fitzmorris, Lucia
dc.date.accessioned	2018-05-01T16:35:50Z
dc.date.available	2018-05-01T16:35:50Z
dc.date.issued	2018-05
dc.identifier.uri	https://hdl.handle.net/11244/299691
dc.description.abstract	In this thesis, I introduce and implement an extension to the Decoupled Software Pipelining (DSWP) algorithm proposed by Rangan et al. The extension, named Learning Assisted Decoupled Software Pipelining (LA-DSWP), applies reinforcement learning to the partitioning problem within DSWP. Experiments test and measure whether DSWP and LA-DSWP are viable optimizations that produce significant program speedup. As computer architects strive to keep up with public expectations for processor performance growth, they are increasingly turning to designs that place multiple independent cores on a single chip. Unlike most prior hardware innovations, multicore designs benefit only programs that are written or compiled with multiple threads in mind. Automatic thread extraction using Decoupled Software Pipelining seeks to extract multiple threads from a single-threaded program (Ottoni et al., 2005). It does so by allowing loops within the program to execute simultaneously on multiple cores of a single processor chip without programmer intervention. DSWP focuses on splitting loops that traverse large recursive data structures into multiple threads in an attempt to increase overall program performance. Unlike prior implementations of DSWP, this research uses a hardware- and language-independent implementation built on the LLVM framework. Rather than relying on custom-built hardware to facilitate communication between program threads, this implementation uses Intel's Threading Building Blocks library to create queues in the memory shared between the on-chip processor cores. As this thesis will show, this design relies heavily on the memory subsystem of the targeted processors and is greatly affected by that subsystem's design.
Another novel addition to DSWP explored in this thesis is the application of machine learning to the partitioning process. Instead of partitioning the nodes of a loop's program dependence graph using predefined heuristics, this thesis applies reinforcement learning so that the DSWP agent can make more informed decisions when optimizing a given loop. The agent collects and analyzes data about each node of a program's loop and partitions the loop on a node-by-node basis. This addition constitutes LA-DSWP. Through experimentation on modern Intel processors, this thesis tests the feasibility of LA-DSWP on current hardware. Multiple kernel programs were written to search for program patterns that can achieve performance increases using DSWP partitioning. Experiments were run using the partitioning methods discussed in earlier papers along with the proposed method based on machine learning.	en_US
dc.language	en_US	en_US
dc.subject	Engineering, Electronics and Electrical.	en_US
dc.subject	Computer Science.	en_US
dc.title	Learning Assisted Decoupled Software Pipelining (LA-DSWP)	en_US
dc.contributor.committeeMember	Bredeson, Jon
dc.contributor.committeeMember	Havlicek, Joseph
dc.date.manuscript	2018-04
dc.thesis.degree	Master of Science	en_US
ou.group	College of Engineering::School of Electrical and Computer Engineering	en_US
shareok.nativefileaccess	restricted	en_US

Files in this item

Name: 2018_Fitzmorris_Lucia_Thesis.pdf
Size: 864.5Kb
Format: PDF

This item appears in the following Collection(s)

OU - Theses [2188]
