Show simple item record

dc.contributor.advisor	Hougen, Dean
dc.contributor.author	Murali Krishnan, Shyam Sundar
dc.date.accessioned	2020-12-21T18:46:47Z
dc.date.available	2020-12-21T18:46:47Z
dc.date.issued	2020-12-18
dc.identifier.uri	https://hdl.handle.net/11244/326671
dc.description.abstract	Reinforcement learning is an area of machine learning concerned with how agents should take actions in an environment to maximize their reward. Earlier reinforcement learning algorithms work well for discrete actions and discrete observations; in continuous environments, however, both the actions and the observations are infinite in number, so algorithms were developed to learn in continuous spaces. Two such algorithms are Stochastic Synapse Reinforcement Learning (SSRL) and Deep Deterministic Policy Gradient (DDPG). Previous works applied these algorithms to different environments separately in order to understand their behaviour. In this thesis, a comparison study between SSRL and DDPG is made by running them on the same continuous environments, observing how each algorithm behaves on each environment, and inferring strengths and weaknesses from the comparison. The algorithms are run on two continuous environments, namely mountain car continuous and pendulum. They are run 10 times for each setting of time steps (2000, 3000, and 4000) for 1000 episodes each, and the cumulative reward at the end of each episode is recorded. An episode is the length of the simulation, at the end of which the algorithm reaches a terminal state. The average and standard deviation of cumulative rewards across the 10 repetitions for each time-step setting, and across all repetitions over the different settings, are also collected. The results show different trends across the experiments. Based on the results it can be inferred that, overall, SSRL performs consistently even though it does not gain rewards as high as DDPG, whereas DDPG performs inconsistently but some of the rewards it earns are higher than those of SSRL. In the delayed-reinforcement pendulum environment, neither algorithm learns well, showing a weakness of both toward environments whose terminal state is not definite.	en_US
dc.language	en	en_US
dc.subject	Stochastic Synapse Reinforcement Learning (SSRL)	en_US
dc.subject	Deep Deterministic Policy Gradient (DDPG)	en_US
dc.subject	Comparison study between SSRL and DDPG	en_US
dc.title	Reinforcement learning for continuous control	en_US
dc.contributor.committeeMember	Grant, Christan
dc.contributor.committeeMember	Radhakrishnan, Sridhar
dc.date.manuscript	2020
dc.thesis.degree	Master of Science	en_US
ou.group	Gallogly College of Engineering::School of Computer Science	en_US
shareok.orcid	0000-0001-9239-4390	en_US
shareok.nativefileaccess	restricted	en_US
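
The experimental protocol described in the abstract (repeated runs of an agent in a continuous-control environment, per-episode cumulative reward, and mean/standard deviation across 10 repetitions) can be illustrated with a minimal sketch. The code below is not from the thesis: the Gym environment IDs, step limits, and the random-action placeholder standing in for the SSRL/DDPG agents are assumptions made for illustration only.

```python
# Minimal sketch of the evaluation loop described in the abstract.
# Assumptions: OpenAI Gym environment IDs ("MountainCarContinuous-v0",
# "Pendulum-v0") and a random-action placeholder in place of the
# thesis's SSRL/DDPG agents.
import numpy as np
import gym


def run_repetition(env, n_episodes=1000, max_steps=2000):
    """Run one repetition and return the cumulative reward of each episode."""
    episode_returns = []
    for _ in range(n_episodes):
        obs = env.reset()
        total = 0.0
        for _ in range(max_steps):
            action = env.action_space.sample()  # placeholder for the learned policy
            obs, reward, done, _ = env.step(action)
            total += reward
            if done:
                break
        episode_returns.append(total)
    return np.array(episode_returns)


env = gym.make("MountainCarContinuous-v0")  # or "Pendulum-v0"
# 10 repetitions for one time-step setting (e.g. 2000 steps per episode).
returns = np.stack([run_repetition(env, n_episodes=1000, max_steps=2000)
                    for _ in range(10)])
mean_per_episode = returns.mean(axis=0)  # average cumulative reward across repetitions
std_per_episode = returns.std(axis=0)    # standard deviation across repetitions
```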

