Reinforcement learning for continuous control

dc.contributor.advisor: Hougen, Dean
dc.contributor.author: Murali Krishnan, Shyam Sundar
dc.contributor.committeeMember: Grant, Christan
dc.contributor.committeeMember: Radhakrishnan, Sridhar
dc.date.accessioned: 2020-12-21T18:46:47Z
dc.date.available: 2020-12-21T18:46:47Z
dc.date.issued: 2020-12-18
dc.date.manuscript: 2020
dc.description.abstract: Reinforcement learning is an area of machine learning concerned with how agents should take actions in an environment to maximize their cumulative reward. Earlier reinforcement learning algorithms work well for discrete actions and discrete observations, but in continuous environments both the actions and the observations are infinite in number, so algorithms were developed to learn in continuous spaces. Two such algorithms are Stochastic Synapse Reinforcement Learning (SSRL) and Deep Deterministic Policy Gradient (DDPG). Previous works applied these algorithms to different environments separately in order to understand their behaviour. In this thesis, a comparison study between SSRL and DDPG is made by running them on the same continuous environments, observing how each algorithm behaves on each environment, and examining what strengths and weaknesses can be inferred by comparing the two. The algorithms are run on two continuous environments, namely mountain car continuous and pendulum. Each algorithm is run 10 times for each episode length (2000, 3000, and 4000 time steps) for 1000 episodes per run, and the cumulative reward at the end of each episode is recorded. An episode is the length of the simulation, at the end of which the algorithm reaches a terminal state. The average and standard deviation of cumulative rewards across the 10 repetitions for each time-step setting, and across all repetitions over the different settings, are also collected. The results show different trends across the experiments. Based on the results, it can be inferred that SSRL performs consistently overall, even though it does not gain rewards as high as DDPG's, whereas DDPG performs inconsistently but some of the rewards it earns are higher than those of SSRL. In the delayed-reinforcement pendulum environment, neither algorithm learns well, showing a weakness in environments whose terminal state is not definite.
dc.identifier.uri: https://hdl.handle.net/11244/326671
dc.language: en
dc.subject: Stochastic Synapse Reinforcement Learning (SSRL)
dc.subject: Deep Deterministic Policy Gradient (DDPG)
dc.subject: Comparison study between SSRL and DDPG
dc.thesis.degree: Master of Science
dc.title: Reinforcement learning for continuous control
ou.group: Gallogly College of Engineering::School of Computer Science
shareok.nativefileaccess: restricted
shareok.orcid: 0000-0001-9239-4390
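
The evaluation protocol described in the abstract (a fixed number of episodes per run, a fixed time-step budget per episode, several independent repetitions, and per-episode cumulative rewards aggregated into means and standard deviations) can be sketched roughly as follows. This is a minimal illustration only, assuming the classic OpenAI Gym API (pre-0.26) and a hypothetical agent interface (make_agent, select_action, update); it is not the code used in the thesis.

    import numpy as np
    import gym

    def evaluate(make_agent, env_name="MountainCarContinuous-v0",
                 episodes=1000, max_steps=2000, repetitions=10):
        """Run one algorithm for several independent repetitions and return
        the per-episode cumulative rewards plus their mean and standard
        deviation across repetitions."""
        returns = np.zeros((repetitions, episodes))
        for rep in range(repetitions):
            env = gym.make(env_name)
            # Hypothetical factory that builds an SSRL or DDPG agent
            agent = make_agent(env.observation_space, env.action_space)
            for ep in range(episodes):
                obs = env.reset()              # classic Gym reset
                total = 0.0
                for _ in range(max_steps):     # episode length, e.g. 2000/3000/4000 steps
                    action = agent.select_action(obs)                  # assumed agent method
                    next_obs, reward, done, _ = env.step(action)
                    agent.update(obs, action, reward, next_obs, done)  # assumed agent method
                    obs = next_obs
                    total += reward
                    if done:
                        break
                returns[rep, ep] = total       # cumulative reward at end of episode
            env.close()
        return returns, returns.mean(axis=0), returns.std(axis=0)

The same loop would be repeated for the pendulum environment ("Pendulum-v0" in Gym) and for each time-step setting, once with an SSRL agent and once with a DDPG agent, to produce the comparison reported in the thesis.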

Files

Original bundle
Name: 2020_Murali Krishnan_Shyam Sundar_Thesis.pdf
Size: 4.3 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission

Collections