Reinforcement learning for continuous control
dc.contributor.advisor | Hougen, Dean
dc.contributor.author | Murali Krishnan, Shyam Sundar
dc.contributor.committeeMember | Grant, Christan
dc.contributor.committeeMember | Radhakrishnan, Sridhar
dc.date.accessioned | 2020-12-21T18:46:47Z
dc.date.available | 2020-12-21T18:46:47Z
dc.date.issued | 2020-12-18
dc.date.manuscript | 2020
dc.description.abstract | Reinforcement learning is an area of machine learning concerned with how agents should take actions in an environment to maximize their reward. Earlier reinforcement learning algorithms work well for discrete actions and discrete observations. In a continuous environment, however, the actions and observations are infinite in number, so algorithms were developed specifically to learn in continuous spaces. Two such algorithms are Stochastic Synapse Reinforcement Learning (SSRL) and Deep Deterministic Policy Gradient (DDPG). Previous works applied these algorithms to different environments separately in order to understand their behaviour. In this thesis, a comparison study between SSRL and DDPG is made by running them on the same continuous environments and observing how each algorithm behaves on each environment and what strengths and weaknesses can be inferred from the comparison. The algorithms are run on two continuous environments, namely mountain car continuous and pendulum. Each algorithm is run 10 times for each time-step setting (2000, 3000, and 4000 time steps) for 1000 episodes each, and the cumulative reward at the end of each episode is recorded; an episode is the length of the simulation at the end of which the algorithm reaches a terminal state. The average and standard deviation of cumulative rewards across the 10 repetitions for each time-step setting, and across all repetitions over the different settings, are also collected (an illustrative sketch of this evaluation loop follows the record below). The results show different trends across the experiments. Based on the results, it can be inferred that SSRL performs consistently overall even though it does not gain rewards as large as DDPG's, whereas DDPG performs inconsistently but some of the rewards it earns are higher than those of SSRL. Also, in the case of the delayed-reinforcement pendulum environment, neither algorithm learns well, showing a weakness toward environments whose terminal state is not definite. | en_US
dc.identifier.uri | https://hdl.handle.net/11244/326671
dc.language | en | en_US
dc.subject | Stochastic Synapse Reinforcement Learning (SSRL) | en_US
dc.subject | Deep Deterministic Policy Gradient (DDPG) | en_US
dc.subject | Comparison study between SSRL and DDPG | en_US
dc.thesis.degree | Master of Science | en_US
dc.title | Reinforcement learning for continuous control | en_US
ou.group | Gallogly College of Engineering::School of Computer Science | en_US
shareok.nativefileaccess | restricted | en_US
shareok.orcid | 0000-0001-9239-4390 | en_US
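The abstract describes the evaluation protocol only in prose. The following is a minimal, illustrative sketch of such an evaluation loop, assuming OpenAI Gym's MountainCarContinuous-v0 and Pendulum-v0 environments (consistent with the 2020-era Gym API) and using a random policy as a stand-in for SSRL/DDPG; the environment IDs, function names, and the 4-tuple step() return are assumptions for illustration, not the thesis's actual code.

```python
# Sketch of the evaluation protocol from the abstract: run an agent for 1000
# episodes, record the cumulative reward per episode, repeat 10 times per
# time-step setting, and aggregate mean and standard deviation.
# A random policy stands in for the learned SSRL/DDPG policies.
import gym
import numpy as np

def run_repetition(env_name, episodes, max_steps, seed):
    env = gym.make(env_name)
    env.seed(seed)  # older Gym API (pre-0.26); newer versions seed via reset()
    episode_returns = []
    for _ in range(episodes):
        obs = env.reset()
        total = 0.0
        for _ in range(max_steps):
            action = env.action_space.sample()  # placeholder for the learned policy
            obs, reward, done, _ = env.step(action)
            total += reward
            if done:  # terminal state reached before the time-step limit
                break
        episode_returns.append(total)
    env.close()
    return np.array(episode_returns)

if __name__ == "__main__":
    for env_name in ("MountainCarContinuous-v0", "Pendulum-v0"):
        for max_steps in (2000, 3000, 4000):
            # 10 repetitions per time-step setting; shape (10, 1000)
            runs = np.stack([
                run_repetition(env_name, episodes=1000,
                               max_steps=max_steps, seed=rep)
                for rep in range(10)
            ])
            print(env_name, max_steps,
                  "mean:", runs.mean(), "std:", runs.std())
```

A real comparison would replace the random action with each algorithm's policy and keep the same aggregation, so that mean and standard deviation are comparable across SSRL and DDPG for each environment and time-step setting.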
Files
Original bundle (1 - 1 of 1)
- Name: 2020_Murali Krishnan_Shyam Sundar_Thesis.pdf
- Size: 4.3 MB
- Format: Adobe Portable Document Format

License bundle (1 - 1 of 1)
- Name: license.txt
- Size: 1.71 KB
- Format: Item-specific license agreed upon to submission