Reinforcement learning for continuous control
dc.contributor.advisor | Hougen, Dean
dc.contributor.author | Murali Krishnan, Shyam Sundar
dc.contributor.committeeMember | Grant, Christan
dc.contributor.committeeMember | Radhakrishnan, Sridhar
dc.date.accessioned | 2020-12-21T18:46:47Z
dc.date.available | 2020-12-21T18:46:47Z
dc.date.issued | 2020-12-18
dc.date.manuscript | 2020
dc.description.abstract | Reinforcement learning is an area of machine learning concerned with how agents should take actions in an environment to maximize their reward. Earlier reinforcement learning algorithms work well for discrete actions and discrete observations. In a continuous environment, however, the actions and observations are infinite in number, so algorithms were developed specifically to learn in continuous spaces. Two such algorithms are Stochastic Synapse Reinforcement Learning (SSRL) and Deep Deterministic Policy Gradient (DDPG). Previous works applied these algorithms to different environments separately in order to understand their behaviour. In this thesis, a comparison study between SSRL and DDPG is made by running them on the same continuous environments and observing how each algorithm behaves on each environment and what strengths and weaknesses can be inferred from the comparison. The algorithms are run on two continuous environments, namely mountain car continuous and pendulum. Each algorithm is run 10 times for each time-step setting (2000, 3000, and 4000 time steps) for 1000 episodes each, and the cumulative reward at the end of each episode is recorded; an episode is the length of the simulation at the end of which the algorithm reaches a terminal state. The average and standard deviation of cumulative rewards across the 10 repetitions for each time-step setting, and across all repetitions over the different settings, are also collected (an illustrative sketch of this evaluation loop follows the record below). The results show different trends across the experiments. Based on the results, it can be inferred that SSRL performs consistently overall even though it does not gain rewards as large as DDPG's, whereas DDPG performs inconsistently but some of the rewards it earns are higher than those of SSRL. Also, in the case of the delayed-reinforcement pendulum environment, neither algorithm learns well, showing a weakness toward environments whose terminal state is not definite. | en_US
dc.identifier.uri | https://hdl.handle.net/11244/326671
dc.language | en | en_US
dc.subject | Stochastic Synapse Reinforcement Learning (SSRL) | en_US
dc.subject | Deep Deterministic Policy Gradient (DDPG) | en_US
dc.subject | Comparison study between SSRL and DDPG | en_US
dc.thesis.degree | Master of Science | en_US
dc.title | Reinforcement learning for continuous control | en_US
ou.group | Gallogly College of Engineering::School of Computer Science | en_US
shareok.nativefileaccess | restricted | en_US
shareok.orcid | 0000-0001-9239-4390 | en_US
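The abstract describes the evaluation protocol only in prose. The following is a minimal, illustrative sketch of such an evaluation loop, assuming OpenAI Gym's MountainCarContinuous-v0 and Pendulum-v0 environments (consistent with the 2020-era Gym API) and using a random policy as a stand-in for SSRL/DDPG; the environment IDs, function names, and the 4-tuple step() return are assumptions for illustration, not the thesis's actual code.

```python
# Sketch of the evaluation protocol from the abstract: run an agent for 1000
# episodes, record the cumulative reward per episode, repeat 10 times per
# time-step setting, and aggregate mean and standard deviation.
# A random policy stands in for the learned SSRL/DDPG policies.
import gym
import numpy as np

def run_repetition(env_name, episodes, max_steps, seed):
    env = gym.make(env_name)
    env.seed(seed)  # older Gym API (pre-0.26); newer versions seed via reset()
    episode_returns = []
    for _ in range(episodes):
        obs = env.reset()
        total = 0.0
        for _ in range(max_steps):
            action = env.action_space.sample()  # placeholder for the learned policy
            obs, reward, done, _ = env.step(action)
            total += reward
            if done:  # terminal state reached before the time-step limit
                break
        episode_returns.append(total)
    env.close()
    return np.array(episode_returns)

if __name__ == "__main__":
    for env_name in ("MountainCarContinuous-v0", "Pendulum-v0"):
        for max_steps in (2000, 3000, 4000):
            # 10 repetitions per time-step setting; shape (10, 1000)
            runs = np.stack([
                run_repetition(env_name, episodes=1000,
                               max_steps=max_steps, seed=rep)
                for rep in range(10)
            ])
            print(env_name, max_steps,
                  "mean:", runs.mean(), "std:", runs.std())
```

A real comparison would replace the random action with each algorithm's policy and keep the same aggregation, so that mean and standard deviation are comparable across SSRL and DDPG for each environment and time-step setting.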
Files
Original bundle (1 - 1 of 1)
- Name: 2020_Murali Krishnan_Shyam Sundar_Thesis.pdf
- Size: 4.3 MB
- Format: Adobe Portable Document Format

License bundle (1 - 1 of 1)
- Name: license.txt
- Size: 1.71 KB
- Format: Item-specific license agreed upon to submission