Reinforcement learning for continuous control

Murali Krishnan, Shyam Sundar

Date

2020-12-18

Reinforcement learning is an area of machine learning concerned with how an agent should take actions in an environment to maximize its reward. Earlier reinforcement learning algorithms work well for discrete actions and discrete observations. In a continuous environment, both the actions and the observations are infinite in number, so certain reinforcement learning algorithms were developed to learn on continuous spaces. Two of these are Stochastic Synapse Reinforcement Learning (SSRL) and Deep Deterministic Policy Gradient (DDPG). Previous works applied these algorithms to different environments separately in order to understand their behaviour. In this thesis, a comparison study between SSRL and DDPG is made by running them on the same continuous environments and observing how each algorithm behaves on each environment and what strengths and weaknesses can be inferred from the comparison. The algorithms are run on two continuous environments, namely mountain car continuous and pendulum. They are run 10 times for each time-step setting, such as 2000, 3000, and 4000, for 1000 episodes each, and the cumulative reward at the end of each episode is recorded. An episode is the length of the simulation, at the end of which the algorithm reaches a terminal state. The average and standard deviation of the cumulative rewards are collected across the 10 repetitions for each time-step setting, and across all repetitions over the different time-step settings. The results show different trends across the experiments. Based on the results, it can be inferred that SSRL performs consistently overall, even though it does not gain rewards as high as those of DDPG, whereas DDPG performs inconsistently, but some of the rewards it earns are higher than those of SSRL. Also, in the delayed-reinforcement pendulum environment, neither algorithm learns well, showing their weakness on environments whose terminal state is not definite.
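The evaluation protocol described in the abstract (repeated runs per time-step setting, with the mean and standard deviation of cumulative reward taken across repetitions) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: `run_episode` is a hypothetical stand-in that draws placeholder rewards instead of stepping a real environment with an SSRL or DDPG agent, the function names are invented, and summarizing each repetition by its mean episode return is one possible reading of the protocol, not necessarily the thesis's method.

```python
import random
import statistics


def run_episode(max_steps, rng):
    """Hypothetical stand-in for one episode; returns its cumulative reward.

    A real study would step an environment with an SSRL or DDPG agent here.
    """
    total = 0.0
    for _ in range(max_steps):
        total += rng.uniform(-1.0, 0.0)  # placeholder per-step reward
    return total


def run_experiment(step_limits, repetitions, episodes, seed=0):
    """Return {step_limit: [mean episode return of each repetition]}."""
    rng = random.Random(seed)
    results = {}
    for limit in step_limits:
        per_repetition = []
        for _ in range(repetitions):
            returns = [run_episode(limit, rng) for _ in range(episodes)]
            per_repetition.append(statistics.mean(returns))
        results[limit] = per_repetition
    return results


def summarize(results):
    """Mean and sample standard deviation per step limit and pooled over all."""
    per_limit = {
        limit: (statistics.mean(values), statistics.stdev(values))
        for limit, values in results.items()
    }
    pooled = [v for values in results.values() for v in values]
    overall = (statistics.mean(pooled), statistics.stdev(pooled))
    return per_limit, overall


# Scaled-down demo; the abstract's setup is 10 repetitions of 1000 episodes
# at step limits such as 2000, 3000 and 4000.
per_limit, overall = summarize(
    run_experiment([20, 30, 40], repetitions=3, episodes=5)
)
for limit, (mean, std) in per_limit.items():
    print(f"limit={limit}: mean={mean:.2f}, std={std:.2f}")
print(f"overall: mean={overall[0]:.2f}, std={overall[1]:.2f}")
```

In this sketch a small standard deviation across repetitions corresponds to the consistent behaviour the abstract attributes to SSRL, while a higher mean with a larger spread would correspond to the pattern it reports for DDPG.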

Keywords

Stochastic Synapse Reinforcement Learning (SSRL), Deep Deterministic Policy Gradient (DDPG), Comparison study between SSRL and DDPG

URI

https://hdl.handle.net/11244/326671

Collections

OU - Theses

SHAREOK™

advancing Oklahoma scholarship, research and institutional memory
