Hindsight Experience Replay in Robotics Manipulation
In goal-conditioned reinforcement learning, sample efficiency is often poor because most exploration episodes end in failure and therefore provide little useful learning signal. In this paper, we implement a Hindsight Experience Replay (HER) module in several goal-conditioned environments to evaluate how much it improves sample efficiency. Using Deep Deterministic Policy Gradient (DDPG) as the base algorithm, our experiments show that the HER module helps the agent learn much faster and more robustly. We then discuss the limitations of HER and how its hyperparameters affect its performance.
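To make the idea of reusing failed episodes concrete, the following is a minimal sketch of hindsight goal relabeling, not the implementation evaluated in this paper. It assumes a sparse reward of 0 on success and -1 otherwise, a distance tolerance of 0.05, and the commonly used "future" strategy, in which each transition is stored again with goals that were actually achieved later in the same episode. The names `sparse_reward`, `her_relabel`, `tol`, and `k` are hypothetical and chosen only for illustration.

```python
import random
from typing import List, Optional, Tuple

import numpy as np

# One step of an episode:
# (state, action, next_state, achieved_goal, desired_goal),
# where achieved_goal is the goal actually reached in next_state.
Step = Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray, np.ndarray]
# What goes into the replay buffer: (state, action, next_state, goal, reward).
Stored = Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray, float]


def sparse_reward(achieved: np.ndarray, goal: np.ndarray, tol: float = 0.05) -> float:
    """Assumed sparse reward: 0 if the goal is reached within tol, else -1."""
    return 0.0 if np.linalg.norm(achieved - goal) <= tol else -1.0


def her_relabel(
    episode: List[Step], k: int = 4, rng: Optional[random.Random] = None
) -> List[Stored]:
    """Return the original transitions plus k hindsight copies of each.

    Each hindsight copy replaces the desired goal with a goal achieved at
    the same or a later step of the same episode ("future" strategy) and
    recomputes the reward for that substituted goal.
    """
    rng = rng or random.Random(0)
    stored: List[Stored] = []
    last = len(episode) - 1
    for t, (state, action, next_state, achieved, goal) in enumerate(episode):
        # Keep the transition with its original goal and reward.
        stored.append((state, action, next_state, goal, sparse_reward(achieved, goal)))
        for _ in range(k):
            # Pick a goal that was really achieved at step t or later.
            future = rng.randint(t, last)
            new_goal = episode[future][3]
            reward = sparse_reward(achieved, new_goal)
            stored.append((state, action, next_state, new_goal, reward))
    return stored
```

Under these assumptions, an episode that never reaches its desired goal still yields some transitions with reward 0, because the substituted goals were in fact reached. An off-policy learner such as DDPG can then sample these relabeled transitions from the replay buffer alongside the original ones.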