arXiv Analytics

arXiv:1812.00045 [cs.LG]

Using Monte Carlo Tree Search as a Demonstrator within Asynchronous Deep RL

Bilal Kartal, Pablo Hernandez-Leal, Matthew E. Taylor

Published 2018-11-30, Version 1

Deep reinforcement learning (DRL) has achieved great successes in recent years with the help of novel methods and greater compute power. However, several challenges remain, such as convergence to locally optimal policies and long training times. In this paper, we first augment the Asynchronous Advantage Actor-Critic (A3C) method with a novel self-supervised auxiliary task, Terminal Prediction, which measures temporal closeness to terminal states; we call the resulting method A3C-TP. Second, we propose a new framework in which planning algorithms such as Monte Carlo tree search, or other sources of (simulated) demonstrations, can be integrated into asynchronous distributed DRL methods. Compared to vanilla A3C, our proposed methods both learn faster and converge to better policies on a two-player mini version of the Pommerman game.
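The Terminal Prediction auxiliary task can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the paper's implementation: it assumes the per-step target is the normalized position of the step within the episode (so the target approaches 1 near the terminal state) and that the auxiliary head is trained with a mean-squared-error loss added to the A3C loss; the function names are hypothetical.

```python
def terminal_prediction_targets(episode_length):
    """Hypothetical targets: step t (1-indexed) maps to t / T, so the
    target grows toward 1.0 as the episode approaches its terminal state."""
    return [(t + 1) / episode_length for t in range(episode_length)]

def auxiliary_tp_loss(predictions, targets):
    """Mean-squared error between the network's predicted closeness to
    the terminal state and the normalized-time targets."""
    assert len(predictions) == len(targets)
    return sum((p - y) ** 2 for p, y in zip(predictions, targets)) / len(targets)
```

For a 4-step episode the targets would be `[0.25, 0.5, 0.75, 1.0]`; the auxiliary loss would then be weighted and summed with the usual actor-critic losses.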

Comments: 9 pages, 6 figures, To appear at AAAI-19 Workshop on Reinforcement Learning in Games
Categories: cs.LG, cs.AI, cs.NE
Related articles:
arXiv:2305.16209 [cs.LG] (Published 2023-05-25)
C-MCTS: Safe Planning with Monte Carlo Tree Search
arXiv:1910.06862 [cs.LG] (Published 2019-10-15)
Approximate Inference in Discrete Distributions with Monte Carlo Tree Search and Value Functions
arXiv:2412.07186 [cs.LG] (Published 2024-12-10)
Monte Carlo Tree Search based Space Transfer for Black-box Optimization