arXiv:2207.06559 Abstract | arXiv Analytics

arXiv:2207.06559 [cs.LG]

Fully Decentralized Model-based Policy Optimization for Networked Systems

Yali Du, Chengdong Ma, Yuchen Liu, Runji Lin, Hao Dong, Jun Wang, Yaodong Yang

Published 2022-07-13, Version 1

Reinforcement learning algorithms require a large number of samples, which often limits their real-world application even on simple tasks. This challenge is more pronounced in multi-agent tasks, where each step of operation is more costly because it requires communication or the shifting of resources. This work aims to improve the data efficiency of multi-agent control through model-based learning. We consider networked systems in which agents are cooperative and communicate only locally with their neighbors, and propose the decentralized model-based policy optimization framework (DMPO). In our method, each agent learns a dynamics model to predict future states and broadcasts its predictions through communication; the policies are then trained on the model rollouts. To alleviate the bias of model-generated data, we restrict model usage to generating myopic (short-horizon) rollouts, thus reducing the compounding error of model generation. To preserve the independence of policy updates, we introduce an extended value function and theoretically prove that the resulting policy gradient is a close approximation to the true policy gradient. We evaluate our algorithm on several benchmarks for intelligent transportation systems, namely connected autonomous vehicle control tasks (Flow and CACC) and adaptive traffic signal control (ATSC). Empirical results show that our method achieves superior data efficiency and matches the performance of model-free methods using true models.
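The motivation for myopic rollouts can be illustrated with a toy sketch. This is not the paper's implementation: the ring topology, the linear dynamics, the constant `MODEL_BIAS`, and every function name below are hypothetical choices made for illustration. Each agent predicts its own next state from its own prediction and the predictions its two neighbors broadcast, and the gap between the imagined and the true trajectory is measured for several rollout horizons.

```python
# Toy illustration only: a ring of agents with hypothetical linear dynamics.
# None of these names or numbers come from the paper.

N_AGENTS = 5
MODEL_BIAS = 0.02  # assumed per-step error in each agent's learned model


def neighbors(i):
    """Ring topology: each agent talks only to its two adjacent agents."""
    return (i - 1) % N_AGENTS, (i + 1) % N_AGENTS


def true_step(states):
    """Ground-truth dynamics: each state mixes with its neighbors' states."""
    return [
        0.5 * states[i] + 0.25 * sum(states[j] for j in neighbors(i))
        for i in range(N_AGENTS)
    ]


def model_step(predictions):
    """One decentralized model step.

    Each agent predicts its own next state from its own current prediction
    and the predictions its neighbors broadcast in the previous step.
    """
    return [
        (0.5 + MODEL_BIAS) * predictions[i]
        + 0.25 * sum(predictions[j] for j in neighbors(i))
        for i in range(N_AGENTS)
    ]


def rollout_error(start_states, horizon):
    """Mean absolute gap between the model rollout and the true trajectory."""
    real, imagined = list(start_states), list(start_states)
    for _ in range(horizon):
        real = true_step(real)
        imagined = model_step(imagined)
    return sum(abs(r - m) for r, m in zip(real, imagined)) / N_AGENTS


# Rollouts branch from a real (observed) state and run for a few steps only.
start = [1.0, 2.0, 3.0, 4.0, 5.0]
errors = {k: rollout_error(start, k) for k in (1, 5, 20)}
```

In this toy setting the error grows with the horizon, so a short rollout started from an observed state stays closer to the true dynamics than a long one. A real system would use learned, nonlinear models and policy-dependent actions, which this sketch omits.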

Comments: 8 pages, 7 figures, accepted by The 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022)

Categories: cs.LG, cs.AI, cs.MA, math.OC, stat.ML

Keywords: fully decentralized model-based policy optimization, networked systems, model-based policy optimization framework, autonomous vehicle control tasks, method achieves superior data efficiency

Tags: conference paper

Related articles:

arXiv:2010.14907 [cs.LG] (Published 2020-10-28)

Online feature selection for rapid, low-overhead learning in networked systems

Xiaoxuan Wang, Forough Shahab Samani, Rolf Stadler

arXiv:1810.02837 [cs.LG] (Published 2018-10-05)

Scaling Submodular Optimization Approaches for Control Applications in Networked Systems

Arun V Sathanur

arXiv:2410.23393 [cs.LG] (Published 2024-10-30)

Resource Governance in Networked Systems via Integrated Variational Autoencoders and Reinforcement Learning

Qiliang Chen, Babak Heydari
